More Than You Ever Wanted to Know about Synchronization
Synchrobench, Measuring the Impact of the Synchronization on Concurrent Algorithms
Vincent Gramoli
NICTA and University of Sydney, Australia
vincent.gramoli@sydney.edu.au
Abstract
In this paper, we present the most extensive comparison of synchronization techniques. We evaluate 5 different synchronization techniques through a series of 31 data structure algorithms from the recent literature on 3 multicore platforms from Intel, Sun Microsystems and AMD. To this end, we developed a new micro-benchmark suite in C/C++ and Java, called Synchrobench, to help the community evaluate new data structures and synchronization techniques. The main conclusion of this evaluation is threefold: (i) although compare-and-swap helps achieve the best performance on multicores, doing so correctly is hard; (ii) optimistic locking offers varying performance results while transactional memory offers more consistent results; and (iii) copy-on-write and read-copy-update suffer more from contention than any other technique but could be combined with others to derive efficient algorithms.
Categories and Subject Descriptors
D.1.3 [Programming Techniques]: Concurrent Programming - Parallel programming
Keywords Benchmark; data structure; reusability; lock-freedom
1. Introduction
The increasing core count raises new challenges in the development
of efficient algorithms that allow concurrent threads to access
shared resources. Not only do developers have to choose from a large set of thread synchronization techniques, including locks, read-modify-write, copy-on-write, transactions and read-copy-update, but they must also select dedicated data structure algorithms that leverage each synchronization technique under a certain workload. These possibilities have led to an increase in the number of proposed concurrent data structures, each shown to be efficient in “some” settings. Unfortunately, it is almost impossible to predict their performance given hardware and OS artifacts. A single framework is thus necessary to evaluate their performance on common ground before recommending that developers choose a specific synchronization technique.
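As a minimal illustration of two of these techniques, consider the following Java sketch (ours, not code from this paper or from Synchrobench) that synchronizes a trivial shared counter first with mutual exclusion and then with a read-modify-write retry loop around compare-and-swap:

    import java.util.concurrent.atomic.AtomicInteger;

    // Illustrative sketch: the same increment synchronized two ways.
    class Counters {
        private int lockedValue = 0;                      // protected by the intrinsic lock
        private final AtomicInteger casValue = new AtomicInteger();

        // Mutual exclusion: simple to reason about, but threads serialize.
        synchronized void lockedIncrement() {
            lockedValue++;
        }

        // Read-modify-write: a lock-free compare-and-swap retry loop;
        // scaling this idea from one word to a whole data structure is
        // where the difficulty discussed in this paper arises.
        void casIncrement() {
            int old;
            do {
                old = casValue.get();
            } while (!casValue.compareAndSet(old, old + 1));
        }
    }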
On the one hand, synchronization techniques are usually tested with standard macro-benchmarks [8] whose workloads alternate realistically between various complex patterns. These macro-benchmarks are, however, of little help when it comes to nailing down the bottleneck responsible for performance drops. On the other hand, profiling tools that measure cache traffic [18] and monitor memory reclamation can be extremely useful for tuning the implementation of an algorithm to a dedicated hardware platform; however, they are of little help in optimizing the algorithm itself.
This is why micro-benchmarks have been so popular for evaluating new algorithms. They are invaluable tools that complement macro-benchmark evaluations and profiling toolboxes in evaluating novel concurrent algorithms. In particular, they are instrumental in showing how an algorithm can improve the performance of a data structure even when the same algorithm only negligibly boosts a particular application on a specific hardware or OS. Unfortunately, these micro-benchmarks are often developed specifically to illustrate the performance of one algorithm and are usually tuned for this purpose. More importantly, they are poorly documented: it is often unclear whether updates comprise operations that return unsuccessfully without modifying the structure, or whether the reported performance of a concurrent data structure is higher than the performance of its non-synchronized counterpart running sequentially.
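To make the first ambiguity concrete, the following hypothetical Java sketch (the names are ours, not Synchrobench's API) shows why a benchmark must document whether its reported update ratio counts attempted or only effective updates:

    import java.util.Set;
    import java.util.concurrent.ConcurrentSkipListSet;
    import java.util.concurrent.ThreadLocalRandom;
    import java.util.concurrent.atomic.LongAdder;

    // Hypothetical sketch: attempted versus effective updates.
    class UpdateSemantics {
        static final Set<Integer> set = new ConcurrentSkipListSet<>();
        static final LongAdder attempted = new LongAdder();
        static final LongAdder effective = new LongAdder();

        static void update() {
            int key = ThreadLocalRandom.current().nextInt(1024);
            attempted.increment();
            // add() returns false, without modifying the set, when the key
            // is already present: an unsuccessful update. A benchmark that
            // reports only `attempted` overstates the write contention the
            // structure actually endured.
            if (set.add(key)) {
                effective.increment();
            }
        }
    }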
Our contribution is the most extensive comparison of synchro-
nization techniques. We focus on the performance of copy-on-write,
mutual exclusion (e.g., spinlocks), read-copy-update, read-modify-
write (e.g., compare-and-swap) and transactional memory to syn-
chronize concurrent data structures written in Java and C/C++, and
evaluated on AMD Opteron, Intel Xeon and UltraSPARC T2 mul-
ticore platforms. We also propose Synchrobench, an open source
micro-benchmark suite written in Java and C/C++ for multi-core
machines to help researchers evaluate new algorithms and synchro-
nization techniques. Synchrobench is not intended to measure overall system performance or to mimic a given application; rather, it aims to help programmers understand the causes of performance problems in their data structures. Its Java version executes on top of the JVM, making it possible to test algorithms written in any language producing JVM-compatible bytecode, such as Scala. Its C/C++ version allows for more control over memory management.
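For concreteness, a micro-benchmark of this kind essentially reduces to the loop below. This is a simplified Java sketch under our own assumptions (the thread count, key range, update ratio and duration are illustrative parameters, not Synchrobench's actual interface): each thread picks operations according to an update ratio, and throughput is the total number of operations divided by the measurement interval.

    import java.util.Set;
    import java.util.concurrent.ConcurrentSkipListSet;
    import java.util.concurrent.ThreadLocalRandom;
    import java.util.concurrent.atomic.AtomicBoolean;
    import java.util.concurrent.atomic.LongAdder;

    // Hypothetical harness sketch: threads apply a mix of reads and
    // updates to a shared set for a fixed duration.
    class MicroBench {
        public static void main(String[] args) throws InterruptedException {
            final int threads = 8, range = 1 << 14, updatePercent = 10;
            final Set<Integer> set = new ConcurrentSkipListSet<>();
            final LongAdder ops = new LongAdder();
            final AtomicBoolean stop = new AtomicBoolean();

            Runnable worker = () -> {
                ThreadLocalRandom rnd = ThreadLocalRandom.current();
                while (!stop.get()) {
                    int key = rnd.nextInt(range);
                    if (rnd.nextInt(100) < updatePercent) {
                        // Alternate insert/remove to keep the set size stable.
                        if (!set.add(key)) set.remove(key);
                    } else {
                        set.contains(key);          // read-only operation
                    }
                    ops.increment();
                }
            };
            Thread[] pool = new Thread[threads];
            for (int i = 0; i < threads; i++) (pool[i] = new Thread(worker)).start();
            Thread.sleep(5000);                     // measurement interval: 5 s
            stop.set(true);
            for (Thread t : pool) t.join();
            System.out.println("throughput: " + ops.sum() / 5 + " ops/s");
        }
    }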
Our evaluation includes 31 algorithms taken from the literature
and summarized in Table 1. It provides a range of data structures
from simple ones (e.g., linked lists) and fast ones (e.g., queues and
hash tables) to sorted ones (e.g., trees, skip lists). These structures
implement classic abstractions (e.g., collection, dictionary and set)
but Synchrobench also features special operations to measure the
reusability of the data structure in a concurrent library.
This systematic evaluation of synchronization techniques leads
to interesting conclusions, including three main ones:
1. Compare-and-swap is a double-edged sword. Data structures are typically faster when synchronized exclusively with compare-and-swap than with any other technique, regardless of the multicore machine we tested. However, the lock-free use of compare-and-swap makes the design of these data structures, and especially the ones with non-trivial mutations, extremely difficult. In particular, we observed that only a few full-fledged binary search trees using single-word compare-and-swap exist, and we identified a bug in one of them; the sketch below gives an intuition of the difficulty.
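For intuition, contrast a tree with the simplest lock-free structure: a Treiber-style stack, where a single-word compare-and-swap on the top pointer suffices. The Java sketch below is this standard textbook construction, not one of the algorithms of Table 1; a binary search tree deletion or rotation must instead atomically update several words, which a single-word compare-and-swap cannot do directly.

    import java.util.concurrent.atomic.AtomicReference;

    // Treiber stack: the "easy case" for single-word compare-and-swap.
    // (In Java, garbage collection sidesteps the ABA problem that the
    // same code would face with manual memory reclamation in C/C++.)
    class LockFreeStack<T> {
        private static final class Node<T> {
            final T value; Node<T> next;
            Node(T value) { this.value = value; }
        }
        private final AtomicReference<Node<T>> top = new AtomicReference<>();

        void push(T value) {
            Node<T> node = new Node<>(value);
            Node<T> old;
            do {
                old = top.get();
                node.next = old;                      // link to current top
            } while (!top.compareAndSet(old, node));  // retry on interference
        }

        T pop() {
            Node<T> old, next;
            do {
                old = top.get();
                if (old == null) return null;         // empty stack
                next = old.next;
            } while (!top.compareAndSet(old, next));
            return old.value;
        }
    }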
2. Transactions offer more consistent performance than locks. We observed that optimistic locking techniques, which traverse the structure and acquire locks before revalidating, help reduce the number of locks used but also exhibit great variations in performance depending on the considered structure and the amount of contention. Transactional memory provides