The C11 and C++11 Concurrency Model: Challenges and Programming Strategies for Unlocking Multi-core Processors

Updated 2024-07-18 · 2.93 MB PDF


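As parallel computing has become widespread, computer system design has changed fundamentally. Raising the clock frequency of a single core has hit a wall, so processor vendors have turned to multi-core designs to work around physical limits and keep performance growing. The C11 and C++11 standards introduced a concurrency model to match this shift, aimed in particular at concurrent tasks on multi-core hardware.

Both languages adopt relaxed memory concurrency. This lets hardware and compilers provide more efficient parallel execution, but it brings new challenges. Traditional assumptions about causality and a single global view of memory no longer hold, and parallel execution can produce nondeterminism and data races. Programmers writing concurrent code cannot rely on intuition, and they cannot rely on testing alone either, because test results vary with how much relaxed behaviour each generation of hardware exhibits.

The standards provide threads, atomic operations, condition variables, mutexes and fences to help developers coordinate communication between threads. C11 offers the `_Atomic` qualifier, the `<stdatomic.h>` types and `atomic_flag`, with `thrd_t` and `mtx_t` from `<threads.h>` for threads and locks. C++11 offers the `std::atomic` class template and `std::atomic_flag`, with `std::thread` and `std::mutex`. Atomic operations allow access to shared data without locks and without data races. C++11 also adds `std::future` and `std::promise` for asynchronous programming and for ordering between tasks, and class types such as `std::condition_variable` and `std::lock_guard` help with more complex synchronisation. Using these tools still requires developers to understand and handle deadlock, livelock and starvation.

Correctness is central to the C11 and C++11 concurrency model. Standard-library facilities such as `std::atomic_thread_fence` and `std::memory_order` let the programmer state the ordering that a program needs on multi-core hardware. Developers also need tools such as profilers and concurrency debuggers to detect problems in concurrent programs and tune their performance. In summary, the model aims to provide a flexible and efficient way to manage concurrent tasks on multi-core systems, while requiring programmers to understand the complexity of concurrent programming in order to avoid performance loss and latent errors. As hardware advances, mastering these features and best practices will matter more and more.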
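The `std::promise` and `std::future` pairing mentioned above can be sketched as follows. This is a minimal illustration rather than code from the resource; the function name `compute_async_sum` is hypothetical.

```cpp
#include <cassert>
#include <future>
#include <thread>

// Hypothetical helper: compute a value on another thread and hand it
// back to the caller through a promise/future pair.
int compute_async_sum(int a, int b) {
    std::promise<int> result;
    std::future<int> pending = result.get_future();
    assert(pending.valid());           // a fresh future refers to shared state

    std::thread worker([&result, a, b] {
        result.set_value(a + b);       // makes the value available to the future
    });

    int sum = pending.get();           // blocks until set_value has run
    worker.join();
    return sum;
}
```

The caller never touches shared data directly: the only communication is through the future, so no explicit lock or atomic is needed here.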


still, relaxed-concurrency bugs can be sensitive to the execution environment and manifest with low probability, so this scheme mandates enormous testing resources. A well-defined memory model provides an abstraction of all of the various platforms that a program might execute above, and constrains their behaviour. It is then the responsibility of compiler writers and processor vendors to ensure that each platform meets the guarantees provided by the memory model. Defining a memory model enables the creation of tools that support portable concurrent programming, and avoids the need for interminable testing resources.

Attiya et al. showed that any concurrent programming language must include a minimal set of features in order to enable the programmer to construct consensus during the execution of their programs [19]; multiple threads must be able to agree on the state of a given piece of data. In a relaxed memory model, this can be an expensive operation: it requires some level of global synchronisation. On the hardware, the operations that provide this feature may be tightly linked to the hardware optimisations that they restrict, but in programming languages, the analogous constructs can be more intuitive. The specification of such features represents one axis in the design space.

The language memory model can provide strong ordering of memory accesses, or it can allow reordering. Strongly ordered memory models like sequential consistency (SC, see §2 for more details), where memory accesses are simply interleaved, have the advantage of usability: programmers need not consider intricate interactions of relaxed concurrent code, because the most complex behaviour is simply forbidden. On the other hand, strong models force the compiler to emit code that implements the strong ordering guaranteed by the language. That may restrict the optimisations and force the introduction of explicit synchronisation in the emitted binaries, with a substantial overhead on modern multi-core processors.

At the other end of the spectrum, languages can provide a very relaxed memory model with the possibility of efficient implementation above relaxed processors, but this exposes the programmer to additional complexity. If the guarantees about the ordering of memory are too weak, then it can be impossible to build programs that implement reasonable specifications. Relaxed models can include explicit synchronisation features that allow the programmer to specify stronger ordering in parts of the program. This might take the form of mutexes, fences, barriers, or the memory-order annotations present in the example above. Given a relaxed model with these features, the programmer is burdened with the delicate task of inserting enough explicit synchronisation to ensure correctness, without introducing too much and spoiling performance.
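As a hedged illustration of "enough explicit synchronisation", the sketch below publishes an ordinary variable through an atomic flag, using only a release store and an acquire load rather than a full lock. It is written in C++11; the function name `message_passing` and the value 42 are assumptions made for the example, not taken from the thesis.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical example: hand a plain int from one thread to another.
int message_passing() {
    int payload = 0;                  // ordinary, non-atomic data
    std::atomic<bool> ready{false};   // atomic flag that guards the payload
    int seen = -1;

    std::thread producer([&] {
        payload = 42;                                  // (1) plain write
        ready.store(true, std::memory_order_release);  // (2) publishes (1)
    });
    std::thread consumer([&] {
        // (3) the acquire load pairs with the release store in (2)
        while (!ready.load(std::memory_order_acquire)) {
            std::this_thread::yield();
        }
        seen = payload;                                // (4) must read 42
    });

    producer.join();
    consumer.join();
    assert(seen == 42);               // holds in every allowed execution
    return seen;
}
```

Weakening both annotations to `memory_order_relaxed` would remove the ordering between (1) and (4) and introduce a data race on `payload`; strengthening them to `memory_order_seq_cst` would remain correct but may cost more on some architectures.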

Languages can provide a stronger model while maintaining efficient implementability by requiring a particular programming discipline. If programmers are required to avoid certain patterns, then their absence becomes an invariant for optimisation within the compiler. If a program fails to obey the discipline, then the language provides weaker guarantees about its behaviour.

All of these design decisions represent tradeoffs, and there is no universally superior approach; memory models should be designed in sympathy with the expected use of the programming language.

The C/C++ memory model

This thesis focuses on the memory model shared by C and C++. Not only are C and C++ extremely well-used languages, but they represent the state of the art in memory-model design for mainstream programming languages. The C and C++ languages aspire to be portable, usable by regular programmers who require an intuitive setting, and suitable for expert programmers writing high-performance code. For portability, the language defines a memory model, and for performance that model is relaxed.

The model is stratified by the complexity of its features. In its simplest guise, the memory model provides an intuitive setting for those who write single-threaded programs: the order of memory accesses is similar to that provided by previous sequential versions of the language. For programmers who want to write concurrent programs, there is extensive support provided in the concurrency libraries. This ranges from locks and unlocks to the atomics library, which provides a low-level high-performance interface to memory.
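The lock-based end of that range can be sketched in C++11 as below. This is an illustrative example only; the helper name `locked_count` and its parameters are hypothetical.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical helper: several threads increment a plain int, with
// every access protected by the same mutex.
int locked_count(int threads, int increments_per_thread) {
    assert(threads >= 0 && increments_per_thread >= 0);
    int counter = 0;   // ordinary int: safe only because of the lock
    std::mutex m;
    std::vector<std::thread> pool;

    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&] {
            for (int i = 0; i < increments_per_thread; ++i) {
                std::lock_guard<std::mutex> guard(m);  // lock; unlock at scope exit
                ++counter;
            }
        });
    }
    for (auto& th : pool) {
        th.join();
    }
    return counter;
}
```

Because every access to `counter` happens while holding `m`, the program has no data race and the result is always `threads * increments_per_thread`.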

The C/C++11 memory model design was strongly influenced by the work of Adve, Gharachorloo and Hill [12, 10, 11], who proposed a memory model whose programming discipline dictates that programs must annotate memory accesses that might take part in data races: two accesses on different threads that concurrently contend on the same piece of data. Following this work, in 2008 [37], Boehm and Adve described a simplified precursor of the C/C++11 memory-model design, imposing a similar programming discipline: programmers must declare objects that might be accessed in a racy way, these objects must be accessed only through the atomics library, and data races on all other objects must be avoided. If this discipline is violated in any execution of the program, then every execution has undefined behaviour. This is called a "catch-fire semantics" because programs with undefined behaviour are free to do anything: catch fire, order a thousand pizzas, email your resignation, and so on. This design choice carries a heavy cost to the usability of the language. Suppose a programmer identifies buggy behaviour in part of their program, and would like to debug their code. The program may be behaving strangely because of a race in a completely different part of the program, and this race may not even have been executed in the buggy instance. Debugging such a problem could be very difficult indeed. Note that this model of system programming does not match practice, where programmers try to understand racy programs in terms of an assumed model of the system comprising the compiler and the details of the underlying hardware. In this (unsanctioned) model of the system it is possible to debug racy programs by observing their behaviour, unlike in C/C++11.
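The discipline described here (declare any object that may be accessed in a racy way as atomic, and keep all other objects race-free) might look as follows in C++11. This is a sketch under stated assumptions: `run_until_stopped` is a hypothetical name, and the stop-flag pattern is chosen only as a familiar case.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical helper: a worker spins until the main thread asks it to
// stop, then reports how many iterations it completed.
long run_until_stopped() {
    // Two threads contend on this flag, so it is declared atomic.
    // Declaring it as a plain bool would be a data race and would make
    // the behaviour of the whole program undefined.
    std::atomic<bool> stop{false};
    long iterations = 0;   // written only by the worker, read after join()

    std::thread worker([&] {
        while (!stop.load()) {   // memory_order_seq_cst by default
            ++iterations;
        }
    });

    stop.store(true);
    worker.join();               // orders the worker's writes before the read below
    assert(stop.load());
    return iterations;
}
```

Only `stop` needs the atomic annotation: `iterations` is never accessed by both threads concurrently, because the main thread reads it only after `join()` returns.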

Following earlier C++ design discussions [38, 35], Boehm and Adve provided a criterion under which programs executed in their relaxed memory model behave according to sequential consistency [37], and this became a design goal of the C/C++11 memory model: programs that do not have any un-annotated data races, and that avoid using the lowest-level interface to memory, should execute in a sequentially consistent manner. This provides programmers who do not need to use the highest-performance features with an intuitive memory model (for race-free programs). The guarantee went further, stating that races can be calculated in the context of the sequentially-consistent memory model, rather than in the far more complex setting of the relaxed memory model. This is a powerful simplification that allows some programmers to be shielded from the full complexity of the memory model, while experts have access to high-performance features. Although, in early drafts of the C/C++11 standards, this laudable design goal was compromised (details in Chapter 5), the ratified language does provide this guarantee, as we show in Chapter 6.
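A small way to see this guarantee at work is the classic store-buffering shape, sketched below in C++11 with default (`memory_order_seq_cst`) atomics. The function name is hypothetical and the example is an illustration, not a test taken from the thesis.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical helper: returns true if both threads read 0, which is
// the one outcome that sequential consistency forbids for this program.
bool store_buffering_both_zero() {
    std::atomic<int> x{0};
    std::atomic<int> y{0};
    int r1 = -1;
    int r2 = -1;

    // All atomic accesses use the default order, memory_order_seq_cst.
    std::thread t1([&] { x.store(1); r1 = y.load(); });
    std::thread t2([&] { y.store(1); r2 = x.load(); });
    t1.join();
    t2.join();

    return r1 == 0 && r2 == 0;
}
```

In any interleaving of the four accesses, at least one store precedes the other thread's load, so the function returns `false`. If the accesses were annotated `memory_order_relaxed` instead, the both-zero outcome would be allowed by the model and can be observed on common hardware.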

The atomics library

The atomics library provides versions of commonly used primitive data structures, like fixed-width integers, that can be used to write well-defined racy code. Accessor functions are used to read and write atomic variables. The C11 syntax for some of these is given below:

    atomic_load_explicit(&x, memory_order)
    atomic_store_explicit(&x, v, memory_order)
    atomic_compare_exchange_weak_explicit(&x, &d, v, memory_order, memory_order)

The memory_order argument decides how much ordering the access will create in an execution. There are six choices of memory order: memory_order_seq_cst, memory_order_acq_rel, memory_order_acquire, memory_order_release, memory_order_consume, and memory_order_relaxed.

This list is roughly in order, from strong to weak and expensive to cheap: memory_order_seq_cst can, under certain circumstances, provide sequentially-consistent behaviour with a substantial cost to performance, whereas accesses given memory_order_relaxed exhibit many relaxed behaviours, but enable one to write very high-performance code. Typical concurrent programmers should use the former, whose behaviour is relatively straightforward, and expert programmers can use the whole gamut of memory orders for fine-grained control over the ordering of memory accesses. The C/C++11 memory model allows a superset of the relaxed behaviour allowed by its target architectures. By choosing stronger memory orders, one can forbid this relaxed behaviour.
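To make the accessor functions and memory orders concrete, here is a minimal sketch that uses the C++11 free functions in `<atomic>`, which mirror the C11 names with a `std::` prefix. The helper `fetch_max_sketch` is hypothetical, and the particular choice of orders is one reasonable option rather than the only correct one.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical helper: atomically raise *x to at least `candidate` and
// return the value that was observed before any update.
int fetch_max_sketch(std::atomic<int>* x, int candidate) {
    assert(x != nullptr);
    int observed = std::atomic_load_explicit(x, std::memory_order_relaxed);
    while (observed < candidate &&
           !std::atomic_compare_exchange_weak_explicit(
               x, &observed, candidate,
               std::memory_order_acq_rel,      // ordering if the exchange succeeds
               std::memory_order_relaxed)) {   // ordering if it fails
        // On failure, including a spurious one, `observed` has been
        // refreshed with the current value, so the loop simply retries.
    }
    return observed;
}
```

The compare-exchange takes two memory orders because success is a read-modify-write while failure is only a read. The weak form may fail spuriously, which is why it sits inside a retry loop.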

1.1 Focus of this thesis

The C and C++ memory models are defined by the International Organization for Standardization (ISO) in two lengthy standard documents [30, 8]. Prior to my work, there were drafts describing the C/C++11 memory model, but those drafts, despite careful crafting by experts, were not known to describe a usable language memory model. The prose specifications were untestable, and the model was not well understood. It was not formally established whether the design was implementable, programmable, concise, or even internally consistent, nor had the central design tenets, laid out early in the design process [38, 35] and reiterated by Boehm and Adve [37], been established.

In my work, I have sought to understand the C/C++11 memory model in formal terms, to fix parts that were broken, to prove that the design is usable, and, where fixing problems was not yet possible, to highlight outstanding issues. In this thesis I assess the C/C++11 memory model design, presenting a clear and complete picture of a mainstream programming-language relaxed memory model. This effort both improved the C/C++11 definition and can inform the design of future programming-language memory models.

1.2 Contributions

Chapter 3 describes a formal version of the C/C++11 memory model that was developed in close contact with the standardisation committee. Work on this model fed corrections back to the language specification, and as a consequence, it is very closely in tune with the intention of the committee, and the ratified prose specification. The formal model is written in the specification language Lem [85, 90], and is readable, precise and executable (the full definitions are provided in Appendix C). The features of the model are introduced in stages through a series of cut-down models that apply to programs that do not use all of the language features. This chapter also presents a simplified model that omits a redundant part of the specification. This work was developed in discussion with Scott Owens, Susmit Sarkar, and Peter Sewell, but I played the leading role. It was published in POPL in 2011 [28].

Chapter 4 describes Cppmem, a tool that takes very small programs and calculates all of the behaviours allowed by the memory model. Cppmem is joint work with Scott Owens, Jean Pichon, Susmit Sarkar, and Peter Sewell. I contributed to the initial design of the tool, and the tool uses an automatic OCaml translation of my formal memory model produced by Lem. Cppmem is invaluable for exploring the behaviour of the memory model. It has been used for communication with the ISO standardisation committee, for teaching the memory model to students, and by ARM, Linux and GCC engineers who wish to understand C/C++11. Cppmem was described in POPL in 2011 [28], and an alternative implementation of the backend that used the Nitpick counterexample generator [31] was published in PPDP in 2011 [32], in work by Weber and some of the other authors.

Chapter 5 describes problems found with the standard during the process of formalisation, together with solutions that I took to the C and C++ standardisation committees. Many amendments were adopted by both standards in some form. This achievement involved discussing problems and drafting text for amendments with both my academic collaborators and many on the standardisation committee, including: Hans Boehm, Lawrence Crowl, Peter Dimov, Benjamin Kosnik, Nick Maclaren, Paul McKenney, Clark Nelson, Scott Owens, Susmit Sarkar, Peter Sewell, Tjark Weber, Anthony Williams, and Michael Wong. Some of these problems broke the central precepts of the language design. My changes fix these problems and are now part of the ratified standards for C11 and C++11 [30, 8], as well as the specification of the GPU framework, OpenCL 2.0 [86]. This chapter ends by identifying an open problem in the design of relaxed-memory programming languages, called the "thin-air" problem, that limits the compositionality of specifications, and leaves some undesirable executions allowed that will not appear in practice. This leaves the memory model sound, but not as precise as we would like. Many of the comments and criticisms were submitted as working papers and defect reports [29, 20, 75, 73, 111, 27, 76, 77, 74].

Chapter 6 describes a mechanised HOL4 proof that shows the equivalence of the progressively simpler versions of the C/C++11 memory model, including those presented in Chapter 3, under successively tighter requirements on programs. These results establish that a complicated part of the specification is redundant and can simply be removed, and they culminate in the proof that the specification meets one of its key design goals (albeit for programs without loops or recursion): despite the model's complexity, if a race-free program uses only regular memory accesses, locks and seq_cst-annotated atomic accesses, then it will behave in a sequentially consistent manner. This proof validates that the model is usable by programmers who understand sequential consistency.

Chapter 7 describes work done in collaboration with Jade Alglave, Luc Maranget, Kayvan Memarian, Scott Owens, Susmit Sarkar, Peter Sewell, Tjark Weber and Derek Williams. We took the compilation mappings from C/C++11 to the x86, Power and ARM architectures that had been proposed by the C++11 design group and proved that they do indeed preserve the semantics of the programming-language memory model in execution above those processors. This led to the discovery and resolution of a flaw in one of the mappings. This chapter represents a second form of validation of the formal model: it is implementable above common target architectures. My contribution, which was smaller in this work, involved proving equivalent variants of the C/C++11 memory
