Challenges and Coping Strategies in Parallel Programming - An In-Depth Look at RCU with Paul McKenney


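"Is Parallel Programming Hard" is a major work by RCU guru Paul that explores parallel programming techniques in the Linux kernel in depth, particularly the RCU (Read-Copy-Update) mechanism.

In "Is Parallel Programming Hard, And, If So, What Can You Do About It?", author Paul E. McKenney, an IBM expert at the Linux Technology Center, discusses in detail the challenges of parallel programming in the Linux kernel environment. The book is aimed mainly at developers interested in concurrent and parallel computing, especially engineers who must handle synchronization problems in multiprocessor and distributed systems. Parallel programming is a key area of modern computer science, and with the spread of multicore processors and distributed systems, understanding and mastering it has become increasingly important. But, as the title asks, is it really hard? McKenney's book lays out the complexity of parallel programming while also offering strategies and techniques for dealing with it.

The book gives particular attention to RCU (Read-Copy-Update), an efficient mechanism used in the Linux kernel to handle updates to concurrent data structures. The main design goal of RCU is to preserve data consistency while using locks as little as possible, thereby improving system performance. It allows read operations to run without locks, while update operations must coordinate across processors so that no data race arises between readers that see the old state and readers that see the new state. The core idea of RCU is deferred reclamation: old data is actually freed only after every reader that might still be accessing it has finished. This mechanism performs very well in workloads with many concurrent reads and occasional writes, because it avoids the performance bottlenecks caused by read-write conflicts.

The book may also cover other parallel programming challenges, such as deadlock, race conditions, livelock, and resource starvation, along with ways to address them through appropriate synchronization primitives, concurrent algorithm design, and debugging tools. It may also provide practical case studies that help readers apply this theory in complex concurrent environments. The book suits not only experienced Linux kernel developers but also software engineers who want a deeper understanding of parallel programming and RCU. By studying it, readers can improve their ability to program in highly concurrent environments and better meet the challenges of multiprocessor and distributed systems.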

CHAPTER 1. HOW TO USE THIS BOOK

    git clone git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/perfbook.git
    cd perfbook
    make
    evince perfbook.pdf & # Two-column version
    make perfbook-1c.pdf
    evince perfbook-1c.pdf & # One-column version for e-readers

Figure 1.1: Creating an Up-To-Date PDF

    git remote update
    git checkout origin/master
    make
    evince perfbook.pdf & # Two-column version
    make perfbook-1c.pdf
    evince perfbook-1c.pdf & # One-column version for e-readers

Figure 1.2: Generating an Updated PDF

The actual process of contributing patches and sending git pull requests is similar to that of the Linux kernel, which is documented in the Documentation/SubmittingPatches file in the Linux source tree. One important requirement is that each patch (or commit, in the case of a git pull request) must contain a valid Signed-off-by: line, which has the following format:

Signed-off-by: My Name <myname@example.org>

Please see http://lkml.org/lkml/2007/1/15/219 for an example patch containing a Signed-off-by: line.
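As a rough illustration of the format above, the following sketch scans a commit message for well-formed Signed-off-by: lines. The regular expression and the helper name `find_signoffs` are assumptions made for this example; they are not part of the book's tooling or of git, and they only approximate what a maintainer would accept.

```python
import re

# Assumed pattern: the literal prefix, a non-empty name, then an address
# of the form local@domain enclosed in angle brackets.
SIGNOFF_RE = re.compile(
    r"^Signed-off-by: (?P<name>\S.*?) <(?P<email>[^<>\s@]+@[^<>\s@]+)>$"
)


def find_signoffs(commit_message):
    """Return (name, email) pairs for each well-formed Signed-off-by: line."""
    signoffs = []
    for line in commit_message.splitlines():
        match = SIGNOFF_RE.match(line.strip())
        if match:
            signoffs.append((match.group("name"), match.group("email")))
    return signoffs


message = """Fix typo in introduction

Signed-off-by: My Name <myname@example.org>
"""
print(find_signoffs(message))
```

A contribution with several authors would yield one pair per author, which matches the requirement that each author supply a separate line.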

It is important to note that the Signed-off-by: line has a very specific meaning, namely that you are certifying that:

1. The contribution was created in whole or in part by me and I have the right to submit it under the open source license indicated in the file; or

2. The contribution is based upon previous work that, to the best of my knowledge, is covered under an appropriate open source License and I have the right under that license to submit that work with modifications, whether created in whole or in part by me, under the same open source license (unless I am permitted to submit under a different license), as indicated in the file; or

3. The contribution was provided directly to me by some other person who certified (a), (b) or (c) and I have not modified it.

4. The contribution is made free of any other party’s intellectual property claims or rights.

5. I understand and agree that this project and the contribution are public and that a record of the contribution (including all personal information I submit with it, including my sign-off) is maintained indefinitely and may be redistributed consistent with this project or the open source license(s) involved.

This is similar to the Developer’s Certificate of Origin (DCO) 1.1 used by the Linux kernel. The only addition is item #4. This added item says that you wrote the contribution yourself, as opposed to having (say) copied it from somewhere. If multiple people authored a contribution, each should have a Signed-off-by: line.

You must use your real name: I unfortunately cannot accept pseudonymous or anonymous contributions.

The language of this book is American English; however, the open-source nature of this book permits translations, and I personally encourage them. The open-source licenses covering this book additionally allow you to sell your translation, if you wish. I do request that you send me a copy of the translation (hardcopy if available), but this is a request made as a professional courtesy, and is not in any way a prerequisite to the permission that you already have under the Creative Commons and GPL licenses. Please see the FAQ.txt file in the source tree for a list of translations currently in progress. I consider a translation effort to be “in progress” once at least one chapter has been fully translated.

As noted at the beginning of this section, I am this book’s editor. However, if you choose to contribute, it will be your book as well. With that, I offer you Chapter 2, our introduction.

Chapter 2

Introduction

Parallel programming has earned a reputation as one of the most difficult areas a hacker can tackle. Papers and textbooks warn of the perils of deadlock, livelock, race conditions, non-determinism, Amdahl’s-Law limits to scaling, and excessive realtime latencies. And these perils are quite real; we authors have accumulated uncounted years of experience dealing with them, and all of the emotional scars, grey hairs, and hair loss that go with such experiences.

However, new technologies that are difficult to use at introduction invariably become easier over time. For example, the once-rare ability to drive a car is now commonplace in many countries. This dramatic change came about for two basic reasons: (1) cars became cheaper and more readily available, so that more people had the opportunity to learn to drive, and (2) cars became easier to operate due to automatic transmissions, automatic chokes, automatic starters, greatly improved reliability, and a host of other technological improvements.

The same is true of a host of other technologies, including computers. It is no longer necessary to operate a keypunch in order to program. Spreadsheets allow most non-programmers to get results from their computers that would have required a team of specialists a few decades ago. Perhaps the most compelling example is web-surfing and content creation, which since the early 2000s has been easily done by untrained, uneducated people using various now-commonplace social-networking tools. As recently as 1968, such content creation was a far-out research project [Eng68], described at the time as “like a UFO landing on the White House lawn” [Gri00].

Therefore, if you wish to argue that parallel programming will remain as difficult as it is currently perceived by many to be, it is you who bears the burden of proof, keeping in mind the many centuries of counter-examples in a variety of fields of endeavor.

2.1 Historic Parallel Programming Difficulties

As indicated by its title, this book takes a different approach. Rather than complain about the difficulty of parallel programming, it instead examines the reasons why parallel programming is difficult, and then works to help the reader to overcome these difficulties. As will be seen, these difficulties have fallen into several categories, including:

1. The historic high cost and relative rarity of parallel systems.

2. The typical researcher’s and practitioner’s lack of experience with parallel systems.

3. The paucity of publicly accessible parallel code.

4. The lack of a widely understood engineering discipline of parallel programming.

5. The high overhead of communication relative to that of processing, even in tightly coupled shared-memory computers.

Many of these historic difficulties are well on the way to being overcome. First, over the past few decades, the cost of parallel systems has decreased from many multiples of that of a house to a fraction of that of a bicycle, courtesy of Moore’s Law. Papers calling out the advantages of multicore CPUs were published as early as 1996 [ONH]. IBM introduced simultaneous multi-threading into its high-end POWER family in 2000, and multicore in 2001. Intel introduced hyperthreading into its commodity Pentium line in November 2000, and both AMD and Intel introduced dual-core CPUs in 2005. Sun followed with the multicore/multi-threaded Niagara in late 2005. In fact, by 2008, it was becoming difficult to find a single-CPU desktop system, with single-core CPUs being relegated to netbooks and embedded devices. By 2012, even smartphones were starting to sport multiple CPUs.

Second, the advent of low-cost and readily available multicore systems means that the once-rare experience of parallel programming is now available to almost all researchers and practitioners. In fact, parallel systems are now well within the budget of students and hobbyists. We can therefore expect greatly increased levels of invention and innovation surrounding parallel systems, and that increased familiarity will over time make the once prohibitively expensive field of parallel programming much more friendly and commonplace.

Third, in the 20th century, large systems of highly parallel software were almost always closely guarded proprietary secrets. In happy contrast, the 21st century has seen numerous open-source (and thus publicly available) parallel software projects, including the Linux kernel [Tor03c], database systems [Pos08, MS08], and message-passing systems [The08, UoC08]. This book will draw primarily from the Linux kernel, but will provide much material suitable for user-level applications.

Fourth, even though the large-scale parallel-programming projects of the 1980s and 1990s were almost all proprietary projects, these projects have seeded the community with a cadre of developers who understand the engineering discipline required to develop production-quality parallel code. A major purpose of this book is to present this engineering discipline.

Unfortunately, the fifth difficulty, the high cost of communication relative to that of processing, remains largely in force. Although this difficulty has been receiving increasing attention during the new millennium, according to Stephen Hawking, the finite speed of light and the atomic nature of matter are likely to limit progress in this area [Gar07, Moo03]. Fortunately, this difficulty has been in force since the late 1980s, so that the aforementioned engineering discipline has evolved practical and effective strategies for handling it. In addition, hardware designers are increasingly aware of these issues, so perhaps future hardware will be more friendly to parallel software, as discussed in Section 3.3.

Quick Quiz 2.1: Come on now!!! Parallel programming has been known to be exceedingly hard for many decades. You seem to be hinting that it is not so hard. What sort of game are you playing?

However, even though parallel programming might not be as hard as is commonly advertised, it is often more work than is sequential programming.

Quick Quiz 2.2: How could parallel programming ever be as easy as sequential programming?

It therefore makes sense to consider alternatives to parallel programming. However, it is not possible to reasonably consider parallel-programming alternatives without understanding parallel-programming goals. This topic is addressed in the next section.

2.2 Parallel Programming Goals

The three major goals of parallel programming (over and above those of sequential programming) are as follows:

1. Performance.

2. Productivity.

3. Generality.

Quick Quiz 2.3: Oh, really??? What about correctness, maintainability, robustness, and so on?

Quick Quiz 2.4: And if correctness, maintainability, and robustness don’t make the list, why do productivity and generality?

Quick Quiz 2.5: Given that parallel programs are much harder to prove correct than are sequential programs, again, shouldn’t correctness really be on the list?

Quick Quiz 2.6: What about just having fun?

Each of these goals is elaborated upon in the following sections.

2.2.1 Performance

Performance is the primary goal behind most parallel-programming effort. After all, if performance is not a concern, why not do yourself a favor: Just write sequential code, and be happy? It will very likely be easier and you will probably get done much more quickly.

Quick Quiz 2.7: Are there no cases where parallel programming is about something other than performance?

Note that “performance” is interpreted quite broadly here, including scalability (performance per CPU) and efficiency (for example, performance per watt).

That said, the focus of performance has shifted from hardware to parallel software. This change in focus is due to the fact that, although Moore’s Law continues to deliver increases in transistor density, it has ceased to provide the traditional single-threaded performance increases. This can be seen in Figure 2.1, which shows that writing single-threaded code and simply waiting a year or two for the CPUs to catch up may no longer be an option. Given the recent trends on the part of all major manufacturers towards multicore/multithreaded systems, parallelism is the way to go for those wanting to avail themselves of the full performance of their systems.

Figure 2.1: MIPS/Clock-Frequency Trend for Intel CPUs (plot omitted; vertical axis: CPU Clock Frequency / MIPS, horizontal axis: Year, 1975 to 2015)

Even so, the first goal is performance rather than scalability, especially given that the easiest way to attain linear scalability is to reduce the performance of each CPU [Tor01]. Given a four-CPU system, which would you prefer? A program that provides 100 transactions per second on a single CPU, but does not scale at all? Or a program that provides 10 transactions per second on a single CPU, but scales perfectly? The first program seems like a better bet, though the answer might change if you happened to have a 32-CPU system.
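The arithmetic behind this comparison can be sketched in a few lines. The function name `throughput` and the idealized model (either perfectly linear scaling or none at all) are assumptions made for illustration; real programs fall somewhere in between.

```python
def throughput(per_cpu_tps, cpus, scales):
    """Transactions per second under an idealized all-or-nothing scaling model."""
    return per_cpu_tps * cpus if scales else per_cpu_tps


for cpus in (4, 32):
    fast_serial = throughput(100, cpus, scales=False)   # 100 tps, does not scale
    slow_scalable = throughput(10, cpus, scales=True)    # 10 tps per CPU, scales perfectly
    print(f"{cpus:2d} CPUs: non-scaling = {fast_serial} tps, scaling = {slow_scalable} tps")

# Under this model the scalable program only catches up at 100 / 10 = 10 CPUs.
break_even_cpus = 100 // 10
print("break-even CPU count:", break_even_cpus)
```

On four CPUs the non-scaling program delivers 100 transactions per second against 40, while on 32 CPUs the scalable program delivers 320 against 100, which is why the answer depends on the size of the system.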

Note on Figure 2.1: This plot shows clock frequencies for newer CPUs theoretically capable of retiring one or more instructions per clock, and MIPS (millions of instructions per second, usually from the old Dhrystone benchmark) for older CPUs requiring multiple clocks to execute even the simplest instruction. The reason for shifting between these two measures is that the newer CPUs’ ability to retire multiple instructions per clock is typically limited by memory-system performance. Furthermore, the benchmarks commonly used on the older CPUs are obsolete, and it is difficult to run the newer benchmarks on systems containing the old CPUs, in part because it is hard to find working instances of the old CPUs.

That said, just because you have multiple CPUs is not necessarily in and of itself a reason to use them all, especially given the recent decreases in price of multi-CPU systems. The key point to understand is that parallel programming is primarily a performance optimization, and, as such, it is one potential optimization of many. If your program is fast enough as currently written, there is no reason to optimize, either by parallelizing it or by applying any of a number of potential sequential optimizations. By the same token, if you are looking to apply parallelism as an optimization to a sequential program, then you will need to compare parallel algorithms to the best sequential algorithms. This may require some care, as far too many publications ignore the sequential case when analyzing the performance of parallel algorithms.

2.2.2 Productivity

Quick Quiz 2.8: Why all this prattling on about non-technical issues??? And not just any non-technical issue, but productivity of all things? Who cares?

Productivity has been becoming increasingly important in recent decades. To see this, consider that the price of early computers was tens of millions of dollars at a time when engineering salaries were but a few thousand dollars a year. If dedicating a team of ten engineers to such a machine would improve its performance, even by only 10%, then their salaries would be repaid many times over.
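To make "repaid many times over" concrete, here is a back-of-the-envelope sketch. The specific figures (a $10 million machine and a $5,000 annual salary) are assumptions chosen from within the ranges the text gives, and the sketch further assumes that a 10% performance gain is worth 10% of the machine's price.

```python
# Illustrative figures only: the text says "tens of millions of dollars"
# and "a few thousand dollars a year", so these exact values are assumed.
machine_cost_usd = 10_000_000
engineer_salary_usd = 5_000
team_size = 10
improvement_percent = 10

# Assumption: a 10% speedup is worth 10% of the machine's purchase price.
value_gained_usd = machine_cost_usd * improvement_percent // 100
team_cost_usd = team_size * engineer_salary_usd
payback_ratio = value_gained_usd / team_cost_usd

print(f"value gained:     ${value_gained_usd:,}")
print(f"annual team cost: ${team_cost_usd:,}")
print(f"payback ratio:    {payback_ratio:.0f}x")
```

With these assumed numbers, a year of the team's salaries is recovered twenty times over, so optimizing for the machine rather than for the engineers was the sensible choice at the time.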

One such machine was the CSIRAC, the oldest still-intact stored-program computer, which was put into operation in 1949 [Mus04, Mel06]. Because this machine was built before the transistor era, it was constructed of 2,000 vacuum tubes, ran with a clock frequency of 1kHz, consumed 30kW of power, and weighed more than three metric tons. Given that this machine had but 768 words of RAM, it is safe to say that it did not suffer from the productivity issues that often plague today’s large-scale software projects.

Today, it would be quite difficult to purchase a machine with so little computing power. Perhaps the closest equivalents are 8-bit embedded microprocessors exemplified by the venerable Z80 [Wik08], but even the old Z80 had a CPU clock frequency more than 1,000 times faster than the CSIRAC. The Z80 CPU had 8,500 transistors, and could be purchased in 2008 for less than $2 US per unit in 1,000-unit quantities. In stark contrast to the CSIRAC, software-development costs are anything but insignificant for the Z80.

Footnote (Section 2.2.1, on whether having multiple CPUs is a reason to use them all): Of course, if you are a hobbyist whose primary interest is writing parallel software, that is more than enough reason to parallelize whatever software you are interested in.

Figure 2.2: MIPS per Die for Intel CPUs (plot omitted; vertical axis: MIPS per Die, horizontal axis: Year, 1975 to 2015)

The CSIRAC and the Z80 are two points in a long-term trend, as can be seen in Figure 2.2. This figure plots an approximation to computational power per die over the past three decades, showing a consistent four-order-of-magnitude increase. Note that the advent of multicore CPUs has permitted this increase to continue unabated despite the clock-frequency wall encountered in 2003.

One of the inescapable consequences of the rapid decrease in the cost of hardware is that software productivity becomes increasingly important. It is no longer sufficient merely to make efficient use of the hardware: It is now necessary to make extremely efficient use of software developers as well. This has long been the case for sequential hardware, but parallel hardware has become a low-cost commodity only recently. Therefore, only recently has high productivity become critically important when creating parallel software.

Quick Quiz 2.9: Given how cheap parallel systems have become, how can anyone afford to pay people to program them?

Perhaps at one time, the sole purpose of parallel software was performance. Now, however, productivity is gaining the spotlight.

2.2.3 Generality

One way to justify the high cost of developing parallel software is to strive for maximal generality. All else being equal, the cost of a more-general software artifact can be spread over more users than that of a less-general one.

Unfortunately, generality often comes at the cost of performance, productivity, or both. To see this, consider the following popular parallel programming environments:

C/C++ “Locking Plus Threads”: This category, which includes POSIX Threads (pthreads) [Ope97], Windows Threads, and numerous operating-system kernel environments, offers excellent performance (at least within the confines of a single SMP system) and also offers good generality. Pity about the relatively low productivity.

Java: This general purpose and inherently multithreaded programming environment is widely believed to offer much higher productivity than C or C++, courtesy of the automatic garbage collector and the rich set of class libraries. However, its performance, though greatly improved in the early 2000s, lags that of C and C++.

MPI: This Message Passing Interface [MPI08] powers the largest scientific and technical computing clusters in the world and offers unparalleled performance and scalability. In theory, it is general purpose, but it is mainly used for scientific and technical computing. Its productivity is believed by many to be even lower than that of C/C++ “locking plus threads” environments.

OpenMP: This set of compiler directives can be used to parallelize loops. It is thus quite specific to this task, and this specificity often limits its performance. It is, however, much easier to use than MPI or C/C++ “locking plus threads.”

SQL: Structured Query Language [Int92] is specific to relational database queries. However, its performance is quite good as measured by the Transaction Processing Performance Council (TPC) benchmark results [Tra01]. Productivity is excellent; in fact, this parallel programming environment enables people to make good use of a large parallel system despite having little or no knowledge of parallel programming concepts.
