[Figure 1.2: MIPS per Die for Intel CPUs — MIPS per die plotted on a log scale from 0.1 to 100,000 against year, 1975 through 2015]
die over the past three decades, showing a consistent four-order-of-magnitude increase. Note that the advent of multicore CPUs has permitted this increase to continue unabated despite the clock-frequency wall encountered in 2003.
One of the inescapable consequences of the rapid decrease in the cost of hardware is that software productivity grows increasingly important. It is no longer sufficient merely to make efficient use of the hardware, it is now also necessary to make extremely efficient use of software developers. This has long been the case for sequential hardware, but only recently has parallel hardware become a low-cost commodity. Therefore, the need for high productivity in creating parallel software has only recently become hugely important.
Quick Quiz 1.9: Given how cheap parallel hard-
ware has become, how can anyone afford to pay peo-
ple to program it?
Perhaps at one time, the sole purpose of parallel
software was performance. Now, however, produc-
tivity is increasingly important.
1.2.3 Generality
One way to justify the high cost of developing parallel software is to strive for maximal generality. All else being equal, the cost of a more-general software artifact can be spread over more users than can a less-general artifact.

Unfortunately, generality often comes at the cost of performance, productivity, or both. To see this, consider the following popular parallel programming environments:
C/C++ “Locking Plus Threads”: This category, which includes POSIX Threads (pthreads) [Ope97], Windows Threads, and numerous operating-system kernel environments, offers excellent performance (at least within the confines of a single SMP system) and also offers good generality. Pity about the relatively low productivity.
Java: This programming environment, which is inherently multithreaded, is widely believed to be much more productive than C or C++, courtesy of the automatic garbage collector and the rich set of class libraries, and is reasonably general purpose. However, its performance, though greatly improved over the past ten years, is generally considered to be less than that of C and C++.
MPI: This message-passing interface [MPI08] powers the largest scientific and technical computing clusters in the world, so offers unparalleled performance and scalability. It is in theory general purpose, but has generally been used for scientific and technical computing. Its productivity is believed by many to be even less than that of C/C++ “locking plus threads” environments.
OpenMP: This set of compiler directives can be used to parallelize loops. It is thus quite specific to this task, and this specificity often limits its performance. It is, however, much easier to use than MPI or parallel C/C++.
SQL: Structured Query Language [Int92] is extremely specific, applying only to relational database queries. However, its performance is quite good, doing quite well in Transaction Processing Performance Council (TPC) benchmarks [Tra01]. Productivity is excellent; in fact, this parallel programming environment permits people who know almost nothing about parallel programming to make good use of a large parallel machine.
The nirvana of parallel programming environments, one that offers world-class performance, productivity, and generality, simply does not yet exist. Until such a nirvana appears, it will be necessary to make engineering tradeoffs among performance, productivity, and generality.
[Figure 1.3: Software Layers and Performance, Productivity, and Generality — a stack running from Application, through Middleware (e.g., DBMS), System Libraries, and the Operating System Kernel, down to Firmware and Hardware; productivity matters most toward the top, while performance and generality matter most toward the bottom]
One such tradeoff is shown in Figure 1.3, which shows how productivity becomes increasingly important at the upper layers of the system stack, while performance and generality become increasingly important at the lower layers of the system stack. The huge development costs incurred near the bottom of the stack must be spread over equally huge numbers of users on the one hand (hence the importance of generality), and performance lost near the bottom of the stack cannot easily be recovered further up the stack. Near the top of the stack, there might be very few users for a given specific application, in which case productivity concerns are paramount. This explains the tendency towards “bloatware” further up the stack: extra hardware is often cheaper than would be the extra developers. This book is intended primarily for developers working near the bottom of the stack, where performance and generality are paramount concerns.
It is important to note that a tradeoff between productivity and generality has existed for centuries in many fields. For but one example, a nailgun is far more productive than is a hammer, but in contrast to the nailgun, a hammer can be used for many things besides driving nails. It should therefore be absolutely no surprise to see similar tradeoffs appear in the field of parallel computing. This tradeoff is shown schematically in Figure 1.4. Here, Users 1, 2, 3, and 4 have specific jobs that they need the computer to help them with. The most productive possible language or environment for a given user is one that simply does that user's job, without requiring any programming, configuration, or other setup.
Quick Quiz 1.10: This is a ridiculously un-
achievable ideal!!! Why not focus on something that
is achievable in practice?
[Figure 1.4: Tradeoff Between Productivity and Generality — special-purpose environments productive for each of Users 1 through 4, a general-purpose environment, and the hardware/abstraction (HW/Abs) axis]
Unfortunately, a system that does the job required by user 1 is unlikely to do user 2's job. In other words, the most productive languages and environments are domain-specific, and thus by definition lacking generality.

Another option is to tailor a given programming language or environment to the hardware system (for example, low-level languages such as assembly, C, C++, or Java) or to some abstraction (for example, Haskell, Prolog, or Snobol), as is shown by the circular region near the center of Figure 1.4. These languages can be considered to be general in the sense that they are equally ill-suited to the jobs required by users 1, 2, 3, and 4. In other words, their generality is purchased at the expense of decreased productivity when compared to domain-specific languages and environments.
With the three often-conflicting parallel-programming goals of performance, productivity, and generality in mind, it is now time to look into avoiding these conflicts by considering alternatives to parallel programming.
1.3 Alternatives to Parallel
Programming
In order to properly consider alternatives to parallel programming, you must first have thought through what you expect the parallelism to do for you. As seen in Section 1.2, the primary goals of parallel programming are performance, productivity, and generality.
Although historically most parallel developers might be most concerned with the first goal, one advantage of the other goals is that they relieve you of the need to justify using parallelism. The remainder of this section is concerned only with performance improvement.
It is important to keep in mind that parallelism
is but one way to improve performance. Other well-
known approaches include the following, in roughly
increasing order of difficulty:
1. Run multiple instances of a sequential applica-
tion.
2. Construct the application to make use of exist-
ing parallel software.
3. Apply performance optimization to the serial
application.
1.3.1 Multiple Instances of a Sequential Application
Running multiple instances of a sequential application can allow you to do parallel programming without actually doing parallel programming. There are a large number of ways to approach this, depending on the structure of the application.
If your program is analyzing a large number of different scenarios, or is analyzing a large number of independent data sets, one easy and effective approach is to create a single sequential program that carries out a single analysis, then use any of a number of scripting environments (for example, the bash shell) to run a number of instances of this sequential program in parallel. In some cases, this approach can be easily extended to a cluster of machines.
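For concreteness, here is a minimal C sketch of this idea; it is not taken from this book, and the sequential program name ./analyze and its per-scenario argument are hypothetical. It forks one child per scenario, each of which execs the sequential analyzer, then waits for them all. A shell loop that backgrounds each command with & and then issues wait accomplishes the same thing.

/* Run several instances of a hypothetical sequential program in parallel. */
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
	int nscenarios = 4;	/* Arbitrary number of independent scenarios. */

	for (int i = 0; i < nscenarios; i++) {
		pid_t pid = fork();

		if (pid < 0) {
			perror("fork");
			exit(EXIT_FAILURE);
		}
		if (pid == 0) {		/* Child: run one sequential analysis. */
			char arg[16];

			snprintf(arg, sizeof(arg), "%d", i);
			execlp("./analyze", "analyze", arg, (char *)NULL);
			perror("execlp");	/* Reached only if exec fails. */
			_exit(EXIT_FAILURE);
		}
	}
	while (wait(NULL) > 0)	/* Parent: wait for all instances. */
		continue;
	return 0;
}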
This approach may seem like cheating, and in fact some denigrate such programs as “embarrassingly parallel”. And in fact, this approach does have some potential disadvantages, including increased memory consumption, waste of CPU cycles recomputing common intermediate results, and increased copying of data. However, it is often extremely effective, garnering extreme performance gains with little or no added effort.
1.3.2 Make Use of Existing Parallel
Software
There is no longer any shortage of parallel software environments that can present a single-threaded programming environment, including relational databases, web-application servers, and map-reduce environments. For example, a common design provides a separate program for each user, each of which generates SQL that is run concurrently against a common relational database. The per-user programs are responsible only for the user interface, with the relational database taking full responsibility for the difficult issues surrounding parallelism and persistence.
Taking this approach often sacrifices some perfor-
mance, at least when compared to carefully hand-
coding a fully parallel application. However, such
sacrifice is often justified given the great reduction
in development effort required.
1.3.3 Performance Optimization
Up through the early 2000s, CPU performance was doubling every 18 months. In such an environment, it is often much more important to create new functionality than to do careful performance optimization. Now that Moore's Law is “only” increasing transistor density instead of increasing both transistor density and per-transistor performance, it might be a good time to rethink the importance of performance optimization.
After all, performance optimization can reduce
power consumption as well as increasing perfor-
mance.
From this viewpoint, parallel programming is but another performance optimization, albeit one that is becoming much more attractive as parallel systems become cheaper and more readily available. However, it is wise to keep in mind that the speedup available from parallelism is limited to roughly the number of CPUs, while the speedup potentially available from straight software optimization can be multiple orders of magnitude.
Furthermore, different programs might have different performance bottlenecks. Parallel programming will only help with some bottlenecks. For example, suppose that your program spends most of its time waiting on data from your disk drive. In this case, making your program use multiple CPUs is not likely to gain much performance. In fact, if the program was reading from a large file laid out sequentially on a rotating disk, parallelizing your program might well make it a lot slower. You should instead add more disk drives, optimize the data so that the file can be smaller (thus faster to read), or, if possible, avoid the need to read quite so much of the data.
Quick Quiz 1.11: What other bottlenecks might prevent additional CPUs from providing additional performance?
Parallelism can be a powerful optimization technique, but it is not the only such technique, nor is it appropriate for all situations.
[Figure 1.5: Categories of Tasks Required of Parallel Programmers — work partitioning, parallel access control, resource partitioning and replication, and interacting with hardware, set against the goals of performance, productivity, and generality]
Of course, the easier it is to parallelize your program, the more attractive parallelization becomes as an optimization. Parallelization has a reputation of being quite difficult, which leads to the question “exactly what makes parallel programming so difficult?”
1.4 What Makes Parallel Programming Hard?
It is important to note that the difficulty of parallel programming is as much a human-factors issue as it is a set of technical properties of the parallel programming problem. This is the case because we need human beings to be able to tell parallel systems what to do, and this two-way communication between human and computer is as much a function of the human as it is of the computer. Therefore, appeals to abstractions or to mathematical analyses will necessarily be of severely limited utility.
In the Industrial Revolution, the interface between human and machine was evaluated by human-factor studies, then called time-and-motion studies. Although there have been a few human-factor studies examining parallel programming [ENS05, ES05, HCS+05, SS94], these studies have been extremely narrowly focused, and hence unable to demonstrate any general results. Furthermore, given that the normal range of programmer productivity spans more than an order of magnitude, it is unrealistic to expect an affordable study to be capable of detecting (say) a 10% difference in productivity. Although the multiple-order-of-magnitude differences that such studies can reliably detect are extremely valuable, the most impressive improvements tend to be based on a long series of 10% improvements.

We must therefore take a different approach.
One such approach is to carefully consider the tasks that parallel programmers must undertake that are not required of sequential programmers. We can then evaluate how well a given programming language or environment assists the developer with these tasks. These tasks fall into the four categories shown in Figure 1.5, each of which is covered in the following sections.
1.4.1 Work Partitioning
Work partitioning is absolutely required for parallel execution: if there is but one “glob” of work, then it can be executed by at most one CPU at a time, which is by definition sequential execution. However, partitioning the code requires great care. For example, uneven partitioning can result in sequential execution once the small partitions have completed [Amd67]. In less extreme cases, load balancing can be used to fully utilize available hardware, thus attaining more-optimal performance.
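To make work partitioning concrete, the following pthreads sketch (not from this book) statically partitions an array sum into one equal-sized chunk per thread; the array size, thread count, and even split are arbitrary illustrative choices, and uneven workloads would need the load balancing just mentioned.

/* Statically partition a summation across NTHREADS threads. */
#include <pthread.h>
#include <stdio.h>

#define NTHREADS 4
#define N 1000000L

static double data[N];
static double partial[NTHREADS];	/* One result slot per thread. */

struct chunk {
	int id;
	long start;
	long end;	/* Exclusive. */
};

static void *sum_chunk(void *arg)
{
	struct chunk *c = arg;
	double sum = 0.0;

	for (long i = c->start; i < c->end; i++)
		sum += data[i];
	partial[c->id] = sum;	/* Each thread writes only its own slot. */
	return NULL;
}

int main(void)
{
	pthread_t tid[NTHREADS];
	struct chunk chunks[NTHREADS];
	double total = 0.0;

	for (long i = 0; i < N; i++)
		data[i] = 1.0;

	for (int t = 0; t < NTHREADS; t++) {	/* Partition the work. */
		chunks[t].id = t;
		chunks[t].start = t * (N / NTHREADS);
		chunks[t].end = (t == NTHREADS - 1) ? N : (t + 1) * (N / NTHREADS);
		pthread_create(&tid[t], NULL, sum_chunk, &chunks[t]);
	}
	for (int t = 0; t < NTHREADS; t++) {	/* Reduce the partial sums. */
		pthread_join(tid[t], NULL);
		total += partial[t];
	}
	printf("total = %f\n", total);
	return 0;
}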
In addition, partitioning of work can complicate handling of global errors and events: a parallel program may need to carry out non-trivial synchronization in order to safely process such global events.

Each partition requires some sort of communication: after all, if a given thread did not communicate at all, it would have no effect and would thus not need to be executed. However, because communication incurs overhead, careless partitioning choices can result in severe performance degradation.
Furthermore, the number of concurrent threads must often be controlled, as each such thread occupies common resources, for example, space in CPU caches. If too many threads are permitted to execute concurrently, the CPU caches will overflow, resulting in a high cache-miss rate, which in turn degrades performance. On the other hand, large numbers of threads are often required to overlap computation and I/O.
Quick Quiz 1.12: What besides CPU cache ca-
pacity might require limiting the number of concur-
rent threads?
Finally, permitting threads to execute concurrently greatly increases the program's state space, which can make the program difficult to understand, degrading productivity. All else being equal, smaller state spaces having more regular structure are more easily understood, but this is a human-factors statement as opposed to a technical or mathematical statement. Good parallel designs might have extremely large state spaces, but nevertheless be easy to understand due to their regular structure, while poor designs can be impenetrable despite having a
comparatively small state space. The best designs exploit embarrassing parallelism, or transform the problem to one having an embarrassingly parallel solution. In either case, “embarrassingly parallel” is in fact an embarrassment of riches. The current state of the art enumerates good designs; more work is required to make more general judgements on state-space size and structure.
1.4.2 Parallel Access Control
Given a sequential program with only a single thread, that single thread has full access to all of the program's resources. These resources are most often in-memory data structures, but can be CPUs, memory (including caches), I/O devices, computational accelerators, files, and much else besides.
The first parallel-access-control issue is whether the form of the access to a given resource depends on that resource's location. For example, in many message-passing environments, local-variable access is via expressions and assignments, while remote-variable access uses an entirely different syntax, usually involving messaging. The POSIX threads environment [Ope97], Structured Query Language (SQL) [Int92], and partitioned global address-space (PGAS) environments such as Universal Parallel C (UPC) [EGCD03] offer implicit access, while Message Passing Interface (MPI) [MPI08] offers explicit access because access to remote data requires explicit messaging.
The other parallel-access-control issue is how threads coordinate access to the resources. This coordination is carried out by the very large number of synchronization mechanisms provided by various parallel languages and environments, including message passing, locking, transactions, reference counting, explicit timing, shared atomic variables, and data ownership. Many traditional parallel-programming concerns such as deadlock, livelock, and transaction rollback stem from this coordination. This framework can be elaborated to include comparisons of these synchronization mechanisms, for example locking vs. transactional memory [MMW07], but such elaboration is beyond the scope of this section.
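To make one of these coordination mechanisms concrete, here is a minimal pthreads locking sketch, not from this book, in which several threads coordinate access to a shared counter through a single mutex; the thread and iteration counts are arbitrary.

/* Threads coordinate access to a shared counter via a pthread mutex. */
#include <pthread.h>
#include <stdio.h>

#define NTHREADS 4
#define NITERS 100000

static long counter;
static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;

static void *incrementer(void *arg)
{
	(void)arg;
	for (int i = 0; i < NITERS; i++) {
		pthread_mutex_lock(&counter_lock);	/* Exclusive access. */
		counter++;
		pthread_mutex_unlock(&counter_lock);
	}
	return NULL;
}

int main(void)
{
	pthread_t tid[NTHREADS];

	for (int t = 0; t < NTHREADS; t++)
		pthread_create(&tid[t], NULL, incrementer, NULL);
	for (int t = 0; t < NTHREADS; t++)
		pthread_join(tid[t], NULL);
	printf("counter = %ld (expected %d)\n", counter, NTHREADS * NITERS);
	return 0;
}

Without the lock, increments from different threads could interleave and be lost; with it, contention and deadlock become the concerns noted above.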
1.4.3 Resource Partitioning and
Replication
The most effective parallel algorithms and systems exploit resource parallelism, so much so that it is usually wise to begin parallelization by partitioning your write-intensive resources and replicating frequently accessed read-mostly resources. The resource in question is most frequently data, which might be partitioned over computer systems, mass-storage devices, NUMA nodes, CPU cores (or dies or hardware threads), pages, cache lines, instances of synchronization primitives, or critical sections of code. For example, partitioning over locking primitives is termed “data locking” [BK85].
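As a hedged illustration of data locking, again not taken from this book, the following fragment gives a hash table one lock per bucket so that threads touching different buckets do not contend; the structure and function names are invented for the example.

/* A hash table whose locks are partitioned per bucket ("data locking"). */
#include <pthread.h>
#include <stdlib.h>

#define NBUCKETS 64

struct node {
	unsigned long key;
	struct node *next;
};

struct bucket {
	pthread_mutex_t lock;	/* One lock per bucket, not one per table. */
	struct node *head;
};

static struct bucket table[NBUCKETS];

static void hash_init(void)
{
	for (int i = 0; i < NBUCKETS; i++) {
		pthread_mutex_init(&table[i].lock, NULL);
		table[i].head = NULL;
	}
}

static void hash_insert(unsigned long key)
{
	struct bucket *b = &table[key % NBUCKETS];
	struct node *n = malloc(sizeof(*n));

	if (n == NULL)
		abort();
	n->key = key;
	pthread_mutex_lock(&b->lock);	/* Only this bucket is serialized. */
	n->next = b->head;
	b->head = n;
	pthread_mutex_unlock(&b->lock);
}

int main(void)
{
	hash_init();
	for (unsigned long key = 0; key < 1000; key++)
		hash_insert(key);
	return 0;
}

Two threads inserting keys that hash to different buckets proceed in parallel; a single table-wide lock would instead serialize them.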
Resource partitioning is frequently application de-
pendent, for example, numerical applications fre-
quently partition matrices by row, column, or sub-
matrix, while commercial applications frequently
partition write-intensive data structures and repli-
cate read-mostly data structures. For example, a
commercial application might assign the data for a
given customer to a given few computer systems out
of a large cluster. An application might statically
partition data, or dynamically change the partition-
ing over time.
Resource partitioning is extremely effective, but
it can be quite challenging for complex multilinked
data structures.
1.4.4 Interacting With Hardware
Hardware interaction is normally the domain of the operating system, the compiler, libraries, or other software-environment infrastructure. However, developers working with novel hardware features and components will often need to work directly with such hardware. In addition, direct access to the hardware can be required when squeezing the last drop of performance out of a given system. In this case, the developer may need to tailor or configure the application to the cache geometry, system topology, or interconnect protocol of the target hardware.

In some cases, hardware may be considered to be a resource which may be subject to partitioning or access control, as described in the previous sections.
1.4.5 Composite Capabilities
Although these four capabilities are fundamental, good engineering practice uses composites of these capabilities. For example, the data-parallel approach first partitions the data so as to minimize the need for inter-partition communication, partitions the code accordingly, and finally maps data partitions and threads so as to maximize throughput while minimizing inter-thread communication. The developer can then consider each partition separately, greatly reducing the size of the relevant state space, in turn increasing productivity. Of course, some problems are non-partitionable but on the