[Figure 1.2: MIPS per Die for Intel CPUs — MIPS per die plotted on a log scale from 0.1 to 100,000 against year, 1975 through 2015]
die over the past three decades, showing a consistent four-order-of-magnitude increase. Note that the advent of multicore CPUs has permitted this increase to continue unabated despite the clock-frequency wall encountered in 2003.
One of the inescapable consequences of the rapid decrease in the cost of hardware is that software productivity grows increasingly important. It is no longer sufficient merely to make efficient use of the hardware, it is now also necessary to make extremely efficient use of software developers. This has long been the case for sequential hardware, but only recently has parallel hardware become a low-cost commodity. Therefore, the need for high productivity in creating parallel software has only recently become hugely important.
Quick Quiz 1.9: Given how cheap parallel hard-
ware has become, how can anyone afford to pay peo-
ple to program it?
Perhaps at one time, the sole purpose of parallel
software was performance. Now, however, produc-
tivity is increasingly important.
1.2.3 Generality
One way to justify the high cost of developing parallel software is to strive for maximal generality. All else being equal, the cost of a more-general software artifact can be spread over more users than can a less-general artifact.

Unfortunately, generality often comes at the cost of performance, productivity, or both. To see this, consider the following popular parallel programming environments:
C/C++ “Locking Plus Threads”: This category, which includes POSIX Threads (pthreads) [Ope97], Windows Threads, and numerous operating-system kernel environments, offers excellent performance (at least within the confines of a single SMP system) and also offers good generality. Pity about the relatively low productivity.
Java: This programming environment, which is inherently multithreaded, is widely believed to be much more productive than C or C++, courtesy of the automatic garbage collector and the rich set of class libraries, and is reasonably general purpose. However, its performance, though greatly improved over the past ten years, is generally considered to be less than that of C and C++.
MPI: This message-passing interface [MPI08] powers the largest scientific and technical computing clusters in the world, so offers unparalleled performance and scalability. It is in theory general purpose, but has generally been used for scientific and technical computing. Its productivity is believed by many to be even less than that of C/C++ “locking plus threads” environments.
OpenMP: This set of compiler directives can be used to parallelize loops. It is thus quite specific to this task, and this specificity often limits its performance. It is, however, much easier to use than MPI or parallel C/C++.
SQL: Structured Query Language [Int92] is extremely specific, applying only to relational database queries. However, its performance is quite good, doing quite well in Transaction Processing Performance Council (TPC) benchmarks [Tra01]. Productivity is excellent; in fact, this parallel programming environment permits people who know almost nothing about parallel programming to make good use of a large parallel machine.
The nirvana of parallel programming environments, one that offers world-class performance, productivity, and generality, simply does not yet exist. Until such a nirvana appears, it will be necessary to make engineering tradeoffs among performance, productivity, and generality.
[Figure 1.3: Software Layers and Performance, Productivity, and Generality — a stack running from Application, through Middleware (e.g., DBMS), System Libraries, and the Operating System Kernel, down to Firmware and Hardware; productivity matters most toward the top, while performance and generality matter most toward the bottom]
One such tradeoff is shown in Figure 1.3, which shows how productivity becomes increasingly important at the upper layers of the system stack, while performance and generality become increasingly important at the lower layers of the system stack. The huge development costs incurred near the bottom of the stack must be spread over equally huge numbers of users on the one hand (hence the importance of generality), and performance lost near the bottom of the stack cannot easily be recovered further up the stack. Near the top of the stack, there might be very few users for a given specific application, in which case productivity concerns are paramount. This explains the tendency towards “bloatware” further up the stack: extra hardware is often cheaper than would be the extra developers. This book is intended primarily for developers working near the bottom of the stack, where performance and generality are paramount concerns.
It is important to note that a tradeoff between productivity and generality has existed for centuries in many fields. For but one example, a nailgun is far more productive than is a hammer, but in contrast to the nailgun, a hammer can be used for many things besides driving nails. It should therefore be absolutely no surprise to see similar tradeoffs appear in the field of parallel computing. This tradeoff is shown schematically in Figure 1.4. Here, Users 1, 2, 3, and 4 have specific jobs that they need the computer to help them with. The most productive possible language or environment for a given user is one that simply does that user's job, without requiring any programming, configuration, or other setup.
Quick Quiz 1.10: This is a ridiculously un-
achievable ideal!!! Why not focus on something that
is achievable in practice?
[Figure 1.4: Tradeoff Between Productivity and Generality — special-purpose environments productive for each of Users 1 through 4, a general-purpose environment, and the hardware/abstraction (HW/Abs) axis]
Unfortunately, a system that does the job required by user 1 is unlikely to do user 2's job. In other words, the most productive languages and environments are domain-specific, and thus by definition lacking generality.

Another option is to tailor a given programming language or environment to the hardware system (for example, low-level languages such as assembly, C, C++, or Java) or to some abstraction (for example, Haskell, Prolog, or Snobol), as is shown by the circular region near the center of Figure 1.4. These languages can be considered to be general in the sense that they are equally ill-suited to the jobs required by users 1, 2, 3, and 4. In other words, their generality is purchased at the expense of decreased productivity when compared to domain-specific languages and environments.
With the three often-conflicting parallel-programming goals of performance, productivity, and generality in mind, it is now time to look into avoiding these conflicts by considering alternatives to parallel programming.
1.3 Alternatives to Parallel
Programming
In order to properly consider alternatives to parallel programming, you must first have thought through what you expect the parallelism to do for you. As seen in Section 1.2, the primary goals of parallel programming are performance, productivity, and generality.
Although historically most parallel developers might be most concerned with the first goal, one advantage of the other goals is that they relieve you of the need to justify using parallelism. The remainder of this section is concerned only with performance improvement.
It is important to keep in mind that parallelism
is but one way to improve performance. Other well-
known approaches include the following, in roughly
increasing order of difficulty:
1. Run multiple instances of a sequential applica-
tion.
2. Construct the application to make use of exist-
ing parallel software.
3. Apply performance optimization to the serial
application.
1.3.1 Multiple Instances of a Sequential Application
Running multiple instances of a sequential application can allow you to do parallel programming without actually doing parallel programming. There are a large number of ways to approach this, depending on the structure of the application.
If your program is analyzing a large number of different scenarios, or is analyzing a large number of independent data sets, one easy and effective approach is to create a single sequential program that carries out a single analysis, then use any of a number of scripting environments (for example, the bash shell) to run a number of instances of this sequential program in parallel. In some cases, this approach can be easily extended to a cluster of machines.
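For concreteness, here is a minimal C sketch of this idea; it is not taken from this book, and the sequential program name ./analyze and its per-scenario argument are hypothetical. It forks one child per scenario, each of which execs the sequential analyzer, then waits for them all. A shell loop that backgrounds each command with & and then issues wait accomplishes the same thing.

/* Run several instances of a hypothetical sequential program in parallel. */
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
	int nscenarios = 4;	/* Arbitrary number of independent scenarios. */

	for (int i = 0; i < nscenarios; i++) {
		pid_t pid = fork();

		if (pid < 0) {
			perror("fork");
			exit(EXIT_FAILURE);
		}
		if (pid == 0) {		/* Child: run one sequential analysis. */
			char arg[16];

			snprintf(arg, sizeof(arg), "%d", i);
			execlp("./analyze", "analyze", arg, (char *)NULL);
			perror("execlp");	/* Reached only if exec fails. */
			_exit(EXIT_FAILURE);
		}
	}
	while (wait(NULL) > 0)	/* Parent: wait for all instances. */
		continue;
	return 0;
}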
This approach may seem like cheating, and in fact some denigrate such programs as “embarrassingly parallel”. And in fact, this approach does have some potential disadvantages, including increased memory consumption, waste of CPU cycles recomputing common intermediate results, and increased copying of data. However, it is often extremely effective, garnering extreme performance gains with little or no added effort.
1.3.2 Make Use of Existing Parallel
Software
There is no longer any shortage of parallel software environments that can present a single-threaded programming environment, including relational databases, web-application servers, and map-reduce environments. For example, a common design provides a separate program for each user, each of which generates SQL that is run concurrently against a common relational database. The per-user programs are responsible only for the user interface, with the relational database taking full responsibility for the difficult issues surrounding parallelism and persistence.
Taking this approach often sacrifices some perfor-
mance, at least when compared to carefully hand-
coding a fully parallel application. However, such
sacrifice is often justified given the great reduction
in development effort required.
1.3.3 Performance Optimization
Up through the early 2000s, CPU performance was doubling every 18 months. In such an environment, it is often much more important to create new functionality than to do careful performance optimization. Now that Moore's Law is “only” increasing transistor density instead of increasing both transistor density and per-transistor performance, it might be a good time to rethink the importance of performance optimization.
After all, performance optimization can reduce
power consumption as well as increasing perfor-
mance.
From this viewpoint, parallel programming is but another performance optimization, albeit one that is becoming much more attractive as parallel systems become cheaper and more readily available. However, it is wise to keep in mind that the speedup available from parallelism is limited to roughly the number of CPUs, while the speedup potentially available from straight software optimization can be multiple orders of magnitude.
Furthermore, different programs might have different performance bottlenecks. Parallel programming will only help with some bottlenecks. For example, suppose that your program spends most of its time waiting on data from your disk drive. In this case, making your program use multiple CPUs is not likely to gain much performance. In fact, if the program was reading from a large file laid out sequentially on a rotating disk, parallelizing your program might well make it a lot slower. You should instead add more disk drives, optimize the data so that the file can be smaller (thus faster to read), or, if possible, avoid the need to read quite so much of the data.
Quick Quiz 1.11: What other bottlenecks might prevent additional CPUs from providing additional performance?
Parallelism can be a powerful optimization technique, but it is not the only such technique, nor is it appropriate for all situations.
[Figure 1.5: Categories of Tasks Required of Parallel Programmers — work partitioning, parallel access control, resource partitioning and replication, and interacting with hardware, set against the goals of performance, productivity, and generality]
Of course, the easier it is to parallelize your program, the more attractive parallelization becomes as an optimization. Parallelization has a reputation of being quite difficult, which leads to the question “exactly what makes parallel programming so difficult?”
1.4 What Makes Parallel Programming Hard?
It is important to note that the difficulty of parallel programming is as much a human-factors issue as it is a set of technical properties of the parallel programming problem. This is the case because we need human beings to be able to tell parallel systems what to do, and this two-way communication between human and computer is as much a function of the human as it is of the computer. Therefore, appeals to abstractions or to mathematical analyses will necessarily be of severely limited utility.
In the Industrial Revolution, the interface between human and machine was evaluated by human-factor studies, then called time-and-motion studies. Although there have been a few human-factor studies examining parallel programming [ENS05, ES05, HCS+05, SS94], these studies have been extremely narrowly focused, and hence unable to demonstrate any general results. Furthermore, given that the normal range of programmer productivity spans more than an order of magnitude, it is unrealistic to expect an affordable study to be capable of detecting (say) a 10% difference in productivity. Although the multiple-order-of-magnitude differences that such studies can reliably detect are extremely valuable, the most impressive improvements tend to be based on a long series of 10% improvements.

We must therefore take a different approach.
One such approach is to carefully consider the tasks that parallel programmers must undertake that are not required of sequential programmers. We can then evaluate how well a given programming language or environment assists the developer with these tasks. These tasks fall into the four categories shown in Figure 1.5, each of which is covered in the following sections.
1.4.1 Work Partitioning
Work partitioning is absolutely required for parallel execution: if there is but one “glob” of work, then it can be executed by at most one CPU at a time, which is by definition sequential execution. However, partitioning the code requires great care. For example, uneven partitioning can result in sequential execution once the small partitions have completed [Amd67]. In less extreme cases, load balancing can be used to fully utilize available hardware, thus attaining more-optimal performance.
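To make work partitioning concrete, the following pthreads sketch (not from this book) statically partitions an array sum into one equal-sized chunk per thread; the array size, thread count, and even split are arbitrary illustrative choices, and uneven workloads would need the load balancing just mentioned.

/* Statically partition a summation across NTHREADS threads. */
#include <pthread.h>
#include <stdio.h>

#define NTHREADS 4
#define N 1000000L

static double data[N];
static double partial[NTHREADS];	/* One result slot per thread. */

struct chunk {
	int id;
	long start;
	long end;	/* Exclusive. */
};

static void *sum_chunk(void *arg)
{
	struct chunk *c = arg;
	double sum = 0.0;

	for (long i = c->start; i < c->end; i++)
		sum += data[i];
	partial[c->id] = sum;	/* Each thread writes only its own slot. */
	return NULL;
}

int main(void)
{
	pthread_t tid[NTHREADS];
	struct chunk chunks[NTHREADS];
	double total = 0.0;

	for (long i = 0; i < N; i++)
		data[i] = 1.0;

	for (int t = 0; t < NTHREADS; t++) {	/* Partition the work. */
		chunks[t].id = t;
		chunks[t].start = t * (N / NTHREADS);
		chunks[t].end = (t == NTHREADS - 1) ? N : (t + 1) * (N / NTHREADS);
		pthread_create(&tid[t], NULL, sum_chunk, &chunks[t]);
	}
	for (int t = 0; t < NTHREADS; t++) {	/* Reduce the partial sums. */
		pthread_join(tid[t], NULL);
		total += partial[t];
	}
	printf("total = %f\n", total);
	return 0;
}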
In addition, partitioning of work can complicate handling of global errors and events: a parallel program may need to carry out non-trivial synchronization in order to safely process such global events.

Each partition requires some sort of communication: after all, if a given thread did not communicate at all, it would have no effect and would thus not need to be executed. However, because communication incurs overhead, careless partitioning choices can result in severe performance degradation.
Furthermore, the number of concurrent threads must often be controlled, as each such thread occupies common resources, for example, space in CPU caches. If too many threads are permitted to execute concurrently, the CPU caches will overflow, resulting in a high cache-miss rate, which in turn degrades performance. On the other hand, large numbers of threads are often required to overlap computation and I/O.
Quick Quiz 1.12: What besides CPU cache ca-
pacity might require limiting the number of concur-
rent threads?
Finally, permitting threads to execute concurrently greatly increases the program's state space, which can make the program difficult to understand, degrading productivity. All else being equal, smaller state spaces having more regular structure are more easily understood, but this is a human-factors statement as opposed to a technical or mathematical statement. Good parallel designs might have extremely large state spaces, but nevertheless be easy to understand due to their regular structure, while poor designs can be impenetrable despite having a
comparatively small state space. The best designs exploit embarrassing parallelism, or transform the problem to one having an embarrassingly parallel solution. In either case, “embarrassingly parallel” is in fact an embarrassment of riches. The current state of the art enumerates good designs; more work is required to make more general judgements on state-space size and structure.
1.4.2 Parallel Access Control
Given a sequential program with only a single thread, that single thread has full access to all of the program's resources. These resources are most often in-memory data structures, but can be CPUs, memory (including caches), I/O devices, computational accelerators, files, and much else besides.
The first parallel-access-control issue is whether the form of the access to a given resource depends on that resource's location. For example, in many message-passing environments, local-variable access is via expressions and assignments, while remote-variable access uses an entirely different syntax, usually involving messaging. The POSIX threads environment [Ope97], Structured Query Language (SQL) [Int92], and partitioned global address-space (PGAS) environments such as Universal Parallel C (UPC) [EGCD03] offer implicit access, while Message Passing Interface (MPI) [MPI08] offers explicit access because access to remote data requires explicit messaging.
The other parallel-access-control issue is how threads coordinate access to the resources. This coordination is carried out by the very large number of synchronization mechanisms provided by various parallel languages and environments, including message passing, locking, transactions, reference counting, explicit timing, shared atomic variables, and data ownership. Many traditional parallel-programming concerns such as deadlock, livelock, and transaction rollback stem from this coordination. This framework can be elaborated to include comparisons of these synchronization mechanisms, for example locking vs. transactional memory [MMW07], but such elaboration is beyond the scope of this section.
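To make one of these coordination mechanisms concrete, here is a minimal pthreads locking sketch, not from this book, in which several threads coordinate access to a shared counter through a single mutex; the thread and iteration counts are arbitrary.

/* Threads coordinate access to a shared counter via a pthread mutex. */
#include <pthread.h>
#include <stdio.h>

#define NTHREADS 4
#define NITERS 100000

static long counter;
static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;

static void *incrementer(void *arg)
{
	(void)arg;
	for (int i = 0; i < NITERS; i++) {
		pthread_mutex_lock(&counter_lock);	/* Exclusive access. */
		counter++;
		pthread_mutex_unlock(&counter_lock);
	}
	return NULL;
}

int main(void)
{
	pthread_t tid[NTHREADS];

	for (int t = 0; t < NTHREADS; t++)
		pthread_create(&tid[t], NULL, incrementer, NULL);
	for (int t = 0; t < NTHREADS; t++)
		pthread_join(tid[t], NULL);
	printf("counter = %ld (expected %d)\n", counter, NTHREADS * NITERS);
	return 0;
}

Without the lock, increments from different threads could interleave and be lost; with it, contention and deadlock become the concerns noted above.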
1.4.3 Resource Partitioning and
Replication
The most effective parallel algorithms and systems exploit resource parallelism, so much so that it is usually wise to begin parallelization by partitioning your write-intensive resources and replicating frequently accessed read-mostly resources. The resource in question is most frequently data, which might be partitioned over computer systems, mass-storage devices, NUMA nodes, CPU cores (or dies or hardware threads), pages, cache lines, instances of synchronization primitives, or critical sections of code. For example, partitioning over locking primitives is termed “data locking” [BK85].
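As a hedged illustration of data locking, again not taken from this book, the following fragment gives a hash table one lock per bucket so that threads touching different buckets do not contend; the structure and function names are invented for the example.

/* A hash table whose locks are partitioned per bucket ("data locking"). */
#include <pthread.h>
#include <stdlib.h>

#define NBUCKETS 64

struct node {
	unsigned long key;
	struct node *next;
};

struct bucket {
	pthread_mutex_t lock;	/* One lock per bucket, not one per table. */
	struct node *head;
};

static struct bucket table[NBUCKETS];

static void hash_init(void)
{
	for (int i = 0; i < NBUCKETS; i++) {
		pthread_mutex_init(&table[i].lock, NULL);
		table[i].head = NULL;
	}
}

static void hash_insert(unsigned long key)
{
	struct bucket *b = &table[key % NBUCKETS];
	struct node *n = malloc(sizeof(*n));

	if (n == NULL)
		abort();
	n->key = key;
	pthread_mutex_lock(&b->lock);	/* Only this bucket is serialized. */
	n->next = b->head;
	b->head = n;
	pthread_mutex_unlock(&b->lock);
}

int main(void)
{
	hash_init();
	for (unsigned long key = 0; key < 1000; key++)
		hash_insert(key);
	return 0;
}

Two threads inserting keys that hash to different buckets proceed in parallel; a single table-wide lock would instead serialize them.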
Resource partitioning is frequently application de-
pendent, for example, numerical applications fre-
quently partition matrices by row, column, or sub-
matrix, while commercial applications frequently
partition write-intensive data structures and repli-
cate read-mostly data structures. For example, a
commercial application might assign the data for a
given customer to a given few computer systems out
of a large cluster. An application might statically
partition data, or dynamically change the partition-
ing over time.
Resource partitioning is extremely effective, but
it can be quite challenging for complex multilinked
data structures.
1.4.4 Interacting With Hardware
Hardware interaction is normally the domain of the operating system, the compiler, libraries, or other software-environment infrastructure. However, developers working with novel hardware features and components will often need to work directly with such hardware. In addition, direct access to the hardware can be required when squeezing the last drop of performance out of a given system. In this case, the developer may need to tailor or configure the application to the cache geometry, system topology, or interconnect protocol of the target hardware.

In some cases, hardware may be considered to be a resource which may be subject to partitioning or access control, as described in the previous sections.
1.4.5 Composite Capabilities
Although these four capabilities are fundamental, good engineering practice uses composites of these capabilities. For example, the data-parallel approach first partitions the data so as to minimize the need for inter-partition communication, partitions the code accordingly, and finally maps data partitions and threads so as to maximize throughput while minimizing inter-thread communication. The developer can then consider each partition separately, greatly reducing the size of the relevant state space, in turn increasing productivity. Of course, some problems are non-partitionable but on the