6 CHAPTER 1. INTRODUCTION
expect the parallelism to do for you. As seen in Sec-
tion 1.2, the primary goals of parallel programming are
performance, productivity, and generality. Because this
book is intended for developers working on performance-
critical code near the bottom of the software stack, the re-
mainder of this section focuses primarily on performance
improvement.
It is important to keep in mind that parallelism is but
one way to improve performance. Other well-known
approaches include the following, in roughly increasing
order of difficulty:
1. Run multiple instances of a sequential application.
2. Make the application use existing parallel software.
3. Apply performance optimization to the serial application.
These approaches are covered in the following sections.
1.3.1 Multiple Instances of a Sequential
Application
Running multiple instances of a sequential application can
allow you to do parallel programming without actually
doing parallel programming. There are a large number of
ways to approach this, depending on the structure of the
application.
If your program is analyzing a large number of different
scenarios, or is analyzing a large number of independent
data sets, one easy and effective approach is to create a
single sequential program that carries out a single analysis,
then use any of a number of scripting environments (for
example, the bash shell) to run a number of instances of
this sequential program in parallel. In some cases, this
approach can be easily extended to a cluster of machines.
This approach may seem like cheating, and in fact some
denigrate such programs as “embarrassingly parallel”.
This approach does have some potential disadvantages,
including increased memory consumption, wasted CPU
cycles recomputing common intermediate results, and in-
creased copying of data. However, it is often extremely
productive, garnering substantial performance gains with
little or no added effort.
1.3.2 Use Existing Parallel Software
There is no longer any shortage of parallel software envi-
ronments that can present a single-threaded programming
environment, including relational databases [Dat82], web-
application servers, and map-reduce environments. For
example, a common design provides a separate program
for each user, each of which generates SQL that is run
concurrently against a common relational database. The
per-user programs are responsible only for the user inter-
face, with the relational database taking full responsibility
for the difficult issues surrounding parallelism and persis-
tence.
Taking this approach often sacrifices some perfor-
mance, at least when compared to carefully hand-coding
a fully parallel application. However, such sacrifice is
often justified given the huge reduction in development
effort required.
1.3.3 Performance Optimization
Up through the early 2000s, CPU performance was
doubling every 18 months. In such an environment, it was
often much more important to create new functionality
than to do careful performance optimization. Now that
Moore’s
Law is “only” increasing transistor density instead of
increasing both transistor density and per-transistor per-
formance, it might be a good time to rethink the impor-
tance of performance optimization. After all, performance
optimization can reduce power consumption as well as
increase performance.
From this viewpoint, parallel programming is but an-
other performance optimization, albeit one that is be-
coming much more attractive as parallel systems become
cheaper and more readily available. However, it is wise
to keep in mind that the speedup available from paral-
lelism is limited to roughly the number of CPUs, while
the speedup potentially available from straight software
optimization can be multiple orders of magnitude.
Furthermore, different programs might have different
performance bottlenecks. Parallel programming will only
help with some bottlenecks. For example, if your program
spends most of its time waiting on data from your disk
drive, using multiple CPUs is not likely to gain much
performance. In fact, if the program is reading from a large
file laid out sequentially on a rotating disk, parallelizing
your program might well make it a lot slower. You should
instead add more disk drives, optimize the data so that
the file can be smaller (thus faster to read), or, if possible,
avoid the need to read quite so much of the data.
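As a hypothetical sketch of the “make the file smaller” remedy, assume the data compresses well and that `gzip` is available; `wc -l` stands in for the real program, reading the compressed file through a decompressing pipe:

```shell
#!/bin/sh
# Hypothetical sketch of the "make the file smaller" remedy for an
# I/O-bound program: store the data compressed, trading a little CPU
# time (decompression) for far fewer bytes read from disk.
seq 1000000 > bigfile              # stand-in data set

gzip -c bigfile > bigfile.gz       # compressed copy; original kept
ls -l bigfile bigfile.gz           # compressed file is much smaller

# The program (here "wc -l" as a stand-in) still sees the same data,
# but the disk delivers only the compressed bytes:
gunzip -c bigfile.gz | wc -l       # prints 1000000
```

The trade-off pays off only when the program really is waiting on the disk; a CPU-bound program would merely add decompression work to an already-saturated processor.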
Quick Quiz 1.11:
What other bottlenecks might pre-
vent additional CPUs from providing additional perfor-
mance?