2.2. PARALLEL PROGRAMMING GOALS 9
0.1
1
10
100
1000
10000
1975
1980
1985
1990
1995
2000
2005
2010
2015
CPU Clock Frequency / MIPS
Year
Figure 2.1: MIPS/Clock-Frequency Trend for Intel CPUs
code, and be happy? It will very likely be easier and you
will probably get done much more quickly.
Quick Quiz 2.7:
Are there no cases where parallel
programming is about something other than performance?
Note that “performance” is interpreted quite broadly
here, including scalability (performance per CPU) and
efficiency (for example, performance per watt).
That said, the focus of performance has shifted from
hardware to parallel software. This change in focus is due
to the fact that, although Moore’s Law continues to deliver
increases in transistor density, it has ceased to provide the
traditional single-threaded performance increases. This
can be seen in Figure 2.1
2
, which shows that writing
single-threaded code and simply waiting a year or two for
the CPUs to catch up may no longer be an option. Given
the recent trends on the part of all major manufacturers
towards multicore/multithreaded systems, parallelism is
the way to go for those wanting the avail themselves of
the full performance of their systems.
Even so, the first goal is performance rather than scal-
2
This plot shows clock frequencies for newer CPUs theoretically ca-
pable of retiring one or more instructions per clock, and MIPS (millions
of instructions per second, usually from the old Dhrystone benchmark)
for older CPUs requiring multiple clocks to execute even the simplest in-
struction. The reason for shifting between these two measures is that the
newer CPUs’ ability to retire multiple instructions per clock is typically
limited by memory-system performance. Furthermore, the benchmarks
commonly used on the older CPUs are obsolete, and it is difficult to
run the newer benchmarks on systems containing the old CPUs, in part
because it is hard to find working instances of the old CPUs.
ability, especially given that the easiest way to attain
linear scalability is to reduce the performance of each
CPU [
Tor01
]. Given a four-CPU system, which would
you prefer? A program that provides 100 transactions per
second on a single CPU, but does not scale at all? Or a
program that provides 10 transactions per second on a
single CPU, but scales perfectly? The first program seems
like a better bet, though the answer might change if you
happened to have a 32-CPU system.
That said, just because you have multiple CPUs is not
necessarily in and of itself a reason to use them all, espe-
cially given the recent decreases in price of multi-CPU
systems. The key point to understand is that parallel pro-
gramming is primarily a performance optimization, and,
as such, it is one potential optimization of many. If your
program is fast enough as currently written, there is no rea-
son to optimize, either by parallelizing it or by applying
any of a number of potential sequential optimizations.
3
By the same token, if you are looking to apply parallelism
as an optimization to a sequential program, then you will
need to compare parallel algorithms to the best sequential
algorithms. This may require some care, as far too many
publications ignore the sequential case when analyzing
the performance of parallel algorithms.
2.2.2 Productivity
Quick Quiz 2.8:
Why all this prattling on about non-
technical issues??? And not just any non-technical issue,
but productivity of all things? Who cares?
Productivity has been becoming increasingly important
in recent decades. To see this, consider that the price of
early computers was tens of millions of dollars at a time
when engineering salaries were but a few thousand dollars
a year. If dedicating a team of ten engineers to such a
machine would improve its performance, even by only
10%, then their salaries would be repaid many times over.
One such machine was the CSIRAC, the oldest still-
intact stored-program computer, which was put into op-
eration in 1949 [
Mus04
,
Dep06
]. Because this machine
was built before the transistor era, it was constructed of
2,000 vacuum tubes, ran with a clock frequency of 1kHz,
consumed 30kW of power, and weighed more than three
metric tons. Given that this machine had but 768 words
of RAM, it is safe to say that it did not suffer from the
productivity issues that often plague today’s large-scale
3
Of course, if you are a hobbyist whose primary interest is writing
parallel software, that is more than enough reason to parallelize whatever
software you are interested in.