must follow a specific recipe and pay attention to the dependencies between various tasks.
For example, vegetables cannot be sautéed before they are washed, and the eggs cannot be
whisked before they are broken!
Coding Parallel Algorithms. Another important challenge concerns the implementation
and use of parallel algorithms in the real world. The many forms of parallelism, ranging
from small to large scale, and from general to special purpose, have led to many different
programming languages and systems for coding parallel algorithms. These different pro-
gramming languages and systems often target a particular kind of hardware, and even a
particular kind of problem domain. As it turns out, one can easily spend weeks or even
months optimizing a parallel sorting algorithm on specific parallel hardware, such as a
multicore chip, a GPU, or a large-scale massively parallel distributed system.
Maximizing speedup by coding and optimizing an algorithm is not the goal of this book.
Instead, our goal is to cover general design principles for parallel algorithms that can be
applied to essentially all parallel systems, from the data center to the multicore chips on
mobile phones. We will learn to think about parallelism at a high level, learning general
techniques for designing parallel algorithms and data structures, and learning how to
approximately analyze their costs. The focus is on understanding when computations can
run in parallel and when they cannot because of dependencies. There is much more to
learn about parallelism, and we hope you continue studying this subject.
Example 1.5. There are separate systems for coding parallel numerical algorithms on shared
memory hardware, for coding graphics algorithms on Graphical Processing Units (GPUs),
and for coding data-analytics software on a distributed system. Each such system tends to
have its own programming interface, its own cost model, and its own optimizations, mak-
ing it practically impossible to take a parallel algorithm and code it once for all possible
applications. Indeed, it can require a significant effort to implement even a simple
algorithm and optimize it to run well on a particular parallel system.
2 Work, Span, and Parallel Time
This section describes the two measures—work and span—that we use to analyze algo-
rithms. Together these measures capture both the sequential time and the parallelism avail-
able in an algorithm. We typically analyze both of these asymptotically, using for example
the big-O notation.
2.1 Work and Span
Work. The work of an algorithm corresponds to the total number of primitive operations
performed by the algorithm. When running on a sequential machine, the work corresponds
to the sequential running time. On a parallel machine, however, work can be divided among
multiple processors and thus does not necessarily correspond to time.
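To make the notion of work concrete, the following is a minimal sketch in Python (not taken
from the text) of a divide-and-conquer summation; the comments note how each part
contributes to the total work. The function name reduce_sum and the array-based interface
are assumptions made for this illustration.

    def reduce_sum(a, lo, hi):
        """Sum the elements a[lo], ..., a[hi-1] by divide and conquer."""
        # Base case: a single element costs a constant number of operations.
        if hi - lo == 1:
            return a[lo]
        mid = (lo + hi) // 2
        # The two recursive calls are independent of each other, so in a
        # fork-join setting they could run in parallel; here they simply
        # run one after the other.
        left = reduce_sum(a, lo, mid)
        right = reduce_sum(a, mid, hi)
        # Each element is touched a constant number of times overall, so the
        # total work is proportional to the input size, i.e., O(n).
        return left + right

    print(reduce_sum([3, 1, 4, 1, 5, 9], 0, 6))  # prints 23

Because the total number of operations is the same whether the recursive calls run
sequentially or in parallel, the work of this function is O(n) in either setting.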