Chapter 2 is all about the methodology and the design patterns that can be employed in the development of parallel and multicore software. Both work decomposition patterns and program structure patterns are examined.
• Programming with threads and processes: Dealing explicitly with the individual paths of execution in the form of threads or processes is the most elementary form of parallel programming. In this part we examine how this paradigm is used to program CPUs (with C++11 threads), GPUs (with CUDA and OpenCL), and even clusters of networked machines (using MPI).
C++11 threads have been a long-awaited addition to the C++ standard, establishing a firm foundation for cross-platform, high-performance, parallel software development for CPUs. Chapter 3 covers the C++11 facilities, along with commonly used synchronization mechanisms such as semaphores and monitors. Frequently encountered design patterns, such as producers–consumers and readers–writers, are also explained thoroughly and applied in a range of examples.
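As a taste of what this chapter covers, here is a minimal producers–consumers sketch built from standard C++11 facilities only (one producer, one consumer, and an unbounded shared queue; the names buffer and notEmpty are illustrative, not the book's):

    #include <condition_variable>
    #include <iostream>
    #include <mutex>
    #include <queue>
    #include <thread>

    std::queue<int> buffer;            // shared buffer
    std::mutex m;                      // protects buffer
    std::condition_variable notEmpty;

    void producer() {
        for (int i = 0; i < 5; ++i) {
            std::lock_guard<std::mutex> lock(m);
            buffer.push(i);
            notEmpty.notify_one();     // wake a waiting consumer
        }
    }

    void consumer() {
        for (int i = 0; i < 5; ++i) {
            std::unique_lock<std::mutex> lock(m);
            notEmpty.wait(lock, [] { return !buffer.empty(); });
            std::cout << buffer.front() << '\n';
            buffer.pop();
        }
    }

    int main() {
        std::thread p(producer), c(consumer);
        p.join();
        c.join();
        return 0;
    }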
Chapter 4 is dedicated to shared-memory parallel data structures and how we can ensure correctness when multiple threads operate concurrently on a program's data.
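A first taste of the problem is the classic lost-update scenario: the toy sketch below (not the book's code) uses a std::mutex to make concurrent increments of a shared counter safe:

    #include <mutex>
    #include <thread>
    #include <vector>

    // Without the mutex, concurrent ++value calls could interleave
    // and lose updates; the lock serializes them.
    struct SafeCounter {
        int value = 0;
        std::mutex m;
        void increment() {
            std::lock_guard<std::mutex> lock(m);
            ++value;
        }
    };

    int main() {
        SafeCounter c;
        std::vector<std::thread> workers;
        for (int i = 0; i < 4; ++i)
            workers.emplace_back([&c] {
                for (int j = 0; j < 100000; ++j) c.increment();
            });
        for (auto &t : workers) t.join();
        return c.value == 400000 ? 0 : 1;  // always 0 (success)
    }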
In Chapter 5 we cover MPI, the de facto standard for distributed-memory parallel programming. MPI provides the foundation for utilizing multiple disjoint multicore machines as a single virtual platform, and it is designed to scale from a single shared-memory multicore machine to a million-node supercomputer. The features covered include point-to-point, collective, and one-sided communication. A section is dedicated to the Boost.MPI library, as it simplifies the use of MPI, although it is not yet feature-complete.
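The flavor of MPI's point-to-point communication can be conveyed in a few lines (a minimal sketch using the standard C bindings; run with, e.g., mpirun -np 2):

    #include <mpi.h>
    #include <iostream>

    int main(int argc, char *argv[]) {
        MPI_Init(&argc, &argv);
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        if (rank == 0) {                 // rank 0 sends...
            int payload = 42;
            MPI_Send(&payload, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {          // ...and rank 1 receives
            int payload;
            MPI_Recv(&payload, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            std::cout << "Rank 1 received " << payload << '\n';
        }
        MPI_Finalize();
        return 0;
    }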
GPU software development is covered in great detail, including kernel design, memory management, grid-block/index space configurations, and optimization techniques. CUDA (Chapter 6) and OpenCL (Chapter 7) are each examined both in isolation and in combination with other platforms such as C++11 threads and MPI.
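The following minimal CUDA sketch (illustrative only; the kernel scale and its launch configuration are not taken from the book) hints at what kernel design and grid-block configuration entail:

    #include <cuda_runtime.h>

    // Each thread handles one array element; the launch configuration
    // below maps a 1D grid of blocks onto the index space.
    __global__ void scale(float *v, float a, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;  // global index
        if (i < n) v[i] *= a;                           // guard overshoot
    }

    int main() {
        const int N = 1024;
        float *d = nullptr;
        cudaMalloc(&d, N * sizeof(float));     // device memory management
        cudaMemset(d, 0, N * sizeof(float));
        scale<<<(N + 255) / 256, 256>>>(d, 2.0f, N);  // 4 blocks x 256 threads
        cudaDeviceSynchronize();
        cudaFree(d);
        return 0;
    }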
• High-level parallel programming: Parallel software suffers from high development and maintenance costs. Some of this burden can be alleviated by utilizing tools that handle the more esoteric details of “how” and “where” to execute costly computations.
The OpenMP standard in its latest incarnation (v5.0) addresses these problems by requiring only “hints” from the programmer, while also allowing both CPUs and GPUs to be targeted. There are still complications that need to be addressed, such as loop-carried dependencies, which are also examined in Chapter 8.
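As a small illustration of this “hints” approach (a sketch, not an excerpt from Chapter 8), a single pragma parallelizes the loop below, with the reduction clause resolving the loop-carried dependency on sum:

    #include <cstdio>

    int main() {
        const int N = 1000000;
        double sum = 0.0;
        // The pragma is the only change to the sequential loop. The
        // reduction clause gives each thread a private partial sum,
        // combined at the end, removing the dependency on sum.
        #pragma omp parallel for reduction(+ : sum)
        for (int i = 1; i <= N; ++i)
            sum += 1.0 / i;
        printf("Harmonic sum: %f\n", sum);
        return 0;
    }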
OpenMP’s design philosophy is to take advantage of multi- and many-core hardware while requiring minimal alterations to the source code of a sequential program. The Qt library, covered in Chapter 9, offers another solution to this design problem by supporting high-level abstractions in the form of map, filter, and reduce operations that can be applied to collections of data, without the need to instantiate or manage any threads.
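For instance, Qt Concurrent's blocking variants of these operations can be chained without a single explicit thread (a minimal sketch; square and isEven are illustrative helpers, not the book's code):

    #include <QtConcurrent>
    #include <QList>

    int square(int x) { return x * x; }
    bool isEven(int x) { return x % 2 == 0; }

    int main() {
        QList<int> data{1, 2, 3, 4, 5, 6};
        // map: square every element across a pool of worker threads
        QList<int> squares = QtConcurrent::blockingMapped(data, square);
        // filter: keep only the even squares, again with no explicit threads
        QList<int> evens = QtConcurrent::blockingFiltered(squares, isEven);
        return evens.size();  // 3, i.e. {4, 16, 36}
    }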