
Multicore and GPU
Programming
An Integrated Approach
Gerassimos Barlas
AMSTERDAM • BOSTON • HEIDELBERG • LONDON
NEW YORK • OXFORD • PARIS • SAN DIEGO
SAN FRANCISCO • SINGAPORE • SYDNEY • TOKYO
Morgan Kaufmann is an imprint of Elsevier

Dedicated to my late parents for making it possible,
and my loving wife and children for making it worthwhile.

Preface
Parallel computing has been given a fresh breath of life since the emergence of
multicore architectures in the first decade of the new century. The new platforms
demand a new approach to software development, one that blends the tools and
established practices of the network-of-workstations era with emerging software
platforms such as CUDA.
This book tries to address this need by covering the dominant contemporary tools
and techniques, both in isolation and, most importantly, in combination with each
other. We strive to provide examples where multiple platforms and programming
paradigms (e.g., message passing & threads) are effectively combined. “Hybrid”
computation, as it is usually called, is a new trend in high-performance computing,
one that could possibly allow software to scale to the “millions of threads” required
for exascale performance.
All chapters are accompanied by extensive examples and practice problems with
an emphasis on putting them to work, while comparing alternative design scenarios.
All the little details that can make the difference between productive software
development and a stressful exercise in futility are presented in an orderly fashion.
The book covers the latest advances in tools that have been inherited from the
1990s (e.g., the OpenMP and MPI standards), but also more cutting-edge platforms,
such as the Qt library with its sophisticated thread management and the Thrust
template library with its capability to deploy the same software over diverse multicore
architectures, including both CPUs and Graphics Processing Units (GPUs).
We could never accomplish the feat of covering all the tools available for
multicore development today. Even some of the industry-standard ones, like POSIX
threads, are omitted.
Our goal is to sample the dominant paradigms (ranging from OpenMP’s
semi-automatic parallelization of sequential code to the explicit communication
“plumbing” that underpins MPI), while at the same time explaining the rationale
and the how-to behind efficient multicore program development.
WHAT IS IN THIS BOOK
This book can be separated into the following logical units, although no such distinction
is made in the text:
• Introduction, designing multicore software: Chapter 1 introduces multicore
hardware and examines influential instances of this architectural paradigm.
Chapter 1 also introduces speedup and efficiency, which are essential metrics
used in the evaluation of multicore and parallel software. Amdahl’s law and
Gustafson-Barsis’s rebuttal cap off the chapter, providing estimates of what can
be expected from the exciting new developments in multicore and many-core
hardware.
Chapter 2 is all about the methodology and the design patterns that can be
employed in the development of parallel and multicore software. Both work
decomposition patterns and program structure patterns are examined.
• Shared-memory programming: Two different approaches for shared-memory
parallel programming are examined: explicit and implicit parallelization. On the
explicit side, Chapter 3 covers threads and two of the most commonly used
synchronization mechanisms, semaphores and monitors. Frequently
encountered design patterns, such as producers-consumers and readers-writers,
are explained thoroughly and applied in a range of examples.
On the implicit side, Chapter 4 covers the OpenMP standard that has been
specifically designed for parallelizing existing sequential code with minimum
effort. Development time is significantly reduced as a result. There are still
complications, such as loop-carried dependencies, which are also addressed.
• Distributed memory programming: Chapter 5 introduces the de facto standard
for distributed memory parallel programming, i.e., the Message Passing
Interface (MPI). MPI is relevant to multicore programming as it is designed to
scale from a shared-memory multicore machine to a million-node
supercomputer. As such, MPI provides the foundation for utilizing multiple
disjoint multicore machines, as a single virtual platform.
The features that are covered include both point-to-point and collective
communication, as well as one-sided communication. A section is dedicated to
the Boost.MPI library, as it simplifies the use of MPI,
although it is not yet feature-complete.
• GPU programming: GPUs are one of the primary reasons why this book was put
together. In a similar fashion to shared-memory programming, we examine the
problem of developing GPU-specific software from two perspectives: on the one
hand, we have the “nuts-and-bolts” approach of Nvidia’s CUDA, where memory
transfers, data placement, and thread execution configuration have to be
carefully planned. CUDA is examined in Chapter 6.
On the other hand, we have the high-level, algorithmic approach of the Thrust
template library, which is covered in Chapter 7. The STL-like approach to
program design affords Thrust the ability to target both CPU and GPU
platforms, a unique feature among the tools we cover.
• Load balancing: Chapter 8 is dedicated to an often underestimated aspect of
multicore development. In general, load balancing has to be seriously
considered once heterogeneous computing resources come into play. For
example, a CPU and a GPU constitute such a set of resources, so we should not
think only of clusters of dissimilar machines as fitting this requirement. Chapter
8 briefly discusses the Linda coordination language, which can be considered a
high-level abstraction of dynamic load balancing.
The main focus is on static load balancing and the mathematical models that
can be used to drive load partitioning and data communication sequences.
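
As a small illustration of the metrics named in the Chapter 1 summary above, the following sketch computes speedup estimates under Amdahl's law and under the Gustafson-Barsis formulation, along with efficiency. It uses the textbook forms of these formulas, not code taken from the book; the function names and the parameter `parallel_fraction` are assumptions chosen for this example.

```python
# Illustrative sketch only: standard textbook formulas, hypothetical names.

def amdahl_speedup(parallel_fraction: float, n: int) -> float:
    """Speedup on n processors for a fixed problem size (Amdahl's law).

    parallel_fraction is the share of the sequential running time
    that can be parallelized (between 0 and 1).
    """
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n)


def gustafson_speedup(parallel_fraction: float, n: int) -> float:
    """Scaled speedup on n processors (Gustafson-Barsis).

    Here parallel_fraction is the share of the *parallel* running time
    spent in parallelizable code, so the problem grows with n.
    """
    serial_fraction = 1.0 - parallel_fraction
    return serial_fraction + parallel_fraction * n


def efficiency(speedup: float, n: int) -> float:
    """Efficiency: speedup divided by the number of processors."""
    return speedup / n


if __name__ == "__main__":
    f, n = 0.9, 10
    s_fixed = amdahl_speedup(f, n)
    s_scaled = gustafson_speedup(f, n)
    print(f"Amdahl speedup:           {s_fixed:.2f}")
    print(f"Gustafson-Barsis speedup: {s_scaled:.2f}")
    print(f"Efficiency (Amdahl):      {efficiency(s_fixed, n):.2f}")
```

With a parallel fraction of 0.9 on 10 processors, the fixed-size estimate is roughly 5.26 while the scaled estimate is roughly 9.1. The gap between the two shows why the second formulation is read as a rebuttal of the first: it assumes the problem size grows with the machine.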