FOREWORD
GPUs have come a long way. From their origins as specialized graphics processors that could rapidly produce images for output to a display unit, they have become a go-to technology when ultra-fast processing is needed. In the past few years, GPUs have increasingly been attached to CPUs to accelerate a broad array of computations in so-called heterogeneous computing. Today, GPUs are configured on many desktop systems, on compute clusters, and even on many of the largest supercomputers in the world. In their extended role as a provider of large amounts of compute power for
technical computing, GPUs have enabled advances in science and engineering in a broad variety of
disciplines. They have done so by making it possible for huge numbers of compute cores to work in
parallel while keeping the power budgets very reasonable.
Fortunately, the interfaces for programming GPUs have kept up with this rapid change. In the past,
a major effort was required to use them for anything outside the narrow range of applications they
were intended for, and the GPU programmer needed to be familiar with many concepts that made
good sense only to the graphics programmer. Today’s systems provide a much more convenient
means to create application software that will run on them. In short, we have CUDA.
CUDA is one of the most popular application programming interfaces for accelerating a range of
compute kernels on the GPU. It can enable code written in C or C++ to run efficiently on a GPU
with very reasonable programming effort. It strikes a balance between the need to know about the
architecture in order to exploit it well, and the need to have a programming interface that is easy to
use and results in readable programs.
This book will be a valuable resource for anyone who wants to use GPUs for scientific and technical
programming. It provides a comprehensive introduction to the CUDA programming interface and
its usage. For a start, it describes the basics of parallel computing on heterogeneous architectures
and introduces the features of CUDA. It then explains how CUDA programs are executed. CUDA
exposes the execution and memory model to the programmer; as a result, the CUDA programmer
has direct control of the massively parallel environment. In addition to giving details of the CUDA
memory model, the text provides a wealth of information on how it can be utilized. The following chapter discusses streams, as well as how to execute concurrent and overlapping kernels. Next
comes information on tuning, on using CUDA libraries, and on using OpenACC directives to program GPUs. After a chapter on multi-GPU programming, the book concludes by discussing some
implementation considerations. Moreover, a variety of examples are given to help the reader get
started, many of which can be downloaded and executed.
CUDA provides a nice balance between expressivity and programmability that has proven itself
in practice. However, those of us who have made it our mission to simplify application development know that this is an ongoing story. For the past few years, CUDA researchers have worked
to improve heterogeneous programming tools. CUDA 6 introduces many new features, including
unified memory and plug-in libraries, to make GPU programming even easier. They have also provided a set of directives called OpenACC, which is introduced in this book. OpenACC promises to