66 Copublished by the IEEE CS and the AIP 1521-9615/10/$26.00 © 2010 IEEE Computing in Science & Engineering
Novel Architectures
Editors: Volodymyr Kindratenko, kindr@ncsa.uiuc.edu
Pedro Trancoso, pedro@cs.ucy.ac.cy
OpenCL: A Parallel Programming Standard for Heterogeneous Computing Systems
By John E. Stone, David Gohara, and Guochun Shi
The strong need for increased computational performance in science and engineering has led to the use of heterogeneous computing, with GPUs and other accelerators acting as coprocessors for arithmetic-intensive data-parallel workloads.[1–4] OpenCL is a new industry standard for task-parallel and data-parallel heterogeneous computing on a variety of modern CPUs, GPUs, DSPs, and other microprocessor designs.[5] This trend toward heterogeneous computing and highly parallel architectures has created a strong need for software development infrastructure in the form of parallel programming languages and subroutine libraries that can support heterogeneous computing on multiple vendors’ hardware platforms. To address this, developers adapted many existing science and engineering applications to take advantage of multicore CPUs and massively parallel GPUs using toolkits such as Threading Building Blocks (TBB), OpenMP, Compute Unified Device Architecture (CUDA),[6] and others.[7,8] Existing programming toolkits, however, were either limited to a single microprocessor family or didn’t support heterogeneous computing.
OpenCL provides easy-to-use abstractions and a broad set of programming APIs based on past successes with CUDA and other programming toolkits. OpenCL defines core functionality that all devices support, as well as optional functionality for high-function devices; it also includes an extension mechanism that lets vendors expose unique hardware features and experimental programming interfaces for application developers’ benefit. Although OpenCL can’t mask significant differences in hardware architecture, it does guarantee portability and correctness. This makes it much easier for developers to start with a correctly functioning OpenCL program tuned for one architecture and produce a correctly functioning program optimized for another architecture.
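As a rough illustration of how an application might use the extension mechanism, a device reports its supported extensions as a single space-separated string (returned by clGetDeviceInfo with CL_DEVICE_EXTENSIONS), and the application checks that string before relying on an optional feature. The sketch below shows only the string check, in plain C; the helper name has_extension is hypothetical and not part of OpenCL, and the extension names in the usage note are examples.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of the OpenCL API).
   Returns 1 if `name` occurs as a whole space-separated token in
   `ext_list`, and 0 otherwise. Matching whole tokens avoids a false
   positive when one extension name is a prefix of another. */
int has_extension(const char *ext_list, const char *name)
{
    size_t len = strlen(name);
    const char *p = ext_list;

    if (len == 0)
        return 0;

    while ((p = strstr(p, name)) != NULL) {
        int starts_token = (p == ext_list) || (p[-1] == ' ');
        int ends_token = (p[len] == '\0') || (p[len] == ' ');
        if (starts_token && ends_token)
            return 1;
        p++;  /* keep searching after this partial match */
    }
    return 0;
}
```

For example, with the list "cl_khr_fp64 cl_khr_byte_addressable_store", has_extension(list, "cl_khr_fp64") returns 1, so a program could select a double-precision code path and fall back to single precision otherwise.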
The OpenCL Programming Model
In OpenCL, a program is executed on a computational device, which can be a CPU, GPU, or another accelerator (see Figure 1). Devices contain one or more compute units (processor cores). These units are themselves composed of one or more single-instruction multiple-data (SIMD) processing elements (PEs) that execute instructions in lock-step.
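The device hierarchy described above can be pictured with a toy model in plain C. This is a sketch for intuition only: the types Device and ComputeUnit, the constant PES_PER_UNIT, and the chosen SIMD width of four are all assumptions made for illustration and are not part of the OpenCL API. The loop over lanes stands in for hardware that would apply one instruction to all PEs at the same time.

```c
#include <stddef.h>

/* Toy model only: none of these names come from the OpenCL API. */
enum { PES_PER_UNIT = 4 };  /* assumed SIMD width of one compute unit */

typedef struct {
    float lane[PES_PER_UNIT];  /* one data element per processing element */
} ComputeUnit;

typedef struct {
    ComputeUnit *units;  /* the device's compute units */
    size_t num_units;
} Device;

/* One lock-step instruction: every PE in the unit applies the same
   operation (multiply by factor) to its own data element. The loop
   runs sequentially here; real SIMD hardware does this in one step. */
static void simd_scale(ComputeUnit *cu, float factor)
{
    for (size_t pe = 0; pe < PES_PER_UNIT; ++pe)
        cu->lane[pe] *= factor;
}

/* Issue the same instruction on every compute unit of the device. */
void device_scale(Device *dev, float factor)
{
    for (size_t u = 0; u < dev->num_units; ++u)
        simd_scale(&dev->units[u], factor);
}

/* Total number of processing elements in the modeled device. */
size_t device_total_pes(const Device *dev)
{
    return dev->num_units * PES_PER_UNIT;
}
```

In this model, a device with two compute units has eight PEs, and a single call to device_scale updates all eight data elements with the same operation.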
OpenCL Device Management
By providing a common language and common programming interfaces and hardware abstractions, OpenCL lets developers accelerate applications with task- or data-parallel computations in a heterogeneous computing environment consisting of the host CPU and any attached OpenCL devices. Such devices might or might not share memory with the host CPU, and typically have a different machine instruction set. The OpenCL programming interfaces therefore assume heterogeneity between the host and all attached devices.
OpenCL’s key programming interfaces include functions for
• enumerating available target devices (CPUs, GPUs, and various accelerators);
• managing the target devices’ contexts;
• managing memory allocations;
• performing host-device memory transfers;
• compiling the OpenCL programs and kernel functions that the devices will execute;
• launching kernels on the target devices;
• querying execution progress; and
• checking for errors.
Although developers can compile and link OpenCL programs into binary objects using offline compilation methodology, OpenCL
The OpenCL standard offers a common API for program execution on systems composed of different types
of computational devices such as multicore CPUs, GPUs, or other accelerators.