PARALLEL PROCESSING WITH CUDA
Nvidia’s High-Performance Computing Platform Uses Massive Multithreading

By Tom R. Halfhill {01/28/08-01}

Parallel processing on multicore processors is the industry’s biggest software challenge, but the real problem is there are too many solutions, and all require more effort than setting a compiler flag. The dream of push-button serial programming was never fully realized, so it’s no surprise that push-button parallel programming is proving even more elusive.
In recent years, Microprocessor Report has been analyzing various approaches to parallel processing. Among other technologies, we’ve examined RapidMind’s Multicore Development Platform (see MPR 11/26/07-01, “Parallel Processing for the x86”), PeakStream’s math libraries for graphics processors (see MPR 10/2/06-01, “Number Crunching With GPUs”), Fujitsu’s remote procedure calls (see MPR 8/13/07-01, “Fujitsu Calls Asynchronously”), Ambric’s development-driven CPU architecture (see MPR 10/10/06-01, “Ambric’s New Parallel Processor”), and Tilera’s tiled mesh network (see MPR 11/5/07-01, “Tilera’s Cores Communicate Better”).
Now it is Nvidia’s turn for examination. Nvidia’s
Compute Unified Device Architecture (CUDA) is a soft-
ware platform for massively parallel high-performance
computing on the company’s powerful GPUs. Formally
introduced in 2006, after a year-long gestation in beta,
CUDA is steadily winning customers in scientific and engi-
neering fields. At the same time, Nvidia is redesigning and
repositioning its GPUs as versatile devices suitable for much
more than electronic games and 3D graphics. Nvidia’s Tesla
brand denotes products intended for high-performance
computing; the Quadro brand is for professional graphics workstations; and the GeForce brand is for Nvidia’s traditional consumer graphics market.
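To give a flavor of the programming model CUDA provides, here is a minimal vector-addition sketch in CUDA C. It is our illustration, not code from Nvidia; the function and variable names are arbitrary, and the style reflects the CUDA runtime API of this era (explicit device allocation and host-to-device copies).

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

// Kernel: each GPU thread computes one element of the result,
// locating its element from its block and thread indices.
__global__ void vecAdd(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        c[i] = a[i] + b[i];
}

int main(void) {
    const int n = 1024;
    const size_t bytes = n * sizeof(float);

    float h_a[1024], h_b[1024], h_c[1024];
    for (int i = 0; i < n; i++) { h_a[i] = (float)i; h_b[i] = 2.0f * i; }

    // Allocate GPU memory and copy the inputs across the bus.
    float *d_a, *d_b, *d_c;
    cudaMalloc((void **)&d_a, bytes);
    cudaMalloc((void **)&d_b, bytes);
    cudaMalloc((void **)&d_c, bytes);
    cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, h_b, bytes, cudaMemcpyHostToDevice);

    // Launch 4 blocks of 256 threads: one thread per element.
    vecAdd<<<4, 256>>>(d_a, d_b, d_c, n);

    // Copy the result back and spot-check one element.
    cudaMemcpy(h_c, d_c, bytes, cudaMemcpyDeviceToHost);
    printf("c[100] = %g\n", h_c[100]);

    cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
    return 0;
}
```

The `<<<blocks, threads>>>` launch syntax is the key extension: the programmer writes scalar per-element code, and the hardware schedules the resulting threads across the GPU’s thread processors.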
For Nvidia, high-performance computing is both an
opportunity to sell more chips and insurance against an
uncertain future for discrete GPUs. Although Nvidia’s
GPUs and graphics cards have long been prized by gamers,
the graphics market is changing. When AMD acquired ATI
in 2006, Nvidia was left standing as the largest independent
GPU vendor. Indeed, for all practical purposes, Nvidia is
the only independent GPU vendor, because other competi-
tors have fallen away over the years. Nvidia’s sole-survivor
status would be enviable if the market for discrete GPUs were certain to remain stable. However, both AMD and Intel plan to
integrate graphics cores in future PC processors. If these
integrated processors shrink the consumer market for dis-
crete GPUs, it could hurt Nvidia. On the other hand, many
PCs (especially those sold to businesses) already integrate a
graphics processor at the system level, so integrating those
graphics into the CPU won’t come at Nvidia’s expense. And
serious gamers will crave the higher performance of discrete
graphics for some time to come. Nevertheless, Nvidia is wise
to diversify.
Hence, CUDA. A few years ago, pioneering program-
mers discovered that GPUs could be reharnessed for tasks
other than graphics. However, their improvised program-
ming model was clumsy, and the programmable pixel
shaders on the chips weren’t the ideal engines for general-
purpose computing. Nvidia has seized upon this opportunity
to create a better programming model and to improve the
shaders. In fact, for the high-performance computing mar-
ket, Nvidia now prefers to call the shaders “stream proces-
sors” or “thread processors.” It’s not just marketing hype.
Each thread processor in an Nvidia GeForce 8-series GPU
This article is reprinted from Microprocessor Report, the insider’s guide to microprocessor hardware (www.MPRonline.com).