programmable 32-bit floating-point pixel-
fragment processors and vertex processors,
programmed with Cg programs, DX9, and
OpenGL. These processors were highly multi-
threaded, creating a thread and executing a
thread program for each vertex and pixel
fragment. The GeForce 6800 scalable pro-
cessor core architecture facilitated multiple
GPU implementations with different num-
bers of processor cores.
Developing the Cg language⁶ for programming GPUs provided a scalable parallel
programming model for the programmable
floating-point vertex and pixel-fragment pro-
cessors of GeForce FX, GeForce 6800, and
subsequent GPUs. A Cg program resembles
a C program for a single thread that draws
a single vertex or single pixel. The multi-
threaded GPU created independent threads
that executed a shader program to draw
every vertex and pixel fragment.
In addition to rendering real-time graph-
ics, programmers also used Cg to compute
physical simulations and other general-
purpose GPU (GPGPU) computations.
Early GPGPU computing programs achieved
high performance, but were difficult to write
because programmers had to express non-
graphics computations with a graphics API
such as OpenGL.
Unified computing and graphics GPUs
The GeForce 8800 introduced in 2006
featured the first unified graphics and com-
puting GPU architecture,⁷,⁸ programmable
in C with the CUDA parallel computing
model, in addition to using DX10 and
OpenGL. Its unified streaming processor
cores executed vertex, geometry, and pixel
shader threads for DX10 graphics programs,
and also executed computing threads for
CUDA C programs. Hardware multithread-
ing enabled the GeForce 8800 to efficiently
execute up to 12,288 threads concurrently
in 128 processor cores. NVIDIA deployed
the scalable architecture in a family of
GeForce GPUs with different numbers of
processor cores for each market segment.
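A minimal CUDA C sketch (not from the article; names and sizes are illustrative) of the model just described: each thread handles one array element, much as a shader thread draws one pixel or vertex, and a single launch creates thousands of threads for the hardware to schedule across however many cores the GPU provides.

#include <cuda_runtime.h>

// Each thread scales one element; the grid supplies one thread per element.
__global__ void scale(float *data, float factor, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n)
        data[i] *= factor;
}

int main(void)
{
    const int n = 1 << 20;
    float *d_data;
    cudaMalloc(&d_data, n * sizeof(float));
    // Enough 256-thread blocks to cover n elements; hardware multithreading
    // schedules the resulting thousands of threads across the processor cores.
    scale<<<(n + 255) / 256, 256>>>(d_data, 2.0f, n);
    cudaDeviceSynchronize();
    cudaFree(d_data);
    return 0;
}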
The GeForce 8800 was the first GPU to
use scalar thread processors rather than vector
processors, matching standard scalar languages
like C, and eliminating the need to manage
vector registers and program vector
operations. It added instructions to support
C and other general-purpose languages,
including integer arithmetic, IEEE 754
floating-point arithmetic, and load/store
memory access instructions with byte address-
ing. It provided hardware and instructions to
support parallel computation, communica-
tion, and synchronization—including thread
arrays, shared memory, and fast barrier
synchronization.
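The sketch below (again illustrative, assuming a launch with 256 threads per block) shows those cooperation primitives together: a thread block stages its inputs in on-chip shared memory and uses the fast barrier __syncthreads() to coordinate a per-block sum reduction.

// Assumes 256 threads per block; produces one partial sum per thread block.
__global__ void block_sum(const float *in, float *block_sums, int n)
{
    __shared__ float buf[256];         // per-block on-chip shared memory
    int i   = blockIdx.x * blockDim.x + threadIdx.x;
    int tid = threadIdx.x;

    buf[tid] = (i < n) ? in[i] : 0.0f;
    __syncthreads();                   // barrier: buf is fully written

    // Tree reduction within the thread block.
    for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
        if (tid < stride)
            buf[tid] += buf[tid + stride];
        __syncthreads();               // barrier after each reduction step
    }
    if (tid == 0)
        block_sums[blockIdx.x] = buf[0];
}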
GPU computing systems
At first, users built personal supercom-
puters by adding multiple GPU cards to
PCs and workstations, and assembled clusters
of GPU computing nodes. In 2007, respond-
ing to demand for GPU computing systems,
NVIDIA introduced the Tesla C870, D870,
and S870 GPU card, deskside, and rack-
mount GPU computing systems containing
one, two, and four T8 GPUs. The T8
GPU was based on the GeForce 8800
GPU, configured for parallel computing.
The second-generation Tesla C1060 and
S1070 GPU computing systems introduced
in 2008 used the T10 GPU, based on the
GPU in GeForce GTX 280. The T10 fea-
tured 240 processor cores, 1-teraflop-per-
second peak single-precision floating-point
rate, IEEE 754-2008 double-precision 64-
bit floating-point arithmetic, and 4-Gbyte
DRAM memory. Today there are Tesla
S1070 systems with thousands of GPUs
widely deployed in high-performance com-
puting systems in production and research.
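As a rough check of that quoted peak rate, using figures the article does not state (a shader clock of roughly 1.44 GHz and dual issue of a multiply-add plus a multiply, that is, 3 flops per core per clock): 240 cores × 3 flops/clock × 1.44 GHz ≈ 1.04 teraflops per second, consistent with the 1-teraflop figure.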
NVIDIA introduced the third-generation
Fermi GPU computing architecture in
2009.⁹
Based on user experience with prior
generations, it addressed several key areas to
make GPU computing more broadly appli-
cable. Fermi implemented IEEE 754-2008
and significantly increased double-precision
performance. It added error-correcting code
(ECC) memory protection for large-scale
GPU computing, 64-bit unified addressing,
cached memory hierarchy, and instructions
for C, C++, Fortran, OpenCL, and
DirectCompute.
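An illustrative CUDA C sketch (not from the article) of that double-precision emphasis: each thread accumulates part of a dot product with the IEEE 754-2008 fused multiply-add, which rounds a*b + c only once.

// Writes one partial sum per thread; the partial array needs one slot per
// launched thread, and the host (or a second kernel) sums the partials.
__global__ void dot_partial(const double *a, const double *b,
                            double *partial, int n)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    double acc = 0.0;
    // Grid-stride loop so any launch configuration covers all n elements.
    for (int i = tid; i < n; i += gridDim.x * blockDim.x)
        acc = fma(a[i], b[i], acc);    // fused multiply-add in double precision
    partial[tid] = acc;
}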
GPU computing ecosystem
The GPU computing ecosystem is expand-
ing rapidly, enabled by the deployment of
more than 180 million CUDA-capable