FOREWORD
GPUs have come a long way. From their origins as specialized graphics processors that could rapidly produce images for output to a display unit, they have become a go-to technology when ultra-fast processing is needed. In the past few years, GPUs have increasingly been attached to CPUs to accelerate a broad array of computations in so-called heterogeneous computing. Today, GPUs are configured on many desktop systems, on compute clusters, and even on many of the largest supercomputers in the world. In their extended role as a provider of large amounts of compute power for
technical computing, GPUs have enabled advances in science and engineering in a broad variety of
disciplines. They have done so by making it possible for huge numbers of compute cores to work in
parallel while keeping the power budgets very reasonable.
Fortunately, the interfaces for programming GPUs have kept up with this rapid change. In the past,
a major effort was required to use them for anything outside the narrow range of applications they
were intended for, and the GPU programmer needed to be familiar with many concepts that made
good sense only to the graphics programmer. Today’s systems provide a much more convenient
means to create application software that will run on them. In short, we have CUDA.
CUDA is one of the most popular application programming interfaces for accelerating a range of
compute kernels on the GPU. It can enable code written in C or C++ to run efficiently on a GPU
with very reasonable programming effort. It strikes a balance between the need to know about the
architecture in order to exploit it well, and the need to have a programming interface that is easy to
use and results in readable programs.
This book will be a valuable resource for anyone who wants to use GPUs for scientific and technical
programming. It provides a comprehensive introduction to the CUDA programming interface and
its usage. For a start, it describes the basics of parallel computing on heterogeneous architectures
and introduces the features of CUDA. It then explains how CUDA programs are executed. CUDA
exposes the execution and memory model to the programmer; as a result, the CUDA programmer
has direct control of the massively parallel environment. In addition to giving details of the CUDA
memory model, the text provides a wealth of information on how it can be utilized. The following chapter discusses streams, as well as how to execute concurrent and overlapping kernels. Next
comes information on tuning, on using CUDA libraries, and on using OpenACC directives to program GPUs. After a chapter on multi-GPU programming, the book concludes by discussing some
implementation considerations. Moreover, a variety of examples are given to help the reader get
started, many of which can be downloaded and executed.
CUDA provides a nice balance between expressivity and programmability that has proven itself
in practice. However, those of us who have made it our mission to simplify application development know that this is an ongoing story. For the past few years, CUDA researchers have worked
to improve heterogeneous programming tools. CUDA 6 introduces many new features, including
unified memory and plug-in libraries, to make GPU programming even easier. They have also provided a set of directives called OpenACC, which is introduced in this book. OpenACC promises to