Version 3.0
2/20/2010
NVIDIA CUDA™
Programming Guide

ii CUDA Programming Guide Version 3.0

Table of Contents
Chapter 1. Introduction
1.1 From Graphics Processing to General-Purpose Parallel Computing
1.2 CUDA™: a General-Purpose Parallel Computing Architecture
1.3 A Scalable Programming Model
1.4 Document’s Structure
Chapter 2. Programming Model
2.1 Kernels
2.2 Thread Hierarchy
2.3 Memory Hierarchy
2.4 Heterogeneous Programming
2.5 Compute Capability
Chapter 3. Programming Interface
3.1 Compilation with NVCC
3.1.1 Compilation Workflow
3.1.2 Binary Compatibility
3.1.3 PTX Compatibility
3.1.4 Application Compatibility
3.1.5 C/C++ Compatibility
3.2 CUDA C
3.2.1 Device Memory
3.2.2 Shared Memory
3.2.3 Multiple Devices
3.2.4 Texture Memory
3.2.4.1 Texture Reference Declaration
3.2.4.2 Runtime Texture Reference Attributes
3.2.4.3 Texture Binding
3.2.5 Page-Locked Host Memory
3.2.5.1 Portable Memory
3.2.5.2 Write-Combining Memory
3.2.5.3 Mapped Memory
3.2.6 Asynchronous Concurrent Execution
3.2.6.1 Concurrent Execution between Host and Device
3.2.6.2 Overlap of Data Transfer and Kernel Execution
3.2.6.3 Concurrent Kernel Execution
3.2.6.4 Concurrent Data Transfers
3.2.6.5 Stream
3.2.6.6 Event
3.2.6.7 Synchronous Calls
3.2.7 Graphics Interoperability
3.2.7.1 OpenGL Interoperability
3.2.7.2 Direct3D Interoperability
3.2.8 Error Handling
3.2.9 Debugging using the Device Emulation Mode
3.3 Driver API
3.3.1 Context
3.3.2 Module
3.3.3 Kernel Execution
3.3.4 Device Memory
3.3.5 Shared Memory
3.3.6 Multiple Devices
3.3.7 Texture Memory
3.3.8 Page-Locked Host Memory
3.3.9 Asynchronous Concurrent Execution
3.3.9.1 Stream
3.3.9.2 Event Management
3.3.9.3 Synchronous Calls
3.3.10 Graphics Interoperability
3.3.10.1 OpenGL Interoperability
3.3.10.2 Direct3D Interoperability
3.3.11 Error Handling
3.4 Interoperability between Runtime and Driver APIs
3.5 Versioning and Compatibility
3.6 Compute Modes
3.7 Mode Switches
Chapter 4. Hardware Implementation
4.1 SIMT Architecture
4.2 Hardware Multithreading
4.3 Multiple Devices
Chapter 5. Performance Guidelines
5.1 Overall Performance Optimization Strategies
5.2 Maximize Utilization
5.2.1 Application Level
5.2.2 Device Level
5.2.3 Multiprocessor Level
5.3 Maximize Memory Throughput
5.3.1 Data Transfer between Host and Device
5.3.2 Device Memory Accesses
5.3.2.1 Global Memory
5.3.2.2 Local Memory
5.3.2.3 Shared Memory
5.3.2.4 Constant Memory
5.3.2.5 Texture Memory
5.4 Maximize Instruction Throughput
5.4.1 Arithmetic Instructions
5.4.2 Control Flow Instructions
5.4.3 Synchronization Instruction
Appendix A. CUDA-Enabled GPUs
Appendix B. C Language Extensions
B.1 Function Type Qualifiers
B.1.1 __device__
B.1.2 __global__
B.1.3 __host__
B.1.4 Restrictions
B.2 Variable Type Qualifiers