Version 3.0
2/20/2010
NVIDIA CUDA™
Programming Guide
Table of Contents
Chapter 1. Introduction
1.1 From Graphics Processing to General-Purpose Parallel Computing
1.2 CUDA™: a General-Purpose Parallel Computing Architecture
1.3 A Scalable Programming Model
1.4 Document’s Structure
Chapter 2. Programming Model
2.1 Kernels
2.2 Thread Hierarchy
2.3 Memory Hierarchy
2.4 Heterogeneous Programming
2.5 Compute Capability
Chapter 3. Programming Interface
3.1 Compilation with NVCC
3.1.1 Compilation Workflow
3.1.2 Binary Compatibility
3.1.3 PTX Compatibility
3.1.4 Application Compatibility
3.1.5 C/C++ Compatibility
3.2 CUDA C
3.2.1 Device Memory
3.2.2 Shared Memory
3.2.3 Multiple Devices
3.2.4 Texture Memory
3.2.4.1 Texture Reference Declaration
3.2.4.2 Runtime Texture Reference Attributes
3.2.4.3 Texture Binding
3.2.5 Page-Locked Host Memory
3.2.5.1 Portable Memory
3.2.5.2 Write-Combining Memory
3.2.5.3 Mapped Memory
3.2.6 Asynchronous Concurrent Execution
3.2.6.1 Concurrent Execution between Host and Device
3.2.6.2 Overlap of Data Transfer and Kernel Execution
3.2.6.3 Concurrent Kernel Execution
3.2.6.4 Concurrent Data Transfers
3.2.6.5 Stream
3.2.6.6 Event
3.2.6.7 Synchronous Calls
3.2.7 Graphics Interoperability
3.2.7.1 OpenGL Interoperability
3.2.7.2 Direct3D Interoperability
3.2.8 Error Handling
3.2.9 Debugging using the Device Emulation Mode
3.3 Driver API
3.3.1 Context
3.3.2 Module
3.3.3 Kernel Execution
3.3.4 Device Memory
3.3.5 Shared Memory
3.3.6 Multiple Devices
3.3.7 Texture Memory
3.3.8 Page-Locked Host Memory
3.3.9 Asynchronous Concurrent Execution
3.3.9.1 Stream
3.3.9.2 Event Management
3.3.9.3 Synchronous Calls
3.3.10 Graphics Interoperability
3.3.10.1 OpenGL Interoperability
3.3.10.2 Direct3D Interoperability
3.3.11 Error Handling
3.4 Interoperability between Runtime and Driver APIs
3.5 Versioning and Compatibility
3.6 Compute Modes
3.7 Mode Switches
Chapter 4. Hardware Implementation
4.1 SIMT Architecture
4.2 Hardware Multithreading
4.3 Multiple Devices
Chapter 5. Performance Guidelines
5.1 Overall Performance Optimization Strategies
5.2 Maximize Utilization
5.2.1 Application Level
5.2.2 Device Level
5.2.3 Multiprocessor Level
5.3 Maximize Memory Throughput
5.3.1 Data Transfer between Host and Device
5.3.2 Device Memory Accesses
5.3.2.1 Global Memory
5.3.2.2 Local Memory
5.3.2.3 Shared Memory
5.3.2.4 Constant Memory
5.3.2.5 Texture Memory
5.4 Maximize Instruction Throughput
5.4.1 Arithmetic Instructions
5.4.2 Control Flow Instructions
5.4.3 Synchronization Instruction
Appendix A. CUDA-Enabled GPUs
Appendix B. C Language Extensions
B.1 Function Type Qualifiers
B.1.1 __device__
B.1.2 __global__
B.1.3 __host__
B.1.4 Restrictions
B.2 Variable Type Qualifiers