CUDA Programming Guide 5.0, Chinese Edition
Updated 2023-03-16. PDF, 1.88 MB.
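Chinese edition of the CUDA Programming Guide 5.0. CUDA is a general-purpose parallel computing architecture that includes a software environment allowing developers to use C as a high-level programming language.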
CUDA Programming Guide 5.0, Chinese Edition
风辰
Contents
Chapter 1 Introduction
1.1 From Graphics Processing to General-Purpose Parallel Computing
1.2 CUDA™: A General-Purpose Parallel Computing Architecture
1.3 A Scalable Programming Model
1.4 Document Structure
Chapter 2 Programming Model
2.1 Kernels
2.2 Thread Hierarchy
2.3 Memory Hierarchy
2.4 Heterogeneous Programming
2.5 Compute Capability
Chapter 3 Programming Interface
3.1 Compilation with nvcc
3.1.1 Compilation Workflow
3.1.1.1 Offline Compilation
3.1.1.2 Just-in-Time Compilation
3.1.2 Binary Compatibility
3.1.3 PTX Compatibility
3.1.4 Application Compatibility
3.1.5 C/C++ Compatibility
3.1.6 64-Bit Compatibility
3.2 CUDA C Runtime
3.2.1 Initialization
3.2.2 Device Memory
3.2.3 Shared Memory
3.2.4 Page-Locked Host Memory
3.2.4.1 Portable Memory
3.2.4.2 Write-Combining Memory
3.2.4.3 Mapped Memory
3.2.5 Asynchronous Concurrent Execution
3.2.5.1 Asynchronous Execution between Host and Device
3.2.5.2 Overlap of Data Transfer and Kernel Execution
3.2.5.3 Concurrent Kernel Execution
3.2.5.4 Concurrent Data Transfers
3.2.5.5 Streams
3.2.5.6 Events
3.2.5.7 Synchronous Calls
3.2.6 Multi-Device System
3.2.6.1 Device Enumeration
3.2.6.2 Device Selection
3.2.6.3 Stream and Event Behavior
3.2.6.4 Peer-to-Peer Memory Access
3.2.6.5 Peer-to-Peer Memory Copy
3.2.6.6 Unified Virtual Address Space
3.2.6.7 Error Checking
3.2.7 Call Stack
3.2.8 Texture and Surface Memory
3.2.8.1 Texture Memory
3.2.8.2 Surface Memory
3.2.8.3 CUDA Arrays
3.2.8.4 Read/Write Coherency
3.2.9 Graphics Interoperability
3.2.9.1 OpenGL Interoperability
3.2.9.2 Direct3D Interoperability
3.2.9.3 SLI Interoperability
3.3 Versioning and Compatibility
3.4 Compute Modes
3.5 Mode Switches
3.6 Tesla Compute Cluster Mode on Windows
Chapter 4 Hardware Implementation
4.1 SIMT Architecture
4.2 Hardware Multithreading
Chapter 5 Performance Guidelines
5.1 Overall Performance Optimization Strategies
5.2 Maximize Utilization
5.2.1 Application Level
5.2.2 Device Level
5.2.3 Multiprocessor Level
5.3 Maximize Memory Throughput
5.3.1 Data Transfer between Host and Device
5.3.2 Device Memory Accesses
5.3.2.1 Global Memory
5.3.2.2 Local Memory
5.3.2.3 Shared Memory
5.3.2.4 Constant Memory
5.3.2.5 Texture and Surface Memory
5.4 Maximize Instruction Throughput