WP-08608-001_v1.1 | August 2017
NVIDIA TESLA V100 GPU ARCHITECTURE
THE WORLD’S MOST ADVANCED DATA CENTER GPU
TABLE OF CONTENTS
Introduction to the NVIDIA Tesla V100 GPU Architecture ...................................... 1
Tesla V100: The AI Computing and HPC Powerhouse ............................................ 2
Key Features ................................................................................................... 2
Extreme Performance for AI and HPC ..................................................................... 5
NVIDIA GPUs: The Fastest and Most Flexible Deep Learning Platform .................... 6
Deep Learning Background ................................................................................. 6
GPU-Accelerated Deep Learning ........................................................................... 7
GV100 GPU Hardware Architecture In-Depth ....................................................... 8
Extreme Performance and High Efficiency ............................................................... 11
Volta Streaming Multiprocessor ........................................................................... 12
Tensor Cores .............................................................................................. 14
Enhanced L1 Data Cache and Shared Memory ...................................................... 17
Simultaneous Execution of FP32 and INT32 Operations ........................................... 18
Compute Capability .......................................................................................... 18
NVLink: Higher Bandwidth, More Links, More Features ............................................... 19
More Links, Faster Links ................................................................................. 19
More Features ............................................................................................. 19
HBM2 Memory Architecture ................................................................................ 21
ECC Memory Resiliency .................................................................................. 22
Copy Engine Enhancements ............................................................................... 23
Tesla V100 Board Design ................................................................................... 23
GV100 CUDA Hardware and Software Architectural Advances ............................... 25
Independent Thread Scheduling .......................................................................... 26
Prior NVIDIA GPU SIMT Models ........................................................................ 26
Volta SIMT Model ......................................................................................... 27
Starvation-Free Algorithms .............................................................................. 29
Volta Multi-Process Service ................................................................................. 30
Unified Memory and Address Translation Services ..................................................... 32
Cooperative Groups ......................................................................................... 33
Conclusion .................................................................................................... 36
Appendix A NVIDIA DGX-1 with Tesla V100 ...................................................... 37
NVIDIA DGX-1 System Specifications .................................................................... 38
DGX-1 Software .............................................................................................. 39
Appendix B NVIDIA DGX Station - A Personal AI Supercomputer for Deep Learning ........ 41
Preloaded with the Latest Deep Learning Software .................................................... 43
Kickstarting AI Initiatives ................................................................................... 43
Appendix C Accelerating Deep Learning and Artificial Intelligence with GPUs ........ 44
Deep Learning in a Nutshell ................................................................................ 44
NVIDIA GPUs: The Engine of Deep Learning ........................................................... 47
Training Deep Neural Networks ........................................................................ 48
Inferencing Using a Trained Neural Network ........................................................ 49
Comprehensive Deep Learning Software Development Kit ........................................... 50
Self-driving Cars .......................................................................................... 51
Robots ...................................................................................................... 52
Healthcare and Life Sciences ........................................................................... 52
LIST OF FIGURES
Figure 1. NVIDIA Tesla V100 SXM2 Module with Volta GV100 GPU ....................... 1
Figure 2. New Technologies in Tesla V100 ................................................... 4
Figure 3. Tesla V100 Provides a Major Leap in Deep Learning Performance with New Tensor Cores .......................................................................... 5
Figure 4. Volta GV100 Full GPU with 84 SM Units ........................................... 9
Figure 5. Volta GV100 Streaming Multiprocessor (SM) ..................................... 13
Figure 6. cuBLAS Single Precision (FP32) .................................................... 14
Figure 7. cuBLAS Mixed Precision (FP16 Input, FP32 Compute) .......................... 15
Figure 8. Tensor Core 4x4 Matrix Multiply and Accumulate ............................... 15
Figure 9. Mixed Precision Multiply and Accumulate in Tensor Core ...................... 16
Figure 10. Pascal and Volta 4x4 Matrix Multiplication ........................................ 16
Figure 11. Comparison of Pascal and Volta Data Cache ..................................... 17
Figure 12. Hybrid Cube Mesh NVLink Topology as used in DGX-1 with V100 ............ 20
Figure 13. V100 with NVLink Connected GPU-to-GPU and GPU-to-CPU ................... 20
Figure 14. Second Generation NVLink Performance ......................................... 21
Figure 15. HBM2 Memory Speedup on V100 vs P100 ....................................... 22
Figure 16. Tesla V100 Accelerator (Front) .................................................... 23
Figure 17. Tesla V100 Accelerator (Back) ..................................................... 24
Figure 18. NVIDIA Tesla V100 SXM2 Module - Stylized Exploded View ................... 24
Figure 19. Deep Learning Methods Developed Using CUDA ................................ 25
Figure 20. SIMT Warp Execution Model of Pascal and Earlier GPUs ....................... 26
Figure 21. Volta Warp with Per-Thread Program Counter and Call Stack ................. 27
Figure 22. Volta Independent Thread Scheduling ............................................ 28
Figure 23. Programs Use Explicit Synchronization to Reconverge Threads in a Warp ... 28
Figure 24. Doubly Linked List with Fine-Grained Locks ...................................... 29
Figure 25. Software-based MPS Service in Pascal vs Hardware-Accelerated MPS Service in Volta ................................................................................ 31
Figure 26. Volta MPS for Inference ............................................................. 32
Figure 27. Two Phases of a Particle Simulation ............................................... 35
Figure 28. NVIDIA DGX-1 Server ............................................................... 37
Figure 29. DGX-1 Delivers up to 3x Faster Training Compared to Eight-way GP100 Based Server ......................................................................... 38
Figure 30. NVIDIA DGX-1 Fully Integrated Software Stack for Instant Productivity ..... 40
Figure 31. Tesla V100 Powered DGX Station ................................................. 41
Figure 32. NVIDIA DGX Station Delivers 47x Faster Training ............................... 42
Figure 33. Perceptron is the Simplest Model of a Neural Network ......................... 45
Figure 34. Complex Multi-Layer Neural Network Models Require Increased Amounts of Compute Power ...................................................................... 47
Figure 35. Training a Neural Network .......................................................... 48
Figure 36. Inferencing on a Neural Network .................................................. 49
Figure 37. Accelerate Every Framework ....................................................... 50
Figure 38. Organizations Engaged with NVIDIA on Deep Learning ........................ 51
Figure 39. NVIDIA DriveNet ..................................................................... 52
LIST OF TABLES
Table 1. Comparison of NVIDIA Tesla GPUs ................................................ 10
Table 2. Compute Capabilities: GK180 vs GM200 vs GP100 vs GV100 ................. 18
Table 3. NVIDIA DGX-1 System Specifications ............................................. 38
Table 4. DGX Station Specifications .......................................................... 42