NVIDIA 2020 Ampere Architecture GPU Features (NVIDIA GPU architectures by generation)

PDF, 7.41 MB, updated 2023-05-13


NVIDIA A100 Tensor Core GPU Architecture

UNPRECEDENTED ACCELERATION AT EVERY SCALE

Introduction

The diversity of compute-intensive applications running in modern cloud data centers has driven the explosion of NVIDIA GPU-accelerated cloud computing. These applications include AI deep learning training and inference, data analytics, scientific computing, genomics, edge video analytics and 5G services, graphics rendering, cloud gaming, and many more. NVIDIA GPUs provide the horsepower needed to accelerate the many complex and unpredictable workloads running in today's cloud data centers. This ranges from scaling up AI training and scientific computing, to scaling out inference applications, to enabling real-time conversational AI.

NVIDIA® GPUs are the leading computational engines powering the AI revolution, providing tremendous speedups for AI training and inference workloads. NVIDIA GPUs also accelerate many types of HPC and data analytics applications and systems, allowing customers to effectively analyze, visualize, and turn data into insights. NVIDIA's accelerated computing platforms are central to many of the world's most important and fastest-growing industries.

HPC has grown beyond supercomputers running computationally intensive applications such as weather forecasting, oil and gas exploration, and financial modeling. Today, millions of NVIDIA GPUs accelerate HPC applications running in cloud data centers, servers, edge systems, and even deskside workstations, serving hundreds of industries and scientific domains.

AI networks continue to grow in size, complexity, and diversity, and the use of AI-based applications and services is expanding rapidly. NVIDIA GPUs accelerate numerous AI systems and applications, including deep learning recommendation systems, autonomous machines (self-driving cars, factory robots, etc.), natural language processing (conversational AI, real-time language translation, etc.), smart city video analytics, software-defined 5G networks that can deliver AI-based services at the edge, molecular simulations, drone control, medical image analysis, and more.


V1.0


List of Figures

Figure 1. Modern cloud datacenter workloads require NVIDIA GPU acceleration

Figure 2. New Technologies in NVIDIA A100

Figure 3. NVIDIA A100 GPU on new SXM4 Module

Figure 4. Unified AI Acceleration for BERT-LARGE Training and Inference

Figure 5. A100 GPU HPC application speedups compared to NVIDIA Tesla V100

Figure 6. GA100 Full GPU with 128 SMs (A100 Tensor Core GPU has 108 SMs)

Figure 7. GA100 Streaming Multiprocessor (SM)

Figure 8. A100 vs V100 Tensor Core Operations

Figure 9. TensorFloat-32 (TF32)

Figure 10. Iterations of TCAIRS Solver to Converge to FP64 Accuracy

Figure 11. TCAIRS solver speedup over the baseline FP64 direct solver

Figure 12. A100 Fine-Grained Structured Sparsity

Figure 13. Example Dense MMA and Sparse MMA operations

Figure 14. A100 Tensor Core Throughput and Efficiency

Figure 15. A100 SM Data Movement Efficiency

Figure 16. A100 L2 cache residency controls

Figure 17. A100 Compute Data Compression

Figure 18. A100 strong-scaling innovations

Figure 19. Software-based MPS in Pascal vs Hardware-Accelerated MPS in Volta

Figure 20. CSP Multi-user node Today

Figure 21. Example CSP MIG Configuration

Figure 22. Example MIG compute configuration with three GPU Instances

Figure 23. MIG Configuration with multiple independent GPU Compute workloads

Figure 24. Example MIG partitioning process

Figure 25. Example MIG config with three GPU Instances and four Compute Instances

Figure 26. NVIDIA DGX A100 with Eight A100 GPUs

Figure 27. Illustration of optical flow and stereo disparity

Figure 28. Execution Breakdown for Sequential 2 µs Kernels

Figure 29. Impact of Task Graph acceleration on CPU launch latency

Figure 30. Grid-to-Grid Latency Speedup using CUDA graphs

Figure 31. A100 Asynchronous Copy vs No Asynchronous Copy

Figure 32. Synchronous vs Asynchronous Copy to Shared Memory

Figure 33. A100 Asynchronous Barriers

Figure 34. A100 L2 residency control example

Figure 35. Warp-Wide Reduction

Figure 36. NVIDIA DGX A100 System

Figure 37. DGX A100 Delivers unprecedented AI performance for training and inference

Figure 38. NVIDIA DGX Software Stack

Figure 39. Dense Neural Network

Figure 40. Fine-Grained Sparsity

Figure 41. Coarse Grained Sparsity

Figure 42. Fine Grained Structured Sparsity
