
NVIDIA A100 Tensor Core GPU Architecture
List of Figures
Figure 1. Modern cloud datacenter workloads require NVIDIA GPU acceleration ................... 8
Figure 2. New Technologies in NVIDIA A100....................................................................... 10
Figure 3. NVIDIA A100 GPU on new SXM4 Module ............................................................ 12
Figure 4. Unified AI Acceleration for BERT-LARGE Training and Inference .......................... 13
Figure 5. A100 GPU HPC application speedups compared to NVIDIA Tesla V100 ............... 14
Figure 6. GA100 Full GPU with 128 SMs (A100 Tensor Core GPU has 108 SMs) ................ 20
Figure 7. GA100 Streaming Multiprocessor (SM) ................................................................. 22
Figure 8. A100 vs V100 Tensor Core Operations................................................................. 25
Figure 9. TensorFloat-32 (TF32) ......................................................................................... 27
Figure 10. Iterations of TCAIRS Solver to Converge to FP64 Accuracy .............................. 30
Figure 11. TCAIRS solver speedup over the baseline FP64 direct solver............................ 30
Figure 12. A100 Fine-Grained Structured Sparsity ............................................................. 32
Figure 13. Example Dense MMA and Sparse MMA operations........................................... 33
Figure 14. A100 Tensor Core Throughput and Efficiency ................................................... 40
Figure 15. A100 SM Data Movement Efficiency ................................................................. 41
Figure 16. A100 L2 cache residency controls ..................................................................... 42
Figure 17. A100 Compute Data Compression .................................................................... 42
Figure 18. A100 strong-scaling innovations........................................................................ 43
Figure 19. Software-based MPS in Pascal vs Hardware-Accelerated MPS in Volta............. 45
Figure 20. CSP Multi-user node Today .............................................................................. 47
Figure 21. Example CSP MIG Configuration ...................................................................... 48
Figure 22. Example MIG compute configuration with three GPU Instances. ........................ 49
Figure 23. MIG Configuration with multiple independent GPU Compute workloads ............. 50
Figure 24. Example MIG partitioning process ..................................................................... 51
Figure 25. Example MIG config with three GPU Instances and four Compute Instances. .... 52
Figure 26. NVIDIA DGX A100 with Eight A100 GPUs......................................................... 54
Figure 27. Illustration of optical flow and stereo disparity .................................................... 56
Figure 28. Execution Breakdown for Sequential 2us Kernels. ............................................. 60
Figure 29. Impact of Task Graph acceleration on CPU launch latency ................................ 61
Figure 30. Grid-to-Grid Latency Speedup using CUDA graphs ........................................... 62
Figure 31. A100 Asynchronous Copy vs No Asynchronous Copy ....................................... 63
Figure 32. Synchronous vs Asynchronous Copy to Shared Memory ................................... 64
Figure 33. A100 Asynchronous Barriers............................................................................. 65
Figure 34. A100 L2 residency control example................................................................... 67
Figure 35. Warp-Wide Reduction ....................................................................................... 68
Figure 36. NVIDIA DGX A100 System ................................................................................. 70
Figure 37. DGX A100 Delivers unprecedented AI performance for training and inference. .. 71
Figure 38. NVIDIA DGX Software Stack ............................................................................ 73
Figure 39. Dense Neural Network ...................................................................................... 77
Figure 40. Fine-Grained Sparsity ....................................................................................... 79
Figure 41. Coarse Grained Sparsity................................................................................... 80
Figure 42. Fine Grained Structured Sparsity ...................................................................... 81