Mastering GPGPU Programming: A Practical Guide to High-Performance Computing for Games and Science


List of Tables

2.1 The binary encodings for 8-bit floating-point numbers
2.2 Quantities of interest for binary8
2.3 Quantities of interest for binary16
2.4 Quantities of interest for binary32
2.5 Quantities of interest for binary64
3.1 SIMD comparison operators
3.2 SIMD arithmetic operators
3.3 Inverse square root accuracy and performance
3.4 Minimax polynomial approximations to √(1 + x)
3.5 Minimax polynomial approximations to f(x) = 1/√(1 + x)
3.6 Minimax polynomial approximations to f(x) = sin(x)
3.7 Minimax polynomial approximations to f(x) = cos(x)
3.8 Minimax polynomial approximations to f(x) = tan(x)
3.9 Minimax polynomial approximations to f(x) = asin(x)
3.10 Minimax polynomial approximations to f(x) = (π/2 − asin(x))/√(1 − x)
3.11 Minimax polynomial approximations to f(x) = atan(x)
3.12 Minimax polynomial approximations to f(x) = 2^x
3.13 Minimax polynomial approximations to f(x) = log2(1 + x)
4.1 The transformation pipeline
5.1 Vertex and pixel shader performance measurements
5.2 Compute shader performance measurements
5.3 Depth, stencil, and culling state performance measurements
6.1 Error balancing for several n in the Remez algorithm
6.2 Rotation conventions
7.1 Numerical ill conditioning for least squares
7.2 Performance comparisons for convolution implementations


Listings

2.1 Inexact representation of floating-point inputs
2.2 Simple implementation for computing distance between two points
2.3 Incorrect distance computation due to input problems
2.4 Conversion of rational numbers to binary scientific numbers
2.5 A union is used to allow accessing a floating-point number or manipulating its bits via an unsigned integer
2.6 Decoding an 8-bit floating-point number
2.7 Decoding a 16-bit floating-point number
2.8 Decoding a 32-bit floating-point number
2.9 Decoding a 64-bit floating-point number
2.10 Integer and unsigned integer quantities that are useful for encoding and decoding floating-point numbers
2.11 The general decoding of floating-point numbers
2.12 Convenient wrappers for processing encodings of floating-point numbers
2.13 Classification of floating-point numbers
2.14 Queries about floating-point numbers
2.15 An implementation of the nextUp(x) function
2.16 An implementation of the nextDown(x) function
2.17 An implementation of rounding with ties-to-even
2.18 An implementation of rounding with ties-to-away
2.19 An implementation of rounding toward zero
2.20 An implementation of rounding toward positive
2.21 An implementation of rounding toward negative
2.22 Conversion of a 32-bit signed integer to a 32-bit floating-point number
2.23 Conversion from a 32-bit floating-point number to a rational number
2.24 Conversion from a 64-bit floating-point number to a rational number
2.25 Conversion from a rational number to a 32-bit floating-point number
2.26 Conversion of an 8-bit floating-point number to a 16-bit floating-point number


2.27 Conversion of a narrow floating-point format to a wide floating-point format
2.28 Conversion from a wide floating-point format to a narrow floating-point format
2.29 The conversion of a wide-format number to a narrower format
2.30 Correctly rounded result for square root
2.31 The standard mathematics library functions
2.32 Subtractive cancellation in floating-point arithmetic
2.33 Another example of subtractive cancellation and how bad it can be
2.34 Numerically incorrect quadratic roots when using the modified quadratic formula
2.35 An example of correct root finding, although at first glance they look incorrect
2.36 The example of Listing 2.35 but computed using double-precision numbers
3.1 Computing a dot product of 4-tuples using SSE2
3.2 Computing the matrix-vector product as four row-vector dot products in SSE2
3.3 Computing the matrix-vector product as a linear combination of columns in SSE2
3.4 Computing the matrix-vector product as four row-vector dot products in SSE4.1
3.5 Transpose of a 4 × 4 matrix using shuffling
3.6 Normalizing a vector using SSE2 with a break in the pipeline
3.7 Normalizing a vector using SSE2 without a break in the pipeline
3.8 The definition of the Select function for flattening branches
3.9 Flattening a single branch
3.10 Flattening a two-level branch where the outer-then clause has a nested branch
3.11 Flattening a two-level branch where the outer-else clause has a nested branch
3.12 Flattening a two-level branch where the outer clauses have nested branches
3.13 A fast approximation to 1/sqrt(x) for 32-bit floating-point
3.14 A fast approximation to 1/sqrt(x) for 64-bit floating-point
3.15 One Remez iteration for updating the locations of the local extrema
4.1 A vertex shader and a pixel shader for simple vertex coloring of geometric primitives
4.2 A vertex shader and a pixel shader for simple texturing of geometric primitives
4.3 HLSL code to draw square billboards
4.4 A compute shader that implements small-scale Gaussian blurring


4.5 The output assembly listing for the vertex shader of VertexColoring.hlsl for row-major matrix storage
4.6 The output assembly listing for the matrix-vector product of the vertex shader of VertexColoring.hlsl for column-major matrix storage
4.7 The output assembly listing for the pixel shader of VertexColoring.hlsl
4.8 The output assembly listing for the pixel shader of Texturing.hlsl
4.9 The output assembly listing for the vertex shader of Billboards.hlsl
4.10 The output assembly listing for the geometry shader of Billboards.hlsl
4.11 The output assembly listing for the pixel shader of Billboards.hlsl
4.12 The output assembly listing for the compute shader of GaussianBlurring.hlsl
4.13 The output assembly listing for the compute shader of GaussianBlurring.hlsl with loop unrolling
4.14 The signature for the D3DCompile function
4.15 The signature for the D3DReflect function
4.16 Compile an HLSL program at runtime and start the shader reflection system
4.17 An example of nested structs for which constant buffers have one member layout but structured buffers have another member layout
4.18 A modified listing of the FXC output from the compute shader of Listing 4.17
4.19 The non-default-value members of D3D11_SHADER_DESC for the compute shader of Listing 4.17
4.20 Descriptions about the constant buffers in the compute shader of Listing 4.17
4.21 Creating a swap chain for displaying graphics data to a window
4.22 Creating a back buffer
4.23 Common code for setting the usage and CPU access for a description structure
4.24 The description for a shader resource view and the code to create the view
4.25 The description for an unordered access view and the code to create the view
4.26 The descriptions for render target and depth-stencil views and the code to create the views
4.27 Common code for creating an ID3D11Buffer object
4.28 Creating a constant buffer
4.29 Creating a texture buffer
4.30 Creating a vertex buffer
