C++ AMP: A New Choice for DirectCompute Programmers


"C++ AMP是微软推出的一种并行计算框架，旨在为DirectCompute程序员提供一个更易用且高效的编程模型。C++ AMP将C++语言与DirectCompute的GPU计算能力结合，允许开发者在单一的C++源文件中编写主机代码和设备代码，消除了传统DirectCompute中主机代码（通常是C或C++）与设备代码（C-like的HLSL内核代码）之间的文件和扩展名差异。这使得项目管理更为简洁，只需要一个编译器就能生成单一的二进制文件。 C++ AMP引入了两个新的语言特性：一是`tile_static`存储类，对应于HLSL中的概念，用于在共享内存中创建数据存储，以实现GPU线程组间的通信和协作；二是模板化的并行算法库，提供了类似于STL的接口，如`concurrency::array_view`和`parallel_for_each`，方便地在GPU上执行并行计算任务。在DirectCompute中，通常需要分别处理主机代码和设备代码，它们可能存在于不同的文件中，并且设备代码（HLSL）可能需要预先编译成两种二进制形式。然而，在C++ AMP中，由于所有代码都使用C++编写，可以统一在一个文件中，减少了编译和管理的复杂性。这并不意味着HLSL完全被替代，而是通过C++ AMP，开发者可以使用C++的语法和抽象来描述GPU计算，简化了编程流程。 C++ AMP的`tile_static`关键字使得开发者能够在GPU的计算单元之间进行更灵活的数据共享，提高了并行计算的效率。这种存储类允许局部数据在同一个计算单元内的线程之间共享，类似于OpenMP的threadprivate或CUDA的shared memory。另外，C++ AMP的并行算法库借鉴了C++标准模板库（STL）的设计，提供了一种熟悉的编程范式。`concurrency::array_view`对象是对GPU内存的抽象，它不是一个拷贝，而是对底层数据的引用，支持并行访问。`parallel_for_each`函数则是一个并行执行的迭代器，可以用来遍历数组并执行自定义操作，非常适合大规模数据并行计算。 C++ AMP为熟悉C++的DirectCompute程序员提供了一个更加集成、高效的开发环境，降低了GPU编程的门槛，同时也保持了高性能计算的能力。通过利用C++ AMP，开发者可以更好地发挥现代多核GPU的潜力，实现高效的并行计算应用。"


    // desc is the buffer description filled in before this excerpt begins.
    // Each element of the structured buffers is one float.
    desc.StructureByteStride = sizeof(float);

    // Read-only view description, shared by the inputs A and B.
    D3D11_SHADER_RESOURCE_VIEW_DESC srvDesc;
    ZeroMemory(&srvDesc, sizeof(srvDesc));
    srvDesc.ViewDimension = D3D11_SRV_DIMENSION_BUFFEREX;
    srvDesc.Format = DXGI_FORMAT_UNKNOWN;
    srvDesc.BufferEx.NumElements = size * size;

    // Read-write view description for the output C.
    D3D11_UNORDERED_ACCESS_VIEW_DESC uavDesc;
    ZeroMemory(&uavDesc, sizeof(uavDesc));
    uavDesc.ViewDimension = D3D11_UAV_DIMENSION_BUFFER;
    uavDesc.Format = DXGI_FORMAT_UNKNOWN;
    uavDesc.Buffer.NumElements = size * size;

    // Input A: create the buffer from host data, then its read-only view.
    D3D11_SUBRESOURCE_DATA InitData;
    ID3D11Buffer *d_A;
    InitData.pSysMem = A;
    hr = device->CreateBuffer(&desc, &InitData, &d_A);
    ID3D11ShaderResourceView *d_A_SRV;
    hr = device->CreateShaderResourceView(d_A, &srvDesc, &d_A_SRV);

    // Input B: same steps as for A.
    ID3D11Buffer *d_B;
    InitData.pSysMem = B;
    hr = device->CreateBuffer(&desc, &InitData, &d_B);
    ID3D11ShaderResourceView *d_B_SRV;
    hr = device->CreateShaderResourceView(d_B, &srvDesc, &d_B_SRV);

    // Output C: no initial data, accessed through an unordered access view.
    ID3D11Buffer *d_C;
    hr = device->CreateBuffer(&desc, NULL, &d_C);
    ID3D11UnorderedAccessView *d_C_UAV;
    hr = device->CreateUnorderedAccessView(d_C, &uavDesc, &d_C_UAV);

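    // C++ AMP: wrap the host arrays A, B and C in two-dimensional views.
    array_view<const float, 2> d_A(size, size, A);
    array_view<const float, 2> d_B(size, size, B);
    array_view<float, 2> d_C(size, size, C);
    // The current contents of C need not be copied to the accelerator.
    d_C.discard_data();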
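To make the idea of a view over existing host memory concrete, here is a minimal sketch in standard C++. `View2D` and `write_through_view_demo` are hypothetical names invented for this illustration; they are not part of C++ AMP, and the real `array_view` additionally manages copies to and from the accelerator on demand. The sketch only shows the two properties used above: the view wraps caller-owned storage without copying it, and a `(row, col)` index maps to a row-major offset.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a 2-D view; NOT the concurrency::array_view API.
// It stores a pointer to caller-owned storage and never copies the elements.
template <typename T>
struct View2D {
    int rows;
    int cols;
    T* data;  // caller-owned, row-major

    View2D(int r, int c, T* p) : rows(r), cols(c), data(p) {}

    // Element (row, col) lives at flat offset row * cols + col.
    T& operator()(int row, int col) const {
        return data[static_cast<std::size_t>(row) * cols + col];
    }
};

// Writes through the view and reads the result back from the original vector.
inline float write_through_view_demo() {
    const int size = 3;
    std::vector<float> C(static_cast<std::size_t>(size) * size, 0.0f);
    View2D<float> d_C(size, size, C.data());
    d_C(1, 2) = 42.0f;  // row 1, column 2 -> flat index 1 * 3 + 2 = 5
    return C[5];        // the underlying storage was modified
}
```

Because the view holds only a pointer and two extents, copying it is cheap, which is one reason such objects are convenient to hand to a kernel by value.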

In DirectCompute, the variable size must be passed to the kernel by placing it in a constant buffer, padded to a multiple of four words. Although a structure is not strictly necessary in this particular case, it is good form because it is often necessary to pass multiple parameters. The views must also be passed to the kernel explicitly. In C++ AMP, as we will see below, the variable size and the array_view objects are implicitly captured from the outer scope by value.
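The by-value capture described above can be illustrated in standard C++ without any GPU runtime. In the sketch below, `serial_for_each_2d` is a hypothetical serial driver standing in for a parallel launch, and matrix multiplication is used purely as an illustrative workload, since the excerpt does not show the kernel body. The point is the `[=]` capture: `size` and the data handles are copied into the lambda, so nothing has to be packed into a buffer or bound by hand.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical serial driver standing in for a parallel launch:
// it calls body(row, col) once per element of a rows x cols domain.
template <typename Body>
void serial_for_each_2d(int rows, int cols, Body body) {
    for (int row = 0; row < rows; ++row)
        for (int col = 0; col < cols; ++col)
            body(row, col);
}

// Multiplies two size x size row-major matrices and returns the product.
// Assumes A and B each hold exactly size * size elements.
inline std::vector<float> multiply(const std::vector<float>& A,
                                   const std::vector<float>& B, int size) {
    std::vector<float> C(static_cast<std::size_t>(size) * size, 0.0f);
    const float* a = A.data();
    const float* b = B.data();
    float* c = C.data();
    // [=] copies size and the three pointers into the lambda object.
    serial_for_each_2d(size, size, [=](int row, int col) {
        float sum = 0.0f;
        for (int k = 0; k < size; ++k)
            sum += a[row * size + k] * b[k * size + col];
        c[row * size + col] = sum;
    });
    return C;
}
```

For example, calling `multiply` on the 2 x 2 matrices `{1, 2, 3, 4}` and `{5, 6, 7, 8}` yields `{19, 22, 43, 50}`.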

    // One int of payload plus three ints of padding: four words in total.
    struct ConstantBufferStruct
    {
        int size, padding[3];
    };

    // Reuse desc to describe the constant buffer.
    ZeroMemory(&desc, sizeof(desc));
    desc.ByteWidth = sizeof(ConstantBufferStruct);
    desc.Usage = D3D11_USAGE_DEFAULT;
    desc.BindFlags = D3D11_BIND_CONSTANT_BUFFER;
    ID3D11Buffer *constantBuffer;
    hr = device->CreateBuffer(&desc, NULL, &constantBuffer);

    // Copy size into the constant buffer and bind it to slot 0.
    ConstantBufferStruct constantValues = { size };
    deviceContext->UpdateSubresource(constantBuffer, 0, NULL, &constantValues, 0, 0);
    deviceContext->CSSetConstantBuffers(0, 1, &constantBuffer);

    // Bind the read-write view of C.
    ID3D11UnorderedAccessView* rw_views[1] = { d_C_UAV };
    deviceContext->CSSetUnorderedAccessViews(0, 1, rw_views, NULL);

    // Bind the read-only views of A and B.
    ID3D11ShaderResourceView* ro_views[2] = { d_A_SRV, d_B_SRV };
    deviceContext->CSSetShaderResources(0, 2, ro_views);
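The four-word padding rule for the constant buffer can be checked at compile time. The sketch below repeats the struct from the text and adds a hypothetical helper, `padded_byte_width`, that rounds a byte count up to the next multiple of 16. It assumes a 4-byte `int`, which holds on the platforms Direct3D 11 targets; the helper and the constant name are illustrative and not part of any Direct3D header.

```cpp
#include <cassert>
#include <cstddef>

// Same layout as the struct in the text: one int plus three ints of padding.
struct ConstantBufferStruct {
    int size;
    int padding[3];
};

// A word is 4 bytes here, so four words are 16 bytes.
constexpr std::size_t kConstantBufferAlignment = 16;

// Hypothetical helper: rounds a byte count up to the next multiple of 16.
constexpr std::size_t padded_byte_width(std::size_t bytes) {
    return (bytes + kConstantBufferAlignment - 1) / kConstantBufferAlignment
           * kConstantBufferAlignment;
}

// Fails to compile if the struct stops being a multiple of four words.
static_assert(sizeof(ConstantBufferStruct) % kConstantBufferAlignment == 0,
              "constant buffer size must be a multiple of 16 bytes");
static_assert(padded_byte_width(sizeof(int)) == 16, "4 bytes round up to 16");
static_assert(padded_byte_width(20) == 32, "20 bytes round up to 32");
```

A check like this catches the case where someone later adds a field to the struct and forgets to adjust the padding, before the mismatch surfaces as a failed buffer creation at run time.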

To launch the computation in C++ AMP, we use a parallel_for_each looping construct similar in form to
