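CUDA parallel program for computing the Euclidean distances between one 32-dimensional vector and 1024 32-dimensional vectors, with experimental output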
Posted: 2023-06-20 07:03:42
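Below is a CUDA program that computes the Euclidean distance between one 32-dimensional vector and each of 1024 32-dimensional vectors. Each distance is assigned to its own GPU thread, so all 1024 distances are computed in parallel rather than one after another: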
```cuda
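#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <cuda_runtime.h>

#define N 32                    // dimensions per vector
#define M 1024                  // number of candidate vectors
#define THREADS_PER_BLOCK 1024  // per-block thread limit on compute capability 2.0 and later

// Abort with a message if a CUDA runtime call fails.
#define CUDA_CHECK(call)                                              \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "CUDA error at %s:%d: %s\n", __FILE__,    \
                    __LINE__, cudaGetErrorString(err_));              \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

// One thread per candidate vector: thread idx computes the distance
// between x and row idx of y (y is stored row-major as M x N).
__global__ void euclidean_distance(const float *x, const float *y, float *dist) {
    int idx = blockDim.x * blockIdx.x + threadIdx.x;
    if (idx < M) {
        float sum = 0.0f;
        for (int i = 0; i < N; i++) {
            float diff = x[i] - y[idx * N + i];
            sum += diff * diff;
        }
        dist[idx] = sqrtf(sum);  // single-precision square root
    }
}

int main(void) {
    float *x, *y, *dist;
    float *d_x, *d_y, *d_dist;

    // Allocate memory on host
    x = (float *)malloc(N * sizeof(float));
    y = (float *)malloc(N * M * sizeof(float));
    dist = (float *)malloc(M * sizeof(float));
    if (x == NULL || y == NULL || dist == NULL) {
        fprintf(stderr, "Host allocation failed\n");
        return EXIT_FAILURE;
    }

    // Initialize input vectors: x[i] = i, and component i of vector j is i + j
    for (int i = 0; i < N; i++) {
        x[i] = (float)i;
        for (int j = 0; j < M; j++) {
            y[j * N + i] = (float)(i + j);
        }
    }

    // Allocate memory on device
    CUDA_CHECK(cudaMalloc((void **)&d_x, N * sizeof(float)));
    CUDA_CHECK(cudaMalloc((void **)&d_y, N * M * sizeof(float)));
    CUDA_CHECK(cudaMalloc((void **)&d_dist, M * sizeof(float)));

    // Copy input data from host to device
    CUDA_CHECK(cudaMemcpy(d_x, x, N * sizeof(float), cudaMemcpyHostToDevice));
    CUDA_CHECK(cudaMemcpy(d_y, y, N * M * sizeof(float), cudaMemcpyHostToDevice));

    // Launch kernel with enough blocks to cover all M vectors
    int blocks = (M + THREADS_PER_BLOCK - 1) / THREADS_PER_BLOCK;
    euclidean_distance<<<blocks, THREADS_PER_BLOCK>>>(d_x, d_y, d_dist);
    CUDA_CHECK(cudaGetLastError());  // catches launch-configuration errors

    // Copy result from device to host (this call waits for the kernel to finish)
    CUDA_CHECK(cudaMemcpy(dist, d_dist, M * sizeof(float), cudaMemcpyDeviceToHost));

    // Print result
    for (int i = 0; i < M; i++) {
        printf("Dist[%d]: %.2f\n", i, dist[i]);
    }

    // Free memory
    free(x);
    free(y);
    free(dist);
    CUDA_CHECK(cudaFree(d_x));
    CUDA_CHECK(cudaFree(d_y));
    CUDA_CHECK(cudaFree(d_dist));
    return 0;
}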
```
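In the program, `euclidean_distance` is the CUDA kernel. It computes the Euclidean distance between the 32-dimensional vector `x` and each of the 1024 32-dimensional vectors stored in `y`, and writes the results to the `dist` array. The main function first allocates host memory for `x`, `y`, and `dist` and initializes the input vectors. It then allocates device memory for the same three arrays on the GPU and copies the input data from host memory to device memory. Next, it launches the kernel and copies the results from device memory back to host memory. Finally, it prints the results and frees both host and device memory.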