Give the result of running the following code in OpenCL:

    #include <stdio.h>
    #include <malloc.h>     // _aligned_malloc / _aligned_free (MSVC)
    #include <xmmintrin.h>  // Need this for SSE compiler intrinsics
    #include <math.h>       // Needed for sqrt in CPU-only version
    #include <time.h>

    #define TIME_SSE             // Define this if you want to run with SSE
    #define TIME_NoSSE           // Define this if you want to run without SSE
    #define USE_DIVISION_METHOD  // Define USE_FAST_METHOD instead for reciprocal-and-multiply

    int main(int argc, char* argv[])
    {
        printf("Starting calculation...\n");
        const int length = 64000;

        // We will be calculating Y = SQRT(x) / x, for x = 1->64000
        // If you do not properly align your data for SSE instructions, you may take a huge performance hit.
        float *pResult = (float*) _aligned_malloc(length * sizeof(float), 16); // align to 16-byte for SSE

        __m128 x;
        __m128 xDelta = _mm_set1_ps(4.0f); // Set the xDelta to (4,4,4,4)
        __m128 *pResultSSE = (__m128*) pResult;
        const int SSELength = length / 4;

        clock_t clock1 = clock();
    #ifdef TIME_SSE
        // lots of stress loops so we can easily use a stopwatch
        for (int stress = 0; stress < 1000; stress++)
        {
            // Set the initial values of x to (4,3,2,1)
            x = _mm_set_ps(4.0f, 3.0f, 2.0f, 1.0f);
            for (int i = 0; i < SSELength; i++)
            {
                __m128 xSqrt = _mm_sqrt_ps(x);
                // Note! Division is slow. It's actually faster to take the reciprocal of a number and multiply
                // Also note that Division is more accurate than taking the reciprocal and multiplying
    #ifdef USE_FAST_METHOD
                __m128 xRecip = _mm_rcp_ps(x);
                pResultSSE[i] = _mm_mul_ps(xRecip, xSqrt);
    #endif // USE_FAST_METHOD
    #ifdef USE_DIVISION_METHOD
                pResultSSE[i] = _mm_div_ps(xSqrt, x);
    #endif // USE_DIVISION_METHOD
                // Advance x to the next set of numbers
                x = _mm_add_ps(x, xDelta);
            }
        }
        clock_t clock2 = clock();
        // Cast so the argument type matches %d
        printf("SIMDtime:%d ms\n", (int)(1000 * (clock2 - clock1) / CLOCKS_PER_SEC));
    #endif // TIME_SSE

    #ifdef TIME_NoSSE
        clock_t clock3 = clock();
        // lots of stress loops so we can easily use a stopwatch
        for (int stress = 0; stress < 1000; stress++)
        {
            float xFloat = 1.0f;
            for (int i = 0; i < length; i++)
            {
                // Even though division is slow, there are no intrinsic functions like there are in SSE
                pResult[i] = sqrt(xFloat) / xFloat;
                xFloat += 1.0f;
            }
        }
        clock_t clock4 = clock();
        printf("noSIMDtime:%d ms\n", (int)(1000 * (clock4 - clock3) / CLOCKS_PER_SEC));
    #endif // TIME_NoSSE

        _aligned_free(pResult); // release the aligned buffer
        return 0;
    }
Time: 2024-03-31 07:32:37 Views: 91
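The code computes Y = SQRT(x) / x in two ways: once with SSE intrinsics, which process four floats per instruction (SIMD), and once with an ordinary scalar loop that handles one element at a time. Both versions run on the CPU. Note that this is plain C/C++ with SSE intrinsics, not OpenCL: it contains no kernel and no OpenCL host API calls, so it is compiled and run as a normal host program rather than "in OpenCL".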
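The code first allocates pResult, a 16-byte-aligned array of 64000 floats, to hold the results. It then uses _mm_set1_ps to set xDelta to (4,4,4,4) and casts pResult to a __m128 pointer, pResultSSE, so that each store writes four results at once.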
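The lane order of the SSE setup calls is easy to misread, so here is a minimal sketch of it. The helper names initial_x_lanes and lanes_after_steps are hypothetical and do not appear in the posted code; the sketch uses _mm_storeu_ps (an unaligned store) so that it does not depend on _aligned_malloc.

```c
#include <xmmintrin.h>  /* SSE intrinsics */

/* Stores the four lanes of _mm_set_ps(4, 3, 2, 1) to out[0..3] in memory order.
 * _mm_set_ps takes its arguments from the highest lane down to the lowest,
 * so the lowest lane (out[0]) receives 1.0f. */
static void initial_x_lanes(float out[4])
{
    __m128 x = _mm_set_ps(4.0f, 3.0f, 2.0f, 1.0f);
    _mm_storeu_ps(out, x);  /* unaligned store: out needs no special alignment */
}

/* Starts from the same x and adds xDelta = (4,4,4,4) `steps` times,
 * mirroring how the posted loop advances to the next four inputs. */
static void lanes_after_steps(int steps, float out[4])
{
    __m128 x = _mm_set_ps(4.0f, 3.0f, 2.0f, 1.0f);
    const __m128 xDelta = _mm_set1_ps(4.0f);  /* same value in all four lanes */
    for (int i = 0; i < steps; i++) {
        x = _mm_add_ps(x, xDelta);
    }
    _mm_storeu_ps(out, x);
}
```

With this ordering, iteration i of the SSE loop covers the inputs 4i+1 through 4i+4, which is why the results land in pResult in the same order as in the scalar loop.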
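Two macros, TIME_SSE and TIME_NoSSE, control whether the SSE and scalar timing sections are compiled; a second pair, USE_DIVISION_METHOD and USE_FAST_METHOD, selects how the quotient is computed. In the SSE section, an outer loop repeats the whole calculation 1000 times so that the run is long enough to time. Each pass resets x with _mm_set_ps(4.0f, 3.0f, 2.0f, 1.0f), which puts 1.0 in the lowest lane, so the lanes hold 1, 2, 3, 4 in memory order. The inner loop runs length / 4 = 16000 times: _mm_sqrt_ps computes the square root of each element of x into xSqrt, then Y = SQRT(x) / x is computed either with _mm_div_ps (the method enabled in the posted code) or with _mm_rcp_ps followed by _mm_mul_ps (faster, but only an approximation). The result is stored into pResult through pResultSSE, and x is advanced by xDelta.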
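To make the accuracy difference between the two methods concrete, the following is a minimal sketch rather than the poster's code. The function names sqrt_over_x_div, sqrt_over_x_rcp and max_relative_error are hypothetical, and unaligned stores are used so that any float buffer works.

```c
#include <assert.h>
#include <xmmintrin.h>  /* SSE intrinsics */

/* out[i] = sqrt(x) / x for x = 1..n, four at a time, using true division.
 * Assumption of this sketch: n is a multiple of 4. */
static void sqrt_over_x_div(float *out, int n)
{
    assert(n % 4 == 0);
    __m128 x = _mm_set_ps(4.0f, 3.0f, 2.0f, 1.0f);
    const __m128 xDelta = _mm_set1_ps(4.0f);
    for (int i = 0; i < n; i += 4) {
        __m128 xSqrt = _mm_sqrt_ps(x);
        _mm_storeu_ps(out + i, _mm_div_ps(xSqrt, x));
        x = _mm_add_ps(x, xDelta);
    }
}

/* Same computation, but multiplying by the approximate reciprocal
 * from _mm_rcp_ps instead of dividing. */
static void sqrt_over_x_rcp(float *out, int n)
{
    assert(n % 4 == 0);
    __m128 x = _mm_set_ps(4.0f, 3.0f, 2.0f, 1.0f);
    const __m128 xDelta = _mm_set1_ps(4.0f);
    for (int i = 0; i < n; i += 4) {
        __m128 xSqrt = _mm_sqrt_ps(x);
        __m128 xRecip = _mm_rcp_ps(x);  /* approximation, not exact 1/x */
        _mm_storeu_ps(out + i, _mm_mul_ps(xRecip, xSqrt));
        x = _mm_add_ps(x, xDelta);
    }
}

/* Largest |approx - exact| / |exact| over n elements (exact[i] must be nonzero). */
static float max_relative_error(const float *approx, const float *exact, int n)
{
    float worst = 0.0f;
    for (int i = 0; i < n; i++) {
        float diff = approx[i] - exact[i];
        if (diff < 0.0f) diff = -diff;
        float mag = exact[i] < 0.0f ? -exact[i] : exact[i];
        float rel = diff / mag;
        if (rel > worst) worst = rel;
    }
    return worst;
}
```

Filling one buffer with each function and passing both to max_relative_error gives a rough measure of what the fast path trades away. The reciprocal instruction is documented as an approximation with a bounded relative error, so the two buffers should agree closely but not necessarily bit for bit.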
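The scalar section likewise repeats the calculation 1000 times. For each x from 1 to 64000 it calls sqrt to take the square root, divides the result by x, and stores the quotient in pResult.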
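Finally, each section prints its elapsed time in milliseconds, measured with clock(). The output therefore consists of "Starting calculation...", then a line of the form "SIMDtime:<n> ms" and a line of the form "noSIMDtime:<n> ms"; the actual numbers depend on the machine and compiler settings.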
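The millisecond conversion behind those two lines can be sketched as a small helper. The name elapsed_ms is hypothetical; returning a double is a design choice of this sketch, made because passing a clock_t expression to %d relies on clock_t being the same size as int, which is not guaranteed on every platform.

```c
#include <time.h>

/* Converts the interval between two clock() readings to milliseconds.
 * Working in double avoids integer truncation and format-specifier mismatches. */
static double elapsed_ms(clock_t start, clock_t end)
{
    return 1000.0 * (double)(end - start) / (double)CLOCKS_PER_SEC;
}
```

A call such as printf("SIMDtime:%.1f ms\n", elapsed_ms(clock1, clock2)); would then print the interval with a matching format specifier. Keep in mind that clock() reports processor time used by the program, not wall-clock time.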