Using the cuFFT API
cuFFT Library User's Guide DU-06707-001_v11.4|11
the same switch. Note that multiple GPU execution is not guaranteed to solve a given size
problem in a shorter time than single GPU execution.
The multiple GPU extensions to cuFFT are built on the extensible cuFFT API. The general
steps in defining and executing a transform with this API are:
‣ cufftCreate() - create an empty plan, as in the single GPU case
‣ cufftXtSetGPUs() - define which GPUs are to be used
‣ Optional: cufftEstimate{1d,2d,3d,Many}() - estimate the sizes of the work areas required. These are the same functions used in the single GPU case although the definition of the argument workSize reflects the number of GPUs used.
‣ cufftMakePlan{1d,2d,3d,Many}() - create the plan. These are the same functions used in the single GPU case although the definition of the argument workSize reflects the number of GPUs used.
‣ Optional: cufftGetSize{1d,2d,3d,Many}() - refined estimate of the sizes of the work areas required. These are the same functions used in the single GPU case although the definition of the argument workSize reflects the number of GPUs used.
‣ Optional: cufftGetSize() - check workspace size. This is the same function used in the single GPU case although the definition of the argument workSize reflects the number of GPUs used.
‣ Optional: cufftXtSetWorkArea() - do your own workspace allocation.
‣ cufftXtMalloc() - allocate descriptor and data on the GPUs
‣ cufftXtMemcpy() - copy data to the GPUs
‣ cufftXtExecDescriptorC2C()/cufftXtExecDescriptorZ2Z() - execute the plan
‣ cufftXtMemcpy() - copy data from the GPUs
‣ cufftXtFree() - free any memory allocated with cufftXtMalloc()
‣ cufftDestroy() - free cuFFT plan resources
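The steps above can be sketched as a minimal single-precision 1D C2C transform spread across two GPUs. This is an illustrative sketch, not a complete program: device IDs 0 and 1 are assumed to exist, and error checking is abbreviated to early returns.

```cpp
// Sketch of the extensible multi-GPU cuFFT workflow (assumes two GPUs, IDs 0 and 1).
#include <cufftXt.h>
#include <cstdlib>
#include <vector>

int main() {
    const int nx = 1024, batch = 1;

    // 1. Create an empty plan, as in the single GPU case.
    cufftHandle plan;
    if (cufftCreate(&plan) != CUFFT_SUCCESS) return EXIT_FAILURE;

    // 2. Define which GPUs are to be used (assumed device IDs).
    int gpus[2] = {0, 1};
    if (cufftXtSetGPUs(plan, 2, gpus) != CUFFT_SUCCESS) return EXIT_FAILURE;

    // 3. Create the plan; workSize is an array with one entry per GPU.
    size_t workSize[2];
    if (cufftMakePlan1d(plan, nx, CUFFT_C2C, batch, workSize) != CUFFT_SUCCESS)
        return EXIT_FAILURE;

    // 4. Allocate the descriptor and device data on the GPUs.
    std::vector<cufftComplex> host(nx, cufftComplex{1.0f, 0.0f});
    cudaLibXtDesc *desc;
    if (cufftXtMalloc(plan, &desc, CUFFT_XT_FORMAT_INPLACE) != CUFFT_SUCCESS)
        return EXIT_FAILURE;

    // 5. Copy data to the GPUs, 6. execute in place, 7. copy data back.
    cufftXtMemcpy(plan, desc, host.data(), CUFFT_COPY_HOST_TO_DEVICE);
    cufftXtExecDescriptorC2C(plan, desc, desc, CUFFT_FORWARD);
    cufftXtMemcpy(plan, host.data(), desc, CUFFT_COPY_DEVICE_TO_HOST);

    // 8. Free the descriptor memory and the plan resources.
    cufftXtFree(desc);
    cufftDestroy(plan);
    return EXIT_SUCCESS;
}
```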
2.8.1. Plan Specification and Work Areas
In the single GPU case a plan is created by a call to cufftCreate() followed by a call to
cufftMakePlan*(). For multiple GPUs, the GPUs to use for execution are identified by a call
to cufftXtSetGPUs(); this call must occur after the call to cufftCreate() and before the
call to cufftMakePlan*().
Note that when cufftMakePlan*() is called for a single GPU, the work area is on that GPU. In
a multiple GPU plan, the returned work area has multiple entries, one value per GPU; that is,
workSize points to a size_t array with one entry per GPU. Also, strides and batches apply to
the entire plan across all GPUs associated with the plan.
Once a plan is locked by a call to cufftMakePlan*(), different descriptors may be specified in
calls to cufftXtExecDescriptor*() to execute the plan on different data sets, but the new
descriptors must use the same GPUs in the same order.
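For instance, two descriptors allocated from the same locked plan necessarily reside on the same GPUs in the same order, so each can be passed to cufftXtExecDescriptor*() in turn. A sketch, assuming devices 0 and 1 and omitting error checking:

```cpp
// Sketch: reusing one locked multi-GPU plan on two data sets via two descriptors.
#include <cufftXt.h>

int main() {
    cufftHandle plan;
    cufftCreate(&plan);
    int gpus[2] = {0, 1};                 // assumed device IDs
    cufftXtSetGPUs(plan, 2, gpus);

    size_t workSize[2];                   // one entry per GPU
    cufftMakePlan1d(plan, 1024, CUFFT_C2C, 1, workSize);  // plan is now locked

    // Two independent data sets, each with its own descriptor.
    cudaLibXtDesc *descA, *descB;
    cufftXtMalloc(plan, &descA, CUFFT_XT_FORMAT_INPLACE);
    cufftXtMalloc(plan, &descB, CUFFT_XT_FORMAT_INPLACE);
    // ... fill descA and descB with different data via cufftXtMemcpy() ...

    // Both descriptors came from this plan, so they use the same GPUs
    // in the same order, as the locked plan requires.
    cufftXtExecDescriptorC2C(plan, descA, descA, CUFFT_FORWARD);
    cufftXtExecDescriptorC2C(plan, descB, descB, CUFFT_FORWARD);

    cufftXtFree(descA);
    cufftXtFree(descB);
    cufftDestroy(plan);
    return 0;
}
```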
As in the single GPU case, cufftEstimate{1d,2d,3d,Many}() and
cufftGetSize{1d,2d,3d,Many}() give estimates of the work area sizes required for a
multiple GPU plan; in this case workSize points to a size_t array, one entry per GPU.
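A sketch of querying these per-GPU work sizes for a two-GPU plan, both before the plan is made (refined estimate) and after it is locked (actual workspace); device IDs 0 and 1 and a transform size of 4096 are assumed:

```cpp
// Sketch: per-GPU work area sizes for a two-GPU plan (assumed device IDs 0 and 1).
#include <cufftXt.h>
#include <cstdio>

int main() {
    cufftHandle plan;
    cufftCreate(&plan);
    int gpus[2] = {0, 1};
    cufftXtSetGPUs(plan, 2, gpus);

    // workSize arguments point to a size_t array, one entry per GPU.
    size_t refined[2], workSize[2], locked[2];

    // Refined estimate, available before the plan is made.
    cufftGetSize1d(plan, 4096, CUFFT_C2C, 1, refined);

    // Making the plan fills in the same per-GPU array shape.
    cufftMakePlan1d(plan, 4096, CUFFT_C2C, 1, workSize);

    // Check the workspace sizes of the locked plan.
    cufftGetSize(plan, locked);

    for (int i = 0; i < 2; ++i)
        printf("GPU %d: %zu bytes\n", gpus[i], locked[i]);

    cufftDestroy(plan);
    return 0;
}
```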