CUDA cuBLAS Library: A Guide to the BLAS Interface for GPU-Accelerated Computing

Using the cuBLAS API

www.nvidia.com

cuBLAS Library DU-06702-001_v9.2|14

2.3. CUDA Datatypes Reference

This chapter describes types that are shared by multiple CUDA libraries and defined in the header file library_types.h.

2.3.1. cudaDataType_t

The cudaDataType_t type is an enumeration that specifies the data precision. It is used when the data reference does not carry the type itself (e.g., void *). For example, it is used in the routine cublasSgemmEx.

Value        Meaning
CUDA_R_16F   real number, 16-bit floating-point
CUDA_C_16F   complex number, real and imaginary parts are 16-bit floating-point
CUDA_R_32F   real number, 32-bit floating-point
CUDA_C_32F   complex number, real and imaginary parts are 32-bit floating-point
CUDA_R_64F   real number, 64-bit floating-point
CUDA_C_64F   complex number, real and imaginary parts are 64-bit floating-point
CUDA_R_8I    real number, 8-bit signed integer
CUDA_C_8I    complex number, real and imaginary parts are 8-bit signed integers
CUDA_R_8U    real number, 8-bit unsigned integer
CUDA_C_8U    complex number, real and imaginary parts are 8-bit unsigned integers
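
The role of such a type tag can be illustrated with a small host-only sketch in C. It is not part of the cuBLAS API: the enum demo_data_type and the functions demo_element_size and demo_byte_offset are hypothetical stand-ins, defined locally so that the example compiles without library_types.h. The sketch assumes that a complex element stores a real part and an imaginary part of the listed width, so it occupies twice the bytes of the corresponding real element.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for cudaDataType_t. The names mirror the table
   above, but the enum is local and its values are not the library's. */
typedef enum {
    DEMO_R_16F, DEMO_C_16F,
    DEMO_R_32F, DEMO_C_32F,
    DEMO_R_64F, DEMO_C_64F,
    DEMO_R_8I,  DEMO_C_8I,
    DEMO_R_8U,  DEMO_C_8U
} demo_data_type;

/* Size in bytes of one element of the given type.
   Returns 0 for an unknown tag. */
size_t demo_element_size(demo_data_type type)
{
    switch (type) {
    case DEMO_R_16F: return 2;
    case DEMO_C_16F: return 4;
    case DEMO_R_32F: return 4;
    case DEMO_C_32F: return 8;
    case DEMO_R_64F: return 8;
    case DEMO_C_64F: return 16;
    case DEMO_R_8I:  return 1;
    case DEMO_C_8I:  return 2;
    case DEMO_R_8U:  return 1;
    case DEMO_C_8U:  return 2;
    }
    return 0;
}

/* Byte offset of element `index` in an untyped buffer. The tag supplies
   the element size that a void * alone cannot provide. */
size_t demo_byte_offset(demo_data_type type, size_t index)
{
    size_t size = demo_element_size(type);
    assert(size != 0);
    return index * size;
}
```

A routine that receives only a void * can use a tag like this to decide how to step through the buffer, which is the situation the enumeration is meant for.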

2.3.2. libraryPropertyType_t

The libraryPropertyType_t type is used as a parameter to specify which property is requested when calling the routine cublasGetProperty.

Value           Meaning
MAJOR_VERSION   enumerant to query the major version
MINOR_VERSION   enumerant to query the minor version
PATCH_LEVEL     number to identify the patch level

2.4. cuBLAS Helper Function Reference

2.4.1. cublasCreate()

cublasStatus_t
cublasCreate(cublasHandle_t *handle)

This function initializes the cuBLAS library and creates a handle to an opaque structure holding the cuBLAS library context. It allocates hardware resources on the host and device and must be called before any other cuBLAS library call. The cuBLAS library context is tied to the current CUDA device. To use the library on multiple devices, one cuBLAS handle must be created for each device. Furthermore, for a given device, multiple cuBLAS handles with different configurations can be created. Because cublasCreate allocates internal resources, and releasing those resources with cublasDestroy implicitly calls cudaDeviceSynchronize, it is recommended to minimize the number of cublasCreate/cublasDestroy occurrences. For multi-threaded applications that use the same device from different threads, the recommended programming model is to create one cuBLAS handle per thread and use that handle for the entire life of the thread.

Return Value                    Meaning
CUBLAS_STATUS_SUCCESS           the initialization succeeded
CUBLAS_STATUS_NOT_INITIALIZED   the CUDA™ Runtime initialization failed
CUBLAS_STATUS_ALLOC_FAILED      the resources could not be allocated

2.4.2. cublasDestroy()

cublasStatus_t
cublasDestroy(cublasHandle_t handle)

This function releases the hardware resources used by the cuBLAS library. It is usually the last call made to the cuBLAS library with a particular handle. Because cublasCreate allocates internal resources, and releasing those resources with cublasDestroy implicitly calls cudaDeviceSynchronize, it is recommended to minimize the number of cublasCreate/cublasDestroy occurrences.

Return Value                    Meaning
CUBLAS_STATUS_SUCCESS           the shutdown succeeded
CUBLAS_STATUS_NOT_INITIALIZED   the library was not initialized

2.4.3. cublasGetVersion()

cublasStatus_t
cublasGetVersion(cublasHandle_t handle, int *version)

This function returns the version number of the cuBLAS library.

Return Value                    Meaning
CUBLAS_STATUS_SUCCESS           the operation completed successfully
CUBLAS_STATUS_NOT_INITIALIZED   the library was not initialized

2.4.4. cublasGetProperty()

cublasStatus_t
cublasGetProperty(libraryPropertyType type, int *value)

2.4.8. cublasSetPointerMode()

cublasStatus_t
cublasSetPointerMode(cublasHandle_t handle, cublasPointerMode_t mode)

This function sets the pointer mode used by the cuBLAS library. By default, the values are passed by reference on the host. See the section on the cublasPointerMode_t type for more details.

Return Value                    Meaning
CUBLAS_STATUS_SUCCESS           the pointer mode was set successfully
CUBLAS_STATUS_NOT_INITIALIZED   the library was not initialized

2.4.9. cublasSetVector()

cublasStatus_t
cublasSetVector(int n, int elemSize,
                const void *x, int incx, void *y, int incy)

This function copies n elements from a vector x in host memory space to a vector y in GPU memory space. Elements in both vectors are assumed to have a size of elemSize bytes. The storage spacing between consecutive elements is given by incx for the source vector x and by incy for the destination vector y.

In general, y points to an object, or part of an object, that was allocated in GPU memory space, for example via cudaMalloc() (or via cublasAlloc() in the legacy API). Since column-major format for two-dimensional matrices is assumed, if a vector is part of a matrix, a vector increment equal to 1 accesses a (partial) column of that matrix. Similarly, an increment equal to the leading dimension of the matrix accesses a (partial) row of that matrix.

Return Value                    Meaning
CUBLAS_STATUS_SUCCESS           the operation completed successfully
CUBLAS_STATUS_NOT_INITIALIZED   the library was not initialized
CUBLAS_STATUS_INVALID_VALUE     the parameters incx, incy, elemSize<=0
CUBLAS_STATUS_MAPPING_ERROR     there was an error accessing GPU memory
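
The addressing pattern described for cublasSetVector can be modelled on the host alone. The sketch below is not the library's implementation and does not touch GPU memory; demo_strided_copy is a hypothetical helper that only mirrors the roles of n, elemSize, incx and incy, and its return codes (0 and -1) are an assumption chosen for the example rather than cublasStatus_t values.

```c
#include <assert.h>
#include <string.h>

/* Host-only model of a strided element copy: element i is read from
   x + i*incx*elem_size and written to y + i*incy*elem_size.
   Returns 0 on success, -1 if incx, incy or elem_size is not positive. */
int demo_strided_copy(int n, int elem_size,
                      const void *x, int incx, void *y, int incy)
{
    if (incx <= 0 || incy <= 0 || elem_size <= 0)
        return -1;
    assert(x != NULL && y != NULL);

    const unsigned char *src = x;
    unsigned char *dst = y;
    for (int i = 0; i < n; ++i) {
        memcpy(dst + (size_t)i * incy * elem_size,
               src + (size_t)i * incx * elem_size,
               (size_t)elem_size);
    }
    return 0;
}
```

For instance, copying 3 floats from a 6-element source with incx = 2 and incy = 1 picks every second source element and packs the results contiguously in the destination.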

2.4.10. cublasGetVector()

cublasStatus_t
cublasGetVector(int n, int elemSize,
                const void *x, int incx, void *y, int incy)

This function copies n elements from a vector x in GPU memory space to a vector y in host memory space. Elements in both vectors are assumed to have a size of elemSize bytes. The storage spacing between consecutive elements is given by incx for the source vector x and by incy for the destination vector y.

In general, x points to an object, or part of an object, that was allocated in GPU memory space, for example via cudaMalloc() (or via cublasAlloc() in the legacy API). Since column-major format for two-dimensional matrices is assumed, if a vector is part of a matrix, a vector increment equal to 1 accesses a (partial) column of that matrix. Similarly, an increment equal to the leading dimension of the matrix accesses a (partial) row of that matrix.

Return Value                    Meaning
CUBLAS_STATUS_SUCCESS           the operation completed successfully
CUBLAS_STATUS_NOT_INITIALIZED   the library was not initialized
CUBLAS_STATUS_INVALID_VALUE     the parameters incx, incy, elemSize<=0
CUBLAS_STATUS_MAPPING_ERROR     there was an error accessing GPU memory
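
The relationship between increments and column-major storage can be checked with plain index arithmetic. The following host-only C sketch uses two hypothetical helpers, demo_colmajor_offset and demo_gather, that are not part of cuBLAS. It assumes a matrix stored column by column with leading dimension ld, so element (row, col) sits at offset col*ld + row.

```c
#include <assert.h>
#include <stddef.h>

/* Offset of element (row, col) in a column-major matrix whose leading
   dimension ld is the number of rows of the allocated matrix. */
size_t demo_colmajor_offset(int row, int col, int ld)
{
    assert(row >= 0 && col >= 0 && ld > row);
    return (size_t)col * (size_t)ld + (size_t)row;
}

/* Gather n elements of a, starting at offset `start` and stepping by
   `inc` elements, into the contiguous array out. */
void demo_gather(const float *a, size_t start, int inc, int n, float *out)
{
    for (int i = 0; i < n; ++i)
        out[i] = a[start + (size_t)i * (size_t)inc];
}
```

With a 3 x 2 matrix stored as {1, 2, 3, 4, 5, 6} and ld = 3, starting at element (0, 1) with an increment of 1 yields the column 4, 5, 6, while starting at element (1, 0) with an increment of ld yields the row 2, 5.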

2.4.11. cublasSetMatrix()

cublasStatus_t
cublasSetMatrix(int rows, int cols, int elemSize,
                const void *A, int lda, void *B, int ldb)

This function copies a tile of rows x cols elements from a matrix A in host memory space to a matrix B in GPU memory space. It is assumed that each element requires storage of elemSize bytes and that both matrices are stored in column-major format, with the leading dimensions of the source matrix A and the destination matrix B given in lda and ldb, respectively. The leading dimension indicates the number of rows of the allocated matrix, even if only a submatrix of it is being used. In general, B is a device pointer that points to an object, or part of an object, that was allocated in GPU memory space, for example via cudaMalloc() (or via cublasAlloc() in the legacy API).

Return Value                    Meaning
CUBLAS_STATUS_SUCCESS           the operation completed successfully
CUBLAS_STATUS_NOT_INITIALIZED   the library was not initialized
CUBLAS_STATUS_INVALID_VALUE     the parameters rows, cols<0 or elemSize, lda, ldb<=0
CUBLAS_STATUS_MAPPING_ERROR     there was an error accessing GPU memory
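
How lda and ldb select a tile inside larger allocations can be sketched on the host. The function demo_copy_tile below is a hypothetical model, not the library's implementation: it copies between two host buffers only, and its return codes (0 and -1) are an assumption made for the example. Because storage is column-major, each of the cols columns of the tile is a contiguous run of rows elements, and consecutive columns start lda (source) or ldb (destination) elements apart.

```c
#include <assert.h>
#include <string.h>

/* Host-only model of a column-major tile copy. Column j of the tile is
   read from a + j*lda*elem_size and written to b + j*ldb*elem_size.
   Returns 0 on success, -1 if rows or cols is negative or if elem_size,
   lda or ldb is not positive. */
int demo_copy_tile(int rows, int cols, int elem_size,
                   const void *a, int lda, void *b, int ldb)
{
    if (rows < 0 || cols < 0 || elem_size <= 0 || lda <= 0 || ldb <= 0)
        return -1;
    assert(a != NULL && b != NULL);

    const unsigned char *src = a;
    unsigned char *dst = b;
    size_t col_bytes = (size_t)rows * (size_t)elem_size;
    for (int j = 0; j < cols; ++j) {
        memcpy(dst + (size_t)j * ldb * elem_size,
               src + (size_t)j * lda * elem_size,
               col_bytes);
    }
    return 0;
}
```

To copy a 2 x 2 tile whose top-left corner is element (1, 1) of a 4 x 3 source, the caller passes a pointer to that element together with lda = 4. The destination can be a tightly packed 2 x 2 matrix with ldb = 2.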

2.4.12. cublasGetMatrix()

cublasStatus_t
cublasGetMatrix(int rows, int cols, int elemSize,
                const void *A, int lda, void *B, int ldb)

This function copies a tile of rows x cols elements from a matrix A in GPU memory space to a matrix B in host memory space. It is assumed that each element requires storage of elemSize bytes and that both matrices are stored in column-major format, with the leading dimensions of the source matrix A and the destination matrix B given in lda and ldb, respectively. The leading dimension indicates the number of rows of the allocated matrix, even if only a submatrix of it is being used. In general, A is a device pointer that points to an object, or part of an object, that was allocated in GPU memory space, for example via cudaMalloc() (or via cublasAlloc() in the legacy API).
