CUBLAS LIBRARY
DU-06702-001_v9.0 | September 2017
User Guide

www.nvidia.com
Chapter 1. INTRODUCTION
The cuBLAS library is an implementation of BLAS (Basic Linear Algebra Subprograms)
on top of the NVIDIA® CUDA™ runtime. It allows the user to access the computational
resources of NVIDIA Graphics Processing Units (GPUs).
Starting with CUDA 6.0, the cuBLAS library exposes two sets of APIs: the regular
cuBLAS API, which is simply called the cuBLAS API in this document, and the CUBLASXT
API.
To use the cuBLAS API, the application must allocate the required matrices and vectors
in the GPU memory space, fill them with data, call the sequence of desired cuBLAS
functions, and then copy the results from the GPU memory space back to the host.
The cuBLAS API also provides helper functions for writing data to and retrieving data
from the GPU.
To use the CUBLASXT API, the application must keep the data on the host, and the
library takes care of dispatching the operation to one or multiple GPUs present in
the system, depending on the user request.
1.1. Data layout
For maximum compatibility with existing Fortran environments, the cuBLAS library
uses column-major storage and 1-based indexing. Since C and C++ use row-major
storage, applications written in these languages cannot use the native array semantics
for two-dimensional arrays. Instead, macros or inline functions should be defined to
implement matrices on top of one-dimensional arrays. For Fortran code ported to C
in mechanical fashion, one may choose to retain 1-based indexing to avoid the need to
transform loops. In this case, the array index of a matrix element in row “i” and column
“j” can be computed via the following macro:
#define IDX2F(i,j,ld) ((((j)-1)*(ld))+((i)-1))
Here, ld refers to the leading dimension of the matrix, which in the case of column-major
storage is the number of rows of the allocated matrix (even if only a submatrix of it is
being used). For natively written C and C++ code, one would most likely choose 0-based
indexing, in which case the array index of a matrix element in row “i” and column “j”
can be computed via the following macro:
#define IDX2C(i,j,ld) (((j)*(ld))+(i))
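As a minimal sketch of how the two macros relate, the following plain C fragment stores a
matrix column by column in a one-dimensional array and reads it back with both indexing
styles. The helper names fill_colmajor, get_1based, and sum_submatrix are hypothetical and
are not part of the cuBLAS library; only the two macros come from the text above.

```c
/* Index macros as given in the text: column-major storage. */
#define IDX2F(i,j,ld) ((((j)-1)*(ld))+((i)-1))  /* 1-based (Fortran style) */
#define IDX2C(i,j,ld) (((j)*(ld))+(i))          /* 0-based (C style) */

/* Hypothetical helper: fill a rows x cols matrix stored column-major with
 * leading dimension ld (ld >= rows) so that the 0-based element (i, j)
 * holds the value 10*i + j. */
static inline void fill_colmajor(float *a, int rows, int cols, int ld)
{
    for (int j = 0; j < cols; ++j) {      /* columns outermost: each column */
        for (int i = 0; i < rows; ++i) {  /* is contiguous in memory        */
            a[IDX2C(i, j, ld)] = (float)(10 * i + j);
        }
    }
}

/* Hypothetical helper: read the element in row i, column j (both 1-based). */
static inline float get_1based(const float *a, int i, int j, int ld)
{
    return a[IDX2F(i, j, ld)];
}

/* Hypothetical helper: sum the top-left sub_rows x sub_cols block of a
 * larger allocation. The leading dimension is still the number of rows of
 * the allocated matrix, not of the submatrix. */
static inline float sum_submatrix(const float *a, int sub_rows, int sub_cols,
                                  int ld)
{
    float sum = 0.0f;
    for (int j = 0; j < sub_cols; ++j) {
        for (int i = 0; i < sub_rows; ++i) {
            sum += a[IDX2C(i, j, ld)];
        }
    }
    return sum;
}
```

For an allocation with 4 rows (ld = 4), IDX2F(3, 2, 4) and IDX2C(2, 1, 4) both evaluate to
offset 6, so the two styles address the same element. When working on a 2 x 2 submatrix of
that allocation, ld remains 4.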
1.2. New and Legacy cuBLAS API
Starting with version 4.0, the cuBLAS library provides a new, updated API in addition
to the existing legacy API. This section discusses why a new API is provided, the
advantages of using it, and how it differs from the existing legacy API.
The new cuBLAS library API can be used by including the header file “cublas_v2.h”. It
has the following features that the legacy cuBLAS API does not have:
‣ the handle to the cuBLAS library context is initialized using the function
cublasCreate() and is explicitly passed to every subsequent library function call.
This allows the user to have more control over the library setup when using multiple
host threads and multiple GPUs. This also allows the cuBLAS APIs to be reentrant.
‣ the scalars α and β can be passed by reference on the host or the device, instead of
only being allowed to be passed by value on the host. This change allows library
functions to execute asynchronously using streams even when α and β are generated
by a previous kernel.
‣ when a library routine returns a scalar result, it can be returned by reference on
the host or the device, instead of only being allowed to be returned by value
on the host. This change allows library routines to be called asynchronously when
the scalar result is generated and returned by reference on the device, resulting in
maximum parallelism.
‣ the error status cublasStatus_t is returned by all cuBLAS library function calls.
This change facilitates debugging and simplifies software development. Note that
cublasStatus was renamed cublasStatus_t to be more consistent with other
types in the cuBLAS library.
‣ the cublasAlloc() and cublasFree() functions have been deprecated.
This change removes these unnecessary wrappers around cudaMalloc() and
cudaFree(), respectively.
‣ the function cublasSetKernelStream() was renamed cublasSetStream() to be
more consistent with the other CUDA libraries.
The legacy cuBLAS API, explained in more detail in Appendix A, can be used by
including the header file “cublas.h”. Since the legacy API is identical to the previously
released cuBLAS library API, existing applications will work out of the box and
automatically use this legacy API without any source code changes. In general, new
applications should not use the legacy cuBLAS API, and existing applications
should convert to the new API if they require sophisticated and optimal stream
parallelism or if they call cuBLAS routines concurrently from multiple threads. For the
rest of the document, the new cuBLAS Library API will simply be referred to as the cuBLAS
Library API.
As mentioned earlier, the interfaces to the legacy and the cuBLAS library APIs are the
header files “cublas.h” and “cublas_v2.h”, respectively. In addition, applications using
the cuBLAS library need to link against the DSO cublas.so (Linux), the DLL cublas.dll
(Windows), or the dynamic library cublas.dylib (Mac OS X). Note: the same dynamic
library implements both the new and legacy cuBLAS APIs.
1.3. Example code
For sample code references please see the two examples below. They show an
application written in C using the cuBLAS library API with two indexing styles