CUDA-based AES Parallelization with Fine-Tuned
GPU Memory Utilization
Chonglei Mei, Hai Jiang, Jeff Jenness
Department of Computer Science
Arkansas State University
{chonglei.mei, hjiang, jeffj}@cs.astate.edu
Abstract—Current Graphics Processing Units (GPUs) present great potential for
speeding up computationally intensive data-parallel applications over
traditional parallelization approaches, since GPUs contain far more hardware
threads than the computational cores available to CPU threads.
NVIDIA has developed a general-purpose GPU programming platform,
CUDA, which allows programmers to utilize the GPU through the C
programming language and to parallelize applications in a manner similar
to the traditional multithreading approach. However, not all
applications are suitable for this new platform. Only computationally
intensive applications without strong data dependencies are
good candidates. Although the Advanced Encryption Standard (AES)
does not belong to this group, owing to the light workload of
its efficient implementation, this paper proposes an approach
that arranges data properly across the different GPU memory spaces,
overcomes the extra communication delay, and still turns the
GPU into an effective accelerator. Experimental results
demonstrate its effectiveness through performance gains and show
that GPUs can be used to accelerate a broader range of applications.
Keywords: GPU, CUDA, AES, parallelization, speeding up
I. INTRODUCTION
As computers and networks have reached every corner
of human life, security and privacy have become major concerns.
Traditional data encryption/decryption is a computationally
intensive task. Since it consumes substantial resources on the
computing and communication endpoints, both computation and
communication activities can be slowed down. Therefore, faster and
more secure cryptographic algorithms are in demand. The Advanced
Encryption Standard (AES) is one such widely used symmetric-key
cryptographic algorithm. It has dramatically reduced the computational
operations required for data encryption and decryption, while its
implementation structure exhibits a high degree of data parallelism
that invites further performance improvement. Since security is, in
principle, an extra burden on an application, the execution time of
AES should be minimized; an accelerator is therefore expected to
exploit its rich parallelism.
At the same time, the Graphics Processing Unit (GPU) is
becoming increasingly helpful for data-parallel applications.
Over the last few years, the processing power of these commodity
chips has grown at a rate that exceeds both Moore's predictions
and recent advances in CPU performance [1]. The reason is that the
GPU architecture is composed of a large number of simple processing
units, whereas standard Intel and AMD chips attempt to follow
Moore's law by increasing the number of cores available
on a single die rather than raising the core clock speed.
Figure 1. AES Encryption
NVIDIA® CUDA™ (Compute Unified Device Architec-
ture) is a general-purpose parallel computing architecture that
leverages the parallel compute engine in NVIDIA graphics
processing units (GPUs) to solve many complex computational
problems in a fraction of the time required on a CPU [2].
It includes the CUDA Instruction Set Architecture (ISA)
and the parallel compute engine inside the GPU. With CUDA,
programmers can today use C, one of the most widely used
system programming languages, to extract high performance
from the GPU.
CUDA was released specifically for general-purpose compu-
tation on GPUs. However, not all applications can achieve
speedups from the GPU hardware. Application data has to
978-1-4244-6534-7/10/$26.00 ©2010 IEEE