CUDA-based AES Parallelization with Fine-Tuned
GPU Memory Utilization
Chonglei Mei, Hai Jiang, Jeff Jenness
Department of Computer Science
Arkansas State University
{chonglei.mei, hjiang, jeffj}@cs.astate.edu
Abstract—Current Graphics Processing Units (GPUs) present great potential for
speeding up computationally intensive data-parallel applications over
traditional parallelization approaches, since GPUs contain far more hardware
threads than the computational cores available to CPU threads.
NVIDIA has developed a general-purpose GPU programming platform,
CUDA, which allows programmers to utilize the GPU through the C
programming language and to parallelize applications in a manner similar
to the traditional multithreading approach. However, not all
applications are suitable for this new platform. Only computationally
intensive applications without strong data dependencies are
good candidates. Although the Advanced Encryption Standard (AES)
does not belong to this group, owing to the light workload of
its efficient implementation, this paper proposes an approach
that arranges data properly across the different GPU memory spaces,
overcomes the extra communication delay, and still turns the
GPU into an effective accelerator. Experimental results
demonstrate its effectiveness through performance gains and show
that GPUs can be used to accelerate a broader range of applications.
Keywords: GPU, CUDA, AES, parallelization, speeding up
I. INTRODUCTION
As computers and networks have reached every corner
of human life, security and privacy have become major concerns.
Traditional data encryption/decryption is a computationally
intensive task. Since it consumes substantial resources on the
computing and communication endpoints, both computation and
communication activities can be slowed down. Therefore, faster and
more secure cryptographic algorithms are in demand. The Advanced
Encryption Standard (AES) is one such widely used symmetric-key
cryptographic algorithm. It has dramatically reduced the computational
operations required for data encryption and decryption, while its
implementation structure exhibits a high degree of data parallelism
that invites further performance improvement. Since security is, in
principle, an extra burden on an application, the execution time of
AES should be minimized; an accelerator is therefore expected to
exploit its rich parallelism.
At the same time, the Graphics Processing Unit (GPU) is
becoming increasingly helpful for data-parallel applications.
Over the last few years, the processing power of these commodity
chips has grown at a rate that exceeds both Moore's predictions
and recent advances in CPU performance [1]. The reason is that the
GPU architecture is composed of a large number of simple processing
units, whereas standard Intel and AMD chips attempt to follow
Moore's law by increasing the number of cores available
on a single die rather than raising the core clock speed.
Figure 1. AES Encryption
NVIDIA® CUDA™ (Compute Unified Device Architec-
ture) is a general-purpose parallel computing architecture that
leverages the parallel compute engine in NVIDIA graphics
processing units (GPUs) to solve many complex computational
problems in a fraction of the time required on a CPU [2].
It includes the CUDA Instruction Set Architecture (ISA)
and the parallel compute engine inside the GPU. With CUDA,
programmers can today use C, one of the most widely used
system programming languages, to extract high performance
from the GPU.
CUDA was released specifically for general-purpose compu-
tation on GPUs. However, not all applications can achieve
speedups from the GPU hardware. Application data has to
978-1-4244-6534-7/10/$26.00 ©2010 IEEE