Optimizing the Kyber Algorithm on FPGAs: An Efficient Design of a Processor for Vectors of Polynomials


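This paper discusses the design and optimization of an efficient Kyber processor on an FPGA (Field-Programmable Gate Array) platform. Kyber is a public-key encryption algorithm widely regarded as a candidate for post-quantum cryptographic standardization, and it suits scenarios with high security requirements and large data volumes. The authors first propose an optimization strategy targeted at the characteristics of Kyber. The strategy focuses on merging polynomial operations in order to reduce hardware resource consumption and increase computation speed. Using the parallelism and customizability of FPGAs, they design a processor dedicated to vectors of polynomials. Compared with textbook implementations, the processor cuts clock cycles by 29.4% for Kyber512 and by 33.3% for Kyber1024. This shows that an optimized FPGA design can noticeably improve the real-time performance of Kyber, which matters for large-scale data encryption and secure communication systems.

The advantage of FPGAs lies in their flexibility and programmability, which let researchers tailor the hardware structure to the algorithm and to the application. Hardware acceleration also avoids performance bottlenecks that software may hit, which is especially valuable in resource-constrained environments such as embedded and IoT devices. Building such a processor is not easy, however. It involves several technical challenges, such as high-performance logic routing, memory management, and the design of the interface to software. The paper may also discuss how to optimize the allocation of logic resources, how to handle high-degree polynomial operations, and how to maximize performance within a limited area and power budget.

In summary, this research paper offers new ideas and methods for an efficient implementation of Kyber on FPGAs, and it has practical value for hardware acceleration in cryptography and for post-quantum security applications. Future information security systems may benefit from faster and more energy-efficient encryption as a result.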

Towards Efficient Kyber on FPGAs: A Processor for Vector of Polynomials

Zhaohui Chen∗
School of Computer Science and Technology, University of Chinese Academy of Sciences
Beijing, China 100049
chenzhaohui17@mails.ucas.ac.cn

Yuan Ma†
State Key Laboratory of Information Security, Institute of Information Engineering, CAS
Beijing, China 100095
mayuan@iie.ac.cn

Tianyu Chen
State Key Laboratory of Information Security, Institute of Information Engineering, CAS
Beijing, China 100095
chentianyu@iie.ac.cn

Jingqiang Lin
State Key Laboratory of Information Security, Institute of Information Engineering, CAS
Beijing, China 100095
linjingqiang@iie.ac.cn

Jiwu Jing
School of Computer Science and Technology, University of Chinese Academy of Sciences
Beijing, China 100049
jwjing@ucas.ac.cn

Abstract: Kyber is a promising candidate in the post-quantum cryptography standardization process. In this paper, we propose a targeted optimization strategy and implement a processor for Kyber on FPGAs. By merging operations, we cut 29.4% of the clock cycles for Kyber512 and 33.3% for Kyber1024 compared with textbook implementations. We use the Gentleman-Sande (GS) butterfly to optimize the Number-Theoretic Transform (NTT) implementation. The memory-access bottleneck is removed with a dual-column sequential scheme. We further propose a pipelined architecture for better performance. These optimizations let the processor achieve 31684 NTT operations per second using only 477 LUTs, 237 FFs and 1 DSP. Our strategy is at least 3x more efficient than the state-of-the-art NTT module with a similar security level.
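To make the GS butterfly named in the abstract concrete, here is a minimal software sketch in Python. It is only a model of the arithmetic, not the paper's hardware datapath. The helper names (`find_root`, `gs_ntt`, `reorder`, `forward`, `inverse`) are hypothetical, and reordering the output by bit-reversed indices is a generic textbook step assumed here for illustration. The modulus 7681 and length 256 are the Kyber parameters the paper works with.

```python
Q, N = 7681, 256  # modulus and transform length used for Kyber in the paper

def find_root(order, q=Q):
    """Return an element of exact order `order` (a power of two dividing q-1) mod q."""
    for g in range(2, q):
        w = pow(g, (q - 1) // order, q)
        if pow(w, order // 2, q) == q - 1:  # w^(order/2) = -1, so the order is exact
            return w
    raise ValueError("no root of that order")

def bit_reverse(i, bits):
    """Reverse the lowest `bits` bits of i."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r

def gs_ntt(a, omega, q=Q):
    """Iterative GS (decimation-in-frequency) NTT on a copy of `a`.
    Input is in natural order, output is in bit-reversed order."""
    a = list(a)
    n = len(a)
    half, w_step = n // 2, omega
    while half >= 1:
        for start in range(0, n, 2 * half):
            w = 1
            for j in range(start, start + half):
                u, v = a[j], a[j + half]
                a[j] = (u + v) % q             # GS butterfly: plain sum
                a[j + half] = (u - v) * w % q  # difference, then twiddle multiply
                w = w * w_step % q
        half, w_step = half // 2, w_step * w_step % q
    return a

def reorder(a):
    """Read the array with bit-reversed addresses to restore natural order."""
    bits = len(a).bit_length() - 1
    return [a[bit_reverse(k, bits)] for k in range(len(a))]

def forward(a, q=Q):
    """Forward NTT built from GS butterflies."""
    return reorder(gs_ntt(a, find_root(len(a), q), q))

def inverse(a, q=Q):
    """Inverse NTT built from the same GS butterflies, using omega^-1 and 1/n."""
    n = len(a)
    omega_inv = pow(find_root(n, q), q - 2, q)
    n_inv = pow(n, q - 2, q)
    return [x * n_inv % q for x in reorder(gs_ntt(a, omega_inv, q))]
```

Calling `inverse(forward(a))` on any list of 256 residues modulo 7681 returns the original list. The same butterfly routine serves both directions, and only the root of unity and the final scaling differ.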

I. INTRODUCTION

Public-key cryptography based on large-integer factoring and the discrete logarithm problem is widely used in digital signatures, electronic authentication, TLS/SSL key exchange, etc. Quantum computers would completely break these cryptosystems with Shor's algorithm [1]. To find appropriate substitutes, the National Institute of Standards and Technology (NIST) called for post-quantum public-key encryption, key encapsulation mechanism and digital signature schemes in 2017. Interest in lattice-based cryptography has increased due to its quantum-resistant properties and its potential for high-speed implementation with relatively small key and ciphertext sizes [2], [3].

∗ Also with State Key Laboratory of Information Security, Institute of Information Engineering, CAS, Beijing, China 100095.

† This author is the corresponding author.

Regev [4], [5] introduced the Learning With Errors (LWE) problem, supported by a theoretical proof of security. However, its large parameter matrix A limits its efficiency. Lyubashevsky et al. [6] proposed Ring-Learning With Errors (Ring-LWE) over the polynomial ring $\mathbb{Z}_q[X]/(X^n+1)$ to avoid the large parameter. Although Ring-LWE is more practical than standard LWE, its algebraic structure might enable threatening attacks [7]. The Module-Learning With Errors (Module-LWE) hardness assumption proposed in [8] provides a trade-off between security and efficiency with a scalable vector of polynomials (polyvec) structure. As a competitive instance, the Kyber [9] algorithm fixes the polynomial ring as $\mathbb{Z}_{7681}[X]/(X^{256}+1)$. Thus the most computationally intensive operation, multiplication over a k-dimensional polyvec, can be optimized with the NTT algorithm, which reduces the computational complexity from $O(n^2)$ to $O(n \log n)$ [10], [11].
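The complexity claim can be checked with a small Python sketch that multiplies polynomials in $\mathbb{Z}_{7681}[X]/(X^{256}+1)$ in two ways: a schoolbook double loop with $n^2$ coefficient products, and an NTT-based route with $O(n \log n)$ work. This is an illustrative software model under stated assumptions, not the processor described in the paper. All function names are hypothetical, and the recursive transform is chosen for readability. Accumulating the polyvec products in the NTT domain before a single inverse transform is a standard technique and is not claimed to be the paper's merging strategy.

```python
Q, N = 7681, 256  # ring Z_7681[X]/(X^256 + 1) from the text

def find_psi(n, q=Q):
    """Return a primitive 2n-th root of unity mod q (n a power of two)."""
    for g in range(2, q):
        psi = pow(g, (q - 1) // (2 * n), q)
        if pow(psi, n, q) == q - 1:  # psi^n = -1, so the order is exactly 2n
            return psi
    raise ValueError("no primitive 2n-th root of unity")

def schoolbook_mul(a, b, q=Q):
    """O(n^2) multiplication modulo X^n + 1."""
    n = len(a)
    c = [0] * n
    for i in range(n):
        for j in range(n):
            k = i + j
            if k < n:
                c[k] = (c[k] + a[i] * b[j]) % q
            else:  # X^n = -1, so wrapped terms change sign
                c[k - n] = (c[k - n] - a[i] * b[j]) % q
    return c

def ntt(a, omega, q=Q):
    """Recursive radix-2 transform: A[k] = sum_i a[i] * omega^(i*k)."""
    n = len(a)
    if n == 1:
        return list(a)
    even = ntt(a[0::2], omega * omega % q, q)
    odd = ntt(a[1::2], omega * omega % q, q)
    out, w = [0] * n, 1
    for k in range(n // 2):
        t = w * odd[k] % q
        out[k] = (even[k] + t) % q
        out[k + n // 2] = (even[k] - t) % q
        w = w * omega % q
    return out

def to_ntt(a, psi, q=Q):
    """Scale by psi^i, then transform: evaluates a at the odd powers of psi."""
    scaled = [x * pow(psi, i, q) % q for i, x in enumerate(a)]
    return ntt(scaled, psi * psi % q, q)

def from_ntt(fa, psi, q=Q):
    """Inverse of to_ntt: transform with omega^-1, then scale by psi^-i / n."""
    omega_inv = pow(psi * psi % q, q - 2, q)
    psi_inv, n_inv = pow(psi, q - 2, q), pow(len(fa), q - 2, q)
    c = ntt(fa, omega_inv, q)
    return [x * n_inv * pow(psi_inv, i, q) % q for i, x in enumerate(c)]

def ntt_mul(a, b, q=Q):
    """O(n log n) multiplication modulo X^n + 1."""
    psi = find_psi(len(a), q)
    fc = [x * y % q for x, y in zip(to_ntt(a, psi, q), to_ntt(b, psi, q))]
    return from_ntt(fc, psi, q)

def polyvec_dot(u, v, q=Q):
    """Inner product of two polyvecs, accumulated in the NTT domain."""
    psi = find_psi(len(u[0]), q)
    acc = [0] * len(u[0])
    for a, b in zip(u, v):
        fa, fb = to_ntt(a, psi, q), to_ntt(b, psi, q)
        acc = [(s + x * y) % q for s, x, y in zip(acc, fa, fb)]
    return from_ntt(acc, psi, q)
```

For a polyvec of dimension k, `polyvec_dot` performs 2k forward transforms and one inverse transform, whereas calling `ntt_mul` k times would perform k inverse transforms.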

There has been increased interest in implementing Ring-LWE on FPGAs due to its potential for high-performance and compact application scenarios [12]–[17]. A relatively new Ring-LWE-based scheme, NewHope [18], together with its variant NewHope-Simple [19], was implemented on Artix-7 FPGAs with targeted optimization in [16], [17]. However, we have not found any detailed optimization for Module-LWE-based schemes on FPGAs so far. Among existing works, high-efficiency implementations like [14] instantiate several processing elements in parallel, so more arithmetic and memory instances are required. On the other hand, the NTT algorithm and memory access take many clock cycles in compact processors like [12], [13]. Thus, implementing a processor that is efficient in both time and area remains difficult.

In this paper, we design and implement an FPGA-based processor for operations over polyvec with a good trade-off between area and performance. The efficiency of lattice-based schemes is significantly improved compared with previous hardware implementations. Our contributions are as follows:

1) We optimize the NTT algorithm with the GS butterfly. The GS butterfly is used in both the forward and the inverse NTT in order to utilize the internal DSP adders. The optimization removes a total of 29.4% of the clock cycles for Kyber512 and 33.3% for Kyber1024 compared with textbook implementations.

2) We develop dual-column sequential storage and bit-reversed address accessing. These techniques keep the datapath free of bubbles and avoid redundant latency

978-1-7281-4123-7/20/$31.00 ©2020 IEEE

