Optimizing Polynomial Multiplication: A New Approach for Lattice-Based Cryptography

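"This research paper explores efficient polynomial multiplication techniques for lattice-based cryptography, in particular for the Ring-LWE (ring learning with errors) problem. The authors propose optimizations of the number theoretic transform (NTT) for building an efficient polynomial multiplier, with the goal of improving the performance of Ring-LWE based cryptosystems. The optimizations include an improved bit-reverse operation for the NTT and inverse NTT, a reduction in the number of clock cycles, and a reduction in the number of constant factors to save ROM storage. They also propose a new memory access scheme that maximizes utilization of the butterfly operator. With these techniques, high-speed polynomial multiplication is achieved on a Spartan-6 FPGA."

In lattice-based cryptography, Ring-LWE is a core problem that provides the theoretical foundation for secure public-key encryption, authentication, and key exchange. However, the computationally intensive operations of Ring-LWE schemes, above all polynomial multiplication over rings, are the key bottleneck for efficiency. The optimizations proposed in the paper target this bottleneck.

First, the optimized NTT and inverse NTT are essential for reducing computation time. The NTT is a variant of the fast Fourier transform and is commonly used in polynomial multiplication to lower the time complexity. By improving the bit-reverse operation, the NTT and inverse NTT can be carried out more efficiently, which lowers the required clock cycles from (8n + 1.5n lg n) to (2n + 1.5n lg n).

Second, the paper optimizes the constant factors. In a hardware implementation the constant factors are stored in read-only memory (ROM), so fewer constants mean a smaller storage requirement. The authors reduce the number of constant factors from 4n to 2.5n, which cuts ROM usage by about 37.5%. This matters especially on resource-constrained devices such as FPGAs.

Finally, to improve performance further, the paper proposes a new memory access scheme. It concerns how to schedule memory operations so that the butterfly operator, the basic building block of the fast Fourier transform, is utilized as fully as possible. Such an optimization may involve parallelization, data prefetching, or careful management of the memory hierarchy to reduce access latency and raise computational throughput.

The techniques were validated on a Spartan-6 FPGA: for rings of dimension 256/512, the system performs 57304/26913 polynomial multiplications per second. By optimizing the NTT, reducing the number of constant factors, and improving the memory access scheme, the paper provides a more efficient and more resource-friendly polynomial multiplier for Ring-LWE based cryptosystems, and its results are a useful reference for the design and implementation of future lattice-based cryptosystems.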

Towards Efficient Polynomial Multiplication for Lattice-Based Cryptography

Chaohui Du∗†, Guoqiang Bai∗‡

∗Tsinghua National Laboratory for Information Science and Technology

†Department of Computer Science and Technology, Tsinghua University, Beijing, China

‡Institute of Microelectronics, Tsinghua University, Beijing, China

dch11@mails.tsinghua.edu.cn, baigq@mail.tsinghua.edu.cn

Abstract—Ring learning with errors (Ring-LWE) is the basis of various lattice-based cryptosystems. The most critical and computationally intensive operation of Ring-LWE based cryptosystems is polynomial multiplication over rings. In this paper, we introduce several optimization techniques to build an efficient polynomial multiplier with the number theoretic transform (NTT). We propose a technique to optimize the bit-reverse operation of NTT and inverse-NTT. With additional optimizations, our polynomial multiplier reduces the required clock cycles from (8n + 1.5n lg n) to (2n + 1.5n lg n). By exploiting the relationship of the constant factors, our polynomial multiplier is able to reduce the number of constant factors from 4n to 2.5n, which saves about 37.5% ROM storage. In addition, we propose a novel memory access scheme to achieve maximum utilization of the butterfly operator. With these techniques, our polynomial multiplier is capable of performing 57304/26913 polynomial multiplications per second for dimension 256/512 on a Spartan-6 FPGA.
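To make the cycle and ROM figures in the abstract concrete, the following minimal sketch evaluates the two cycle-count expressions and the constant-factor saving for the two dimensions mentioned. The function names are hypothetical and chosen only for this illustration; the formulas themselves are the ones quoted above.

```python
def multiplier_cycles(n, linear_coeff):
    """Evaluate linear_coeff*n + 1.5*n*lg(n) for a power-of-two dimension n."""
    lg_n = n.bit_length() - 1          # exact base-2 logarithm for powers of two
    return linear_coeff * n + (3 * n * lg_n) // 2


def rom_saving_fraction(n):
    """Fraction of constant factors saved when going from 4n to 2.5n entries."""
    return 1 - (2.5 * n) / (4 * n)


if __name__ == "__main__":
    for n in (256, 512):
        before = multiplier_cycles(n, 8)   # 8n + 1.5n lg n
        after = multiplier_cycles(n, 2)    # 2n + 1.5n lg n
        print(f"n={n}: {before} -> {after} cycles, "
              f"ROM saving {rom_saving_fraction(n):.1%}")
```

For n = 256 the expressions evaluate to 5120 and 3584 cycles, and for n = 512 to 11008 and 7936 cycles, so the linear term accounts for roughly 30% and 28% of the original count respectively.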

Keywords—lattice-based cryptography; learning with errors; Ring-LWE; polynomial multiplication; hardware architecture

I. INTRODUCTION

Many widely used cryptosystems rely on the security of number theoretical problems, such as integer factoring and the elliptic curve discrete logarithm problem. With large security parameters, these problems can resist attacks by classical computers. However, with Shor's algorithm [1], a quantum computer is able to solve these problems in polynomial time. Hence, it is necessary to investigate post-quantum cryptography that can resist both classical computers and quantum computers. In recent years, lattice-based cryptography has emerged as a main candidate for post-quantum cryptography. Its security relies on worst-case computational assumptions in lattices that remain hard for both classical computers and quantum computers.

Many lattice-based cryptosystems are based on the ring learning with errors (Ring-LWE) problem [2]. The most critical and computationally intensive operation of these cryptosystems is polynomial multiplication over rings. In this paper, we introduce an efficient hardware architecture for a polynomial multiplier using the number theoretic transform [3] and the negative wrapped convolution theorem [4]. Our polynomial multiplier is able to perform the bit-reverse operation on-the-fly and reduce the cost of pre-computation and post-computation. In order to achieve maximum utilization of the butterfly operator, a novel memory access scheme is introduced. We also exploit the relationship of the constant factors to reduce the ROM storage, and we provide a technique to reduce ROM accesses by around 75% during NTT computation. Based on these techniques, we present an efficient polynomial multiplier. It is able to perform 57304/26913 polynomial multiplications per second for dimension 256/512 on a Spartan-6 FPGA. Our polynomial multiplier is faster than the efficient implementation in [5] by a factor of 1.31/1.38 for dimension 256/512. Besides, it uses around 58.9%/61.1% fewer slices and 20%/37.5% fewer Block RAMs. Compared with the implementation in [6], our implementation uses 50% to 67% fewer DSPs and 50% less RAM width. Moreover, the throughput of our polynomial multiplier is more than 1.46/1.53 times as high as that of the one in [6].

II. BACKGROUND

In this paper, $\lg$ denotes the base-2 logarithm. The dimension of the lattice is denoted $n$, where $n$ is a power of 2. $\mathbb{Z}_p$ denotes the ring with the interval $[0, p) \cap \mathbb{Z}$, where $p$ is a prime number. $\mathbb{Z}_p[x]$ represents the set of polynomials with all coefficients in $\mathbb{Z}_p$. For a prime number $p$ that satisfies the relation $p = 1 \bmod 2n$, the quotient ring $R_p = \mathbb{Z}_p[x]/\langle x^n + 1 \rangle$ denotes all polynomials in $\mathbb{Z}_p[x]$ with degree less than $n$.

Let $a = a_0 + a_1 x + \dots + a_{n-1} x^{n-1}$, $s = s_0 + s_1 x + \dots + s_{n-1} x^{n-1}$, $d = d_0 + d_1 x + \dots + d_{n-1} x^{n-1}$ be elements in $R_p$. Let $\omega$ be a primitive $n$-th root of unity in $\mathbb{Z}_p$ and $\psi^2 = \omega \bmod p$, where $\omega$ is defined as the smallest element in $\mathbb{Z}_p$ that satisfies the conditions $\omega^n = 1 \bmod p$ and $\omega^j \neq 1 \bmod p$ for $0 < j < n$. We can exploit the number theoretic transform (NTT) [3], [7] and the negative wrapped convolution theorem [4] to calculate $d = a \cdot s$ as follows:

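$$
\begin{aligned}
\hat{a} &= a \odot \{1, \psi, \dots, \psi^{n-1}\},\\
\hat{s} &= s \odot \{1, \psi, \dots, \psi^{n-1}\},\\
\hat{d} &= \mathrm{NTT}^{-1}\big(\mathrm{NTT}(\hat{a}) \odot \mathrm{NTT}(\hat{s})\big),\\
d &= \hat{d} \odot \{1, \psi^{-1}, \psi^{-2}, \dots, \psi^{-(n-1)}\},
\end{aligned}
\tag{1}
$$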

where $\odot$ denotes the component-wise multiplication and the inverse-NTT $\mathrm{NTT}^{-1}$ is determined by using $\omega^{-1}$ in Algorithm 1 and multiplying each coefficient of the result with $n^{-1}$ over $\mathbb{Z}_p$ [4]. Algorithm 1 shows the details of iterative NTT. The bit-reverse operation (line 1) stores the coefficient at addr to bit-reverse(addr). The core operation of NTT is known as a butterfly operation (line 9 to line 12). It takes two coefficients and a constant factor to compute their corresponding new values.
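The multiplication procedure of equation (1) can be sketched in software as follows. This is a minimal illustration and not the hardware design of the paper: Algorithm 1 is not reproduced in this excerpt, so the code uses a textbook iterative NTT (bit-reverse followed by butterfly stages) whose line numbering need not match it. All function names are hypothetical, the roots are found by brute force, and the parameter pairs used in the usage note (n = 8, p = 17 and n = 256, p = 7681) are assumptions chosen only because they satisfy p = 1 mod 2n.

```python
def bit_reverse(addr, bits):
    """Reverse the lowest `bits` bits of addr, e.g. 0b001 -> 0b100 for bits=3."""
    out = 0
    for _ in range(bits):
        out = (out << 1) | (addr & 1)
        addr >>= 1
    return out


def find_roots(n, p):
    """Return (omega, psi): the smallest primitive n-th root of unity mod p and
    the smallest psi with psi^2 = omega mod p (brute force, illustration only)."""
    assert n >= 2 and (n & (n - 1)) == 0, "n must be a power of 2"
    assert (p - 1) % (2 * n) == 0, "need p = 1 mod 2n"
    # n is a power of 2, so omega has order exactly n iff omega^(n/2) = -1 mod p.
    omega = next(w for w in range(2, p) if pow(w, n // 2, p) == p - 1)
    psi = next(x for x in range(2, p) if x * x % p == omega)
    return omega, psi


def ntt(a, omega, p):
    """Iterative NTT: returns A with A[i] = sum_j a[j] * omega^(i*j) mod p."""
    n = len(a)
    bits = n.bit_length() - 1
    A = [0] * n
    for addr in range(n):                      # bit-reverse step
        A[bit_reverse(addr, bits)] = a[addr]
    m = 2
    while m <= n:                              # lg(n) butterfly stages
        w_m = pow(omega, n // m, p)
        for k in range(0, n, m):
            w = 1
            for j in range(m // 2):
                # Butterfly: two coefficients and one constant factor w.
                t = w * A[k + j + m // 2] % p
                u = A[k + j]
                A[k + j] = (u + t) % p
                A[k + j + m // 2] = (u - t) % p
                w = w * w_m % p
        m *= 2
    return A


def negacyclic_multiply(a, s, p):
    """Compute d = a * s in Z_p[x]/<x^n + 1> following equation (1)."""
    n = len(a)
    omega, psi = find_roots(n, p)
    omega_inv = pow(omega, p - 2, p)           # inverses via Fermat, p is prime
    psi_inv = pow(psi, p - 2, p)
    n_inv = pow(n, p - 2, p)
    a_hat = [x * pow(psi, i, p) % p for i, x in enumerate(a)]
    s_hat = [x * pow(psi, i, p) % p for i, x in enumerate(s)]
    prod = [x * y % p for x, y in zip(ntt(a_hat, omega, p), ntt(s_hat, omega, p))]
    d_hat = [x * n_inv % p for x in ntt(prod, omega_inv, p)]   # inverse NTT
    return [x * pow(psi_inv, i, p) % p for i, x in enumerate(d_hat)]


def schoolbook_multiply(a, s, p):
    """Reference O(n^2) multiplication with the reduction x^n = -1."""
    n = len(a)
    d = [0] * n
    for i in range(n):
        for j in range(n):
            if i + j < n:
                d[i + j] = (d[i + j] + a[i] * s[j]) % p
            else:
                d[i + j - n] = (d[i + j - n] - a[i] * s[j]) % p
    return d
```

As a quick check, `negacyclic_multiply(a, s, 17)` on two length-8 coefficient lists should agree with `schoolbook_multiply(a, s, 17)`. The scaling by powers of ψ before and after the transforms is what turns the cyclic convolution computed by the NTT into the negative wrapped one, since ψ^n = -1 mod p.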

Since NTT needs to perform the bit-reverse operation on all the coefficients, it takes around n clock cycles. As NTT
