Towards Efficient Polynomial Multiplication for
Lattice-Based Cryptography
Chaohui Du
∗†
, Guoqiang Bai
∗‡
∗
Tsinghua National Laboratory for Information Science and Technology
†
Department of Computer Science and Technology, Tsinghua University, Beijing, China
‡
Institute of Microelectronics, Tsinghua University, Beijing, China
dch11@mails.tsinghua.edu.cn, baigq@mail.tsinghua.edu.cn
Abstract—Ring learning with errors (Ring-LWE) is the ba-
sis of various lattice based cryptosystems. The most critical
and computationally intensive operation of Ring-LWE based
cryptosystems is polynomial multiplication over rings. In this
paper, we introduce several optimization techniques to build
an efficient polynomial multiplier with the number theoretic
transform (NTT). We propose a technique to optimize the bit-
reverse operation of NTT and inverse-NTT. With additional
optimizations, our polynomial multiplier reduces the required
clock cycles from (8n+1.5n lg n) to (2n +1.5n lg n). By exploiting
the relationship of the constant factors, our polynomial multiplier
is able to reduce the number of constant factors from 4n to
2.5n, which saves about 37.5% ROM storage. In addition, we
propose a novel memory access scheme to achieve maximum
utilization of the butterfly operator. With these techniques,
our polynomial multiplier is capable to perform 57304/26913
polynomial multiplications per second for dimension 256/512 on
a Spartan-6 FPGA.
Keywords—lattice-based cryptography; learning with errors;
Ring-LWE; polynomial multiplication; hardware architecture
I. INTRODUCTION
Many widely used cryptosystems rely on the security
of number theoretical problems, such as integer factoring
and elliptic curve discrete logarithm problem. These number
theoretical problems with large security parameters can resist
attacks by classical computers. However, with Shor’s algorithm
[1], a quantum computer is able to solve these problems in
polynomial time. Hence, it is necessary to investigate post-
quantum cryptography that can resist both classical comput-
ers and quantum computers. In recent years, lattice-based
cryptography has emerged as a main candidate for post-
quantum cryptography. Its security relies on the worst-case
computational assumptions in lattices that remain hard for both
classical computers and quantum computers.
Many lattice-based cryptosystems are based on the ring
learning with errors (Ring-LWE) problem [2]. The most critical
and computationally intensive operation of these cryptosystems
is polynomial multiplication over rings. In this paper, we
introduce an efficient hardware architecture of polynomial
multiplier using the number theoretic transform [3] and the
negative wrapped convolution theorem [4]. Our polynomial
multiplier is able to perform the bit-reverse operation on-the-fly
and reduce the cost of pre-computation and post-computation.
In order to achieve maximum utilization of the butterfly
operator, a novel memory access scheme is introduced. We
also exploit the relationship of the constant factors to reduce
the ROM storage and we provide a technique to reduce around
75% ROM accesses during NTT computation. Based on these
techniques, we present an efficient polynomial multiplier. It is
able to perform 57304/26913 polynomial multiplications for
dimension 256/512 on a Spartan-6 FPGA. Our polynomial
multiplier is faster than the efficient implementation in [5] by
a factor of 1.31/1.38 for dimension 256/512. Besides, it saves
around 58.9%/61.1% slices and 20%/37.5% Block RAMs.
Compared with the implementation in [6], our implementa-
tion reduces 50% up to 67% DSPs, and 50% RAM width.
Moreover, the throughput of our polynomial multiplier is as
more than 1.46/1.53 times as high as the one in [6].
II. BACKGROUND
In this paper, lg denotes the base-2 logarithm. The dimen-
sion of the lattice is denoted n, where n is a power of 2.
Z
p
denotes the ring with the interval [0, p) ∩ Z, where p is
a prime number. Z
p
[x] represents the set of polynomials with
all coefficients in Z
p
. For a prime number p that satisfies the
relation p = 1 mod 2n, the quotient ring R
p
= Z
p
[x]/<x
n
+1>
denotes all polynomials in Z
p
[x] with degree less than n.
Let a = a
0
+ a
1
x + ... + a
n−1
x
n−1
, s = s
0
+ s
1
x + ... +
s
n−1
x
n−1
, d = d
0
+ d
1
x + ... + d
n−1
x
n−1
be elements in R
p
.
Let ω
n
be a primitive n-th root of unity in Z
p
and ψ
2
= ω
n
mod p, where ω
n
is defined as the smallest element in Z
p
that
satisfies the conditions ω
n
n
= 1 mod p and ω
j
n
6= 1 mod p for
0 < j < n. We can exploit the number theoretic transform
(NTT) [3], [7] and the negative wrapped convolution theorem
[4] to calculate d = a · s as follows:
ˆa = a {1, ψ, ..., ψ
n−1
},
ˆs = s {1, ψ, ..., ψ
n−1
},
ˆ
d = NTT
−1
ω
n
(NTT
ω
n
(ˆa) NT T
ω
n
(ˆs)),
d =
ˆ
d {1, ψ
−1
, ψ
−2
, ..., ψ
−(n−1)
},
(1)
where denotes the component-wise multiplication and the
inverse-NTT N T T
−1
ω
n
is determined by using ω
−1
n
in Algo-
rithm 1 and multiplying each coefficient of the result with n
−1
over Z
p
[4]. Algorithm 1 shows the details of iterative NTT.
The bit-reverse operation (line 1) stores the coefficient at addr
to bit-reverse(addr). The core operation of NTT is known as a
butterfly operation (line 9 – line 12). It takes two coefficients
and a constant factor to compute their corresponding new
values.
Since NTT requires to perform bit-reverse operation on
all the coefficients, it takes around n clock cycles. As NTT
978-1-4799-5341-7/16/$31.00 ©2016 IEEE
1178