share a 6MB L3 cache, also known as the Last-Level
Cache, or LLC.
The unit of memory in a cache is a line which contains
a fixed number of bytes. A cache consists of multiple
cache sets each of which stores a fixed number of cache
lines. The number of cache lines in a set is the cache
associativity. Each memory line can be cached in any of
the cache lines of a single cache set. The size of cache
lines in the Core i5-3470 processor is 64 bytes. The L1
and L2 caches are 8-way associative and the L3 cache is
12-way associative.
An important feature of the LLC in modern Intel pro-
cessors is that it is an inclusive cache. That is, the LLC
contains copies of all of the data stored in the lower cache
levels. Consequently, flushing or evicting data from the
LLC also remove said data from all other cache levels of
the processor. Our attack exploits this cache behaviour.
Retrieving data from memory or from cache levels
closer to memory takes longer than retrieving it from
cache levels closer to the core. This difference in tim-
ing has been exploited for side-channel attacks. Side-
channel attacks target information that an implementa-
tion of an algorithm leaks through its interaction with
its environment. To exploit the timing difference, an at-
tacker sets the cache to a known state prior to a victim
operation. It can, then, use one of two methods to de-
duce information on the victim’s operation [43]. The
first method is measuring the time it takes for the vic-
tim to execute the operation. As this time depends on
the state of the cache when the victim starts the opera-
tion, the attacker can deduce the cache sets accessed by
the victim and, therefore, learn information on the vic-
tim [5, 9, 57]. The second approach is for the attacker to
measure the time it takes for the attacker to access data
after the victim’s operation. This time is dependent on
the cache state prior to the victim operation as well as
on the changes the victim operation caused in the cache
state [1, 2,4, 14, 19, 47, 61].
Most prior work on cache side-channel attacks relies
on the victim and spy executing within the same process-
ing core. One reason for that is that many of the attacks
suggested require the victim to be stopped while the spy
performs the attack. To that aim, the attack is combined
with an attack on the scheduler that allows the spy pro-
cess to interrupt and block the victim.
Another reason for attacking within the same core is
that the attacks focus on the L1 cache level, which is not
shared between cores. The large size of the LLC hin-
ders attacks both because setting it to a known state takes
longer than with smaller caches and because the virtual
memory used by the operating system masks the map-
ping of memory addresses to cache sets. Furthermore,
as most of the memory activity occurs at the L1 cache
level, less information can be extracted from LLC activ-
ity. Some prior works do use the LLC as an information
leak channel [46,47,60]. However, due to the cache size,
these channels have a low bandwidth.
We now proceed to describe the RSA encryption.
2.3 RSA
RSA [48] is a public-key cryptographic system that sup-
ports encryption and signing. Generating an encryption
system requires the following steps:
• Randomly selecting two prime numbers p and q and
calculating n = pq.
• Choosing a public exponent e. GnuPG uses e =
65537.
• Calculating a private exponent d ≡ e
−1
(mod (p−
1)(q −1)).
The generated encryption system consists of:
• The public key is the pair (n,e).
• The private key is the triple (p,q,d).
• The encrypting function is E(m) = m
e
mod n.
• The decrypting function is D(c) = c
d
mod n.
CRT-RSA is a common optimisation for the imple-
mentation of the decryption function. It splits the se-
cret key d into two parts d
p
= d mod (p − 1) and d
q
=
d mod (q −1), computes two parts of the message: m
p
=
c
d
p
mod p and m
q
= c
d
q
mod q. m is then computed
from m
p
and m
q
using Garner’s formula [25]:
h = (m
p
− m
q
)(q
−1
mod p) mod p
m = m
q
+ hq
To compute the encryption and decryption func-
tions, GnuPG versions before 4.1.14 and the related
libgcrypt before version 1.5.3 use the square-and-
multiply exponentiation algorithm [28]. Square-and-
multiply computes x = b
e
mod m by scanning the bits
of the binary representation of the exponent e. Given a
binary representation of e as 2
n−1
e
n−1
+· · · 2
0
e
0
, square-
and-multiply calculates a sequence of intermediate val-
ues x
n−1
,...,x
0
such that x
i
= b
be/2
i
c
mod m using the
formula x
i−1
= x
i
2
b
e
i−1
. Figure 2 shows a pseudo-code
implementation of square-and-multiply.
As can be seen from the implementation, computing
the exponent consists of sequence of Square and Mul-
tiply operations, each followed by a Modulo Reduce.
This sequence corresponds directly with the bits of the
exponent. Each occurrence of Square-Reduce-Multiply-
Reduce within the sequence corresponds to a bit whose
value is 1. Occurrences of Square-Reduce that are not
3