eax”, “jmp [eax]”, and “ret”. Indirect branches are also
supported on ARM (e.g., “MOV pc, r14”), MIPS (e.g., “jr
$ra”), RISC-V (e.g., “jalr x0,x1,0”), and other processors.
To compensate for the additional flexibility as compared
to direct branches, indirect jumps and calls are optimized using
at least two different prediction mechanisms [35].
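As a concrete illustration (our example, not taken from [35]),
a call through a C function pointer compiles to an indirect
call, so the branch target is known only at runtime:

  /* Minimal sketch: a function-pointer call becomes an indirect
     call (e.g., "call rax" on x86-64), whose target the processor
     must predict before the pointer value is available. The
     function names here are hypothetical. */
  #include <stdio.h>

  static int add_one(int x) { return x + 1; }
  static int negate(int x)  { return -x; }

  int main(void) {
      /* The target depends on runtime input, not on the
         encoding of the call instruction itself. */
      int (*op)(int) = (getchar() == 'a') ? add_one : negate;
      printf("%d\n", op(41));
      return 0;
  }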
Intel [35] describes that the processor predicts
• “Direct Calls and Jumps” in a static or monotonic manner,
• “Indirect Calls and Jumps” either in a monotonic manner,
or in a varying manner, which depends on recent program
behavior, and for
• “Conditional Branches” the branch target and whether the
branch will be taken.
Consequently, several processor components are used for
predicting the outcome of branches. The Branch Target Buffer
(BTB) keeps a mapping from addresses of recently executed
branch instructions to destination addresses [44]. Processors
can use the BTB to predict future code addresses even before
decoding the branch instructions. Evtyushkin et al. [14]
analyzed the BTB of an Intel Haswell processor and concluded
that only the 31 least significant bits of the branch address are
used to index the BTB.
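The consequence of this observation can be sketched in code;
the 31-bit mask below merely encodes the finding of [14] for
Haswell and should not be assumed to hold on other
microarchitectures:

  #include <stdint.h>
  #include <stdio.h>

  /* Sketch: if only the 31 least significant bits of a branch
     address index the BTB, two branches whose addresses agree
     in those bits alias to the same BTB entry. */
  #define BTB_INDEX_MASK ((1ULL << 31) - 1)

  static int btb_alias(uint64_t addr_a, uint64_t addr_b) {
      return (addr_a & BTB_INDEX_MASK) == (addr_b & BTB_INDEX_MASK);
  }

  int main(void) {
      uint64_t a = 0x0000000012345678ULL; /* hypothetical address */
      uint64_t b = a + (1ULL << 31);      /* differs only in bits
                                             31 and above */
      printf("alias: %d\n", btb_alias(a, b)); /* prints 1 */
      return 0;
  }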
For conditional branches, recording the target address is not
necessary for predicting the outcome of the branch since the
destination is typically encoded in the instruction while the
condition is determined at runtime. To improve predictions,
the processor maintains a record of branch outcomes, both
for recent direct and indirect branches. Bhattacharya et al. [9]
analyzed the structure of branch history prediction in recent
Intel processors.
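For intuition, a history-based predictor can learn a short
repeating outcome pattern, whereas a branch on random data is
mispredicted roughly half the time. A sketch (the loop bound
and pattern are arbitrary):

  #include <stdio.h>
  #include <stdlib.h>

  /* Sketch: the first branch follows a repeating taken, taken,
     not-taken pattern that an outcome-history predictor can
     learn; the second depends on pseudorandom data and is
     frequently mispredicted. */
  int main(void) {
      long sum = 0;
      for (int i = 0; i < 1000000; i++) {
          if (i % 3 != 2)    /* predictable pattern: T, T, N, ... */
              sum++;
          if (rand() % 2)    /* effectively unpredictable outcome */
              sum++;
      }
      printf("%ld\n", sum);
      return 0;
  }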
Although return instructions are a type of indirect branch,
a separate mechanism for predicting the destination address is
often used in modern CPUs. The Return Stack Buffer (RSB)
maintains a copy of the most recently used portion of the
call stack [15]. If no data is available in the RSB, different
processors will either stall the execution or use the BTB as a
fallback [15].
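The RSB's limited capacity can be illustrated with nested
calls; the capacity assumed below (on the order of 16 entries,
as reported for many Intel processors) is an assumption, not a
documented constant:

  #include <stdio.h>

  /* Sketch: each call pushes a return address onto the RSB and
     each ret pops one. If the recursion is deeper than the RSB
     capacity (assumed here to be roughly 16 entries), the
     outermost returns no longer find their addresses in the RSB
     and fall back to the behavior described above. Compile
     without optimization so the call/ret pairs are preserved. */
  static long nest(long depth) {
      if (depth == 0)
          return 0;
      return 1 + nest(depth - 1); /* depth nested call/ret pairs */
  }

  int main(void) {
      printf("%ld\n", nest(64));  /* well beyond 16 return addresses */
      return 0;
  }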
Branch-prediction logic, e.g., BTB and RSB, is typically not
shared across physical cores [19]. Hence, the processor learns
only from previous branches executed on the same core.
D. The Memory Hierarchy
To bridge the speed gap between the faster processor and
the slower memory, processors use a hierarchy of successively
smaller but faster caches. The caches divide the memory into
fixed-size chunks called lines, with typical line sizes being 64
or 128 bytes. When the processor needs data from memory,
it first checks if the L1 cache, at the top of the hierarchy,
contains a copy. In the case of a cache hit, i.e., the data is
found in the cache, the data is retrieved from the L1 cache and
used. Otherwise, in the case of a cache miss, the procedure is
repeated to attempt to retrieve the data from the next cache
levels, and finally external memory. Once a read is completed,
the data is typically stored in the cache (and a previously
cached value is evicted to make room) in case it is needed
again in the near future. Modern Intel processors typically
have three cache levels, with each core having dedicated L1
and L2 caches and all cores sharing a common L3 cache, also
known as the Last-Level Cache (LLC).
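The latency gap between a hit and a miss is directly
measurable. A minimal x86-only sketch using the rdtscp
timestamp counter and the clflush instruction (absolute cycle
counts are machine-dependent; a miss is typically several
times slower than a hit):

  #include <stdint.h>
  #include <stdio.h>
  #include <x86intrin.h>  /* __rdtscp, _mm_clflush, _mm_mfence */

  /* Sketch: time one load while the line is cached and one
     after flushing it, so the second read comes from memory. */
  static uint64_t time_access(volatile uint8_t *p) {
      unsigned aux;
      uint64_t start = __rdtscp(&aux);
      (void)*p;                   /* the measured memory read */
      return __rdtscp(&aux) - start;
  }

  int main(void) {
      static uint8_t buf[4096];
      volatile uint8_t *p = &buf[0];
      (void)*p;                   /* bring the line into the cache */
      printf("hit:  %llu cycles\n",
             (unsigned long long)time_access(p));
      _mm_clflush((void *)buf);   /* evict the line */
      _mm_mfence();
      printf("miss: %llu cycles\n",
             (unsigned long long)time_access(p));
      return 0;
  }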
A processor keeps the per-core L1 and L2 caches
coherent using a cache coherence protocol, often based
on the MESI protocol [35]. In particular, the use of the MESI
protocol or some of its variants implies that a memory write
operation on one core will cause copies of the same data
in the L1 and L2 caches of other cores to be marked as
invalid, meaning that future accesses to this data on other
cores will not be able to quickly load the data from the L1
or L2 cache [53, 68]. When this happens repeatedly to a
specific memory location, it is informally called cache-line
bouncing. Because memory is cached with a line granularity,
this can happen even if two cores access different nearby
memory locations that map to the same cache line. This
behavior is called false sharing and is well-known as a source
of performance issues [33]. These properties of the cache
coherency protocol can sometimes be abused as a replacement
for cache eviction using the clflush instruction or eviction
patterns [27]. This behavior was previously explored as a
potential mechanism to facilitate Rowhammer attacks [16].
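False sharing is easy to reproduce; a minimal sketch follows,
assuming a 64-byte line size, in which two threads write
distinct fields that happen to share a cache line
(uncommenting the padding gives each field its own line and
removes the bouncing):

  #include <pthread.h>
  #include <stdio.h>

  /* Sketch: both counters fit in one 64-byte line (an assumed
     line size), so writes from the two threads cause the line
     to bounce between the cores' caches even though the threads
     never touch each other's data. */
  struct counters {
      volatile long a;
      /* char pad[64];   uncomment to place b on its own line */
      volatile long b;
  };

  static struct counters c;

  static void *bump_a(void *arg) {
      (void)arg;
      for (long i = 0; i < 100000000L; i++) c.a++;
      return NULL;
  }

  static void *bump_b(void *arg) {
      (void)arg;
      for (long i = 0; i < 100000000L; i++) c.b++;
      return NULL;
  }

  int main(void) {
      pthread_t t1, t2;
      pthread_create(&t1, NULL, bump_a, NULL);
      pthread_create(&t2, NULL, bump_b, NULL);
      pthread_join(t1, NULL);
      pthread_join(t2, NULL);
      printf("%ld %ld\n", c.a, c.b);
      return 0;
  }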
E. Microarchitectural Side-Channel Attacks
All of the microarchitectural components we discussed
above improve processor performance by predicting future
program behavior. To that end, they maintain state that
depends on past program behavior and assume that future
behavior is similar to or related to past behavior.
When multiple programs execute on the same hardware,
either concurrently or via time sharing, changes in the
microarchitectural state caused by the behavior of one
program may affect other programs. This, in turn, may result
in unintended information leaks from one program to another [19].
Initial microarchitectural side-channel attacks exploited
timing variability [43] and leakage through the L1 data cache
to extract keys from cryptographic primitives [52, 55, 69].
Over the years, channels have been demonstrated over multiple
microarchitectural components, including the instruction
cache [3], lower-level caches [30, 38, 48, 74], the
BTB [14, 44], and branch history [1, 2]. The targets of
attacks have broadened to encompass co-location detection [59],
breaking ASLR [14, 26, 72], keystroke monitoring [25],
website fingerprinting [51], and genome processing [10]. Recent
results include cross-core and cross-CPU attacks [37, 75],
cloud-based attacks [32, 76], attacks on and from trusted
execution environments [10, 44, 61], attacks from mobile
code [23, 46, 51], and new attack techniques [11, 28, 44].
In this work, we use the Flush+Reload technique [30, 74],
and its variant Evict+Reload [25], for leaking sensitive
information. Using these techniques, the attacker begins by evicting
a cache line from the cache that is shared with the victim. After
the victim executes for a while, the attacker measures the time
it takes to perform a memory read at the address corresponding
to the evicted cache line. If the victim accessed the monitored
cache line, the data will be in the cache, and the access will