Figure 1: Shuffler architecture. We use symbols and re-
locations (0) for augmented binary analysis (1), rewrite
code into shufflable form (2), and asynchronously create
new code copies at runtime (3), while self-hosting (4).
3.1 Goals
The main goals of Shuffler are:
• Deployability: We aim to reduce the burden on
end-users as much as possible. Thus, we require no
direct access to source code, no static binary rewrit-
ing on disk, and no modifications to system compo-
nents (except our small loader patch).
• Security: Our goal is to defeat all known code reuse
attacks, without expanding the trusted computing
base. We constrain the lifetime of leaked informa-
tion by providing a configurable shuffling period,
mitigating code reuse and JIT-ROP attacks.
• Performance: Because time is an integral part of
our security model, speed is of the essence. We aim
to provide low runtime overhead, and also low total
shuffling latency to allow for high shuffling rates.
3.2 Architecture
Shuffler is designed to require minimal system modi-
fications. To avoid kernel changes, it runs entirely in
userspace; to avoid requiring source or a modified com-
piler, it operates on program binaries. Performing re-
randomization soundly requires complete and precise
pointer analysis. Rather than attempting arbitrary binary
analysis, we leverage symbol and relocation information
from the (unmodified) compiler and linker. Options to
preserve this information exist in every major compiler.
Thus, we are able to achieve completely accurate disas-
sembly in what we call augmented binary analysis—as
shown in Figure 1 part (1) and detailed in Section 3.3.
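To make the input to this analysis concrete, the sketch below enumerates the relocation entries that a linker invoked with GNU ld's --emit-relocs (-q) option leaves in a fully linked binary, using only the standard <elf.h> structures. It is a minimal illustration of the kind of ground truth augmented binary analysis consumes, not Shuffler's actual parser; the helper name is ours, and a real pass would additionally resolve each entry's symbol to keep only those whose targets are functions.

/*
 * Minimal illustration (not Shuffler's parser): enumerate the relocation
 * entries that --emit-relocs preserves in a fully linked binary. Each
 * R_X86_64_64 entry names a memory slot that is filled with an absolute
 * address; resolving the referenced symbol (omitted here) tells the
 * analysis whether that slot holds a code pointer.
 */
#include <elf.h>
#include <stdint.h>
#include <stdio.h>

/* 'image' points at the whole ELF file mapped or read into memory. */
void list_absolute_relocs(const uint8_t *image)
{
    const Elf64_Ehdr *ehdr = (const Elf64_Ehdr *)image;
    const Elf64_Shdr *shdr = (const Elf64_Shdr *)(image + ehdr->e_shoff);

    for (unsigned i = 0; i < ehdr->e_shnum; i++) {
        if (shdr[i].sh_type != SHT_RELA)
            continue;                       /* only explicit-addend relocations */

        const Elf64_Rela *rela  = (const Elf64_Rela *)(image + shdr[i].sh_offset);
        size_t            count = shdr[i].sh_size / sizeof(Elf64_Rela);

        for (size_t j = 0; j < count; j++) {
            if (ELF64_R_TYPE(rela[j].r_info) != R_X86_64_64)
                continue;                   /* keep only absolute 64-bit pointers */

            printf("pointer slot at offset 0x%llx (patches section %u, "
                   "symbol %llu, addend %lld)\n",
                   (unsigned long long)rela[j].r_offset,
                   (unsigned)shdr[i].sh_info,
                   (unsigned long long)ELF64_R_SYM(rela[j].r_info),
                   (long long)rela[j].r_addend);
        }
    }
}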
At load-time, Shuffler transforms the program’s code
using binary rewriting (Figure 1 part (2)). The goal of
rewriting is to be able to track and update all code point-
ers at runtime. We avoid the taint tracking used by related
work [7,42] because it is expensive and would introduce
races during asynchronous pointer updates. Instead, we
leverage our complete and accurate disassembly to trans-
form all code pointers into unique identifiers—indices
into a code pointer table. These indices cannot be altered
after load time (the potential security implications of this
choice are discussed in Section 6), but this restriction trades off
very favorably in terms of performance and ease of implementation.
We handle return addresses (dynamically generated
code pointers) differently, encrypting them on the stack
rather than using indices, thereby preventing disclosure
while maintaining good performance.
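As a concrete illustration of this indirection, the sketch below replaces a raw function pointer with an index into a code pointer table and mimics the return-address protection with a plain XOR; the table layout, the index width, and the XOR key are illustrative choices of ours, not the exact encoding Shuffler uses.

/*
 * Illustrative sketch (not Shuffler's code): a "function pointer" becomes
 * a small index, and the raw address lives in a table that the shuffling
 * thread rewrites whenever the function moves. The call site dereferences
 * the table at the moment of the call, so it always reaches the current
 * copy of the target. Return addresses, produced dynamically by CALL,
 * are instead disguised; a simple XOR stands in for that idea here.
 */
#include <stdint.h>
#include <stdio.h>

#define MAX_FUNCS 1024

static void *code_pointer_table[MAX_FUNCS];  /* one slot per function */

typedef uint32_t func_index_t;               /* transformed "code pointer" */

static void greet(void) { puts("hello"); }

int main(void)
{
    /* Initialization point: store an index, not &greet itself. */
    code_pointer_table[7] = (void *)greet;
    func_index_t fp = 7;

    /* Call site: look up the current address at call time. If the shuffler
     * later copies greet() elsewhere and updates slot 7, this same index
     * keeps working; fp itself never needs to change. */
    ((void (*)(void))code_pointer_table[fp])();

    /* Stand-in for return-address protection: XOR with a secret value
     * before the address is spilled to the stack, undo it on return. */
    uint64_t secret    = 0x5a5a5a5a5a5a5a5aULL;       /* illustrative key */
    uint64_t ret       = (uint64_t)(uintptr_t)greet;  /* pretend return address */
    uint64_t disguised = ret ^ secret;                /* value kept on the stack */
    uint64_t restored  = disguised ^ secret;          /* recovered before returning */
    printf("restored == original: %d\n", restored == ret);
    return 0;
}

Because the lookup happens at call time, the shuffling thread can rewrite a table slot at any point before the call without invalidating the indices the program already holds, which is what allows pointer updates to proceed asynchronously.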
Our system performs re-randomization at the level of
functions within a specific shuffle period, a randomiza-
tion deadline specified in milliseconds. Shuffler runs in a
separate thread and prepares a new shuffled copy of code
within this deadline, as shown in Figure 1 part (3). This
step is accelerated using a Fenwick tree (see Section 4.4).
The vast majority of the re-randomization process is per-
formed asynchronously: creating new copies of code,
fixing up instruction displacements, updating pointers in
the code table, etc. The threads are globally paused only
to atomically update return addresses. Since any existing
return addresses reference the old copy of code, we must
revisit saved stack frames and update them. Each thread
walks its own stack in parallel, following base point-
ers backwards to iterate through stack frames (a process
known as stack unwinding); see Section 3.3 for details.
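A rough sketch of that per-thread walk, under the assumption that code is compiled with frame pointers and follows the standard x86-64 frame layout, might look as follows. The single old_lo/old_hi/displacement bookkeeping is a simplification of ours: in Shuffler each function moves independently and return addresses are additionally stored encrypted.

/*
 * Rough sketch of the per-thread return-address fix-up (our illustration,
 * not Shuffler's implementation). With frame pointers enabled, each
 * x86-64 frame looks like:
 *     [rbp]     -> caller's saved rbp
 *     [rbp + 8] -> return address into the caller
 * Following the saved-rbp chain visits every frame of the current thread;
 * any return address still inside the old code copy is redirected.
 */
#include <stdint.h>
#include <stddef.h>

struct frame {
    struct frame *saved_rbp;    /* caller's frame pointer */
    uintptr_t     return_addr;  /* where the caller resumes */
};

void fix_return_addresses(uintptr_t old_lo, uintptr_t old_hi, intptr_t displacement)
{
    /* Start from our own frame pointer (GCC/Clang builtin). */
    struct frame *fp = __builtin_frame_address(0);

    /* The outermost frame's saved rbp is conventionally zero, ending the walk. */
    while (fp != NULL) {
        if (fp->return_addr >= old_lo && fp->return_addr < old_hi)
            fp->return_addr += displacement;   /* point into the new copy */
        fp = fp->saved_rbp;
    }
}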
Shuffler runs in an egalitarian manner, at the same
level of privilege as target programs, and within the same
address space. To prevent our own code from being used
in a code reuse attack, Shuffler randomizes it the same
way it does all other code (Figure 1 part (4)). In fact,
our scheme uses binary rewriting to transform all code
in a userspace application (the program, Shuffler, and all
shared libraries) into a single code sandbox, essentially
turning it into a statically linked application at runtime.
Bootstrapping from original code into this self-hosting
environment is challenging, particularly without substan-
tially changing the system loader.
3.3 Challenges
Changing function pointer behavior Normal binary
code is generated under the assumption that the pro-
gram’s memory layout remains consistent and function
pointers have indefinite lifetime. Re-randomization in-
troduces an arbitrary lifetime for each block of code,
and so correctness becomes an exercise in avoiding dangling
code pointers. Failing to update even one
such pointer may cause the program to crash, or worse,
fall victim to a use-after-free attack.
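A small example of this failure mode: a callback registered through a raw function pointer may not fire until many shuffle periods later, by which time the saved address no longer points at live code. The scenario below is our illustration, not taken from the paper.

/*
 * Illustration of the dangling-code-pointer problem (not from the paper):
 * a raw code pointer saved now and used many shuffle periods later is
 * stale, because the function it named has since been copied elsewhere
 * and the old bytes discarded.
 */
#include <stddef.h>

typedef void (*callback_t)(void);

static callback_t pending;                 /* long-lived raw code pointer */

static void on_timeout(void) { /* ... */ }

void register_timeout(void) { pending = on_timeout; }

void fire_timeout(void)                    /* runs arbitrarily later */
{
    if (pending != NULL)
        pending();   /* crashes, or runs whatever now occupies the old address */
}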
Hence, we need to accurately track and update every
code pointer during the re-randomization process. We
opt to statically transform all code pointers into unique
identifiers—namely, indices into a hidden code pointer
table. Relying on accurate and complete disassembly
(discussed next), we transform all initialization points to
use indices. Then, wherever the code pointer is copied