Scalable Support for Multithreaded Applications on
Dynamic Binary Instrumentation Systems
Kim Hazelwood†,‡   Greg Lueck‡   Robert Cohn‡
†University of Virginia   ‡Intel Corporation
www.pintool.org
Abstract
Dynamic binary instrumentation systems are used to inject or modify
arbitrary instructions in existing binary applications; several such
systems have been developed over the past decade. Much of the
literature describing the internal architecture and performance of
these systems has focused on executing single-threaded guest
applications. In this paper, we discuss the specific design decisions
necessary for supporting large, multithreaded applications on
JIT-based dynamic instrumentation systems. While implementing a
working solution for multithreading is straightforward, providing a
system that scales in terms of memory and performance is much more
intricate. We highlight the design decisions in the latest version of
the Pin dynamic instrumentation system, including the just-in-time
compiler, the emulator, and the code cache. The overall design strives
to provide scalable performance and memory footprints on modern
applications.
Categories and Subject Descriptors D.3.4 [Programming Languages]:
Code generation, Optimization, Run-time environments
General Terms Languages, Management, Measurement, Performance
Keywords scalability, multithreading, memory management,
instrumentation
1. Introduction
The recent trend toward multicore architectures has led software
developers to focus on ways to leverage multiple processing cores in
their application software. One way to utilize multiple cores is to
develop multithreaded (MT) applications. Although MT programs are
ubiquitous, many system designers still evaluate their systems with
small, single-threaded (ST) applications. Many factors contribute to
this lack of analysis with MT workloads. Simulation and analysis tools
either emphasize (or exclusively support) single-threaded applications,
or are too slow to execute large MT programs. MT applications are also
inherently less deterministic than ST applications, which complicates
the evaluation methodology (Pereira et al. 2008). The disconnect
between today’s architectures and the applications supported by
today’s tools is particularly problematic as we move further away from
single-core
machines, and the results acquired from single-threaded applications
become increasingly irrelevant.
ISMM’09, June 19-20, 2009, Dublin, Ireland.
Copyright © 2009 ACM 978-1-60558-347-1/09/06...$5.00
To remedy this disconnect, many simulation and analysis tools are
adding support for multithreaded applications. Developers of these
tools are learning that providing support for multithreaded
applications is often straightforward, but providing robust and/or
scalable support for multithreading tends to be much more of a
challenge (Jaleel et al. 2008). Given the recent trend toward
dramatically increasing the number of cores in multicore processors,
it is ever more critical to develop scalable solutions to this and
many other design challenges.
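To illustrate how a working solution and a scalable one can differ, here is a minimal, generic C++ sketch (not code from Pin or from any particular tool) of an instruction-counting analysis in which each loop iteration stands in for one executed instruction. The functions `CountShared` and `CountPerThread` are hypothetical, and the 64-byte alignment is an assumed cache-line size. Both compute the same total; the first funnels every update through one shared atomic counter, while the second gives each thread its own cache-line-padded slot and sums the slots only after the threads finish.

```cpp
#include <atomic>
#include <cstdint>
#include <thread>
#include <vector>

// Straightforward version: one shared counter. Correct, but every increment
// from every thread contends for the same cache line.
std::uint64_t CountShared(int threads, std::uint64_t perThread) {
    std::atomic<std::uint64_t> total{0};
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&total, perThread] {
            for (std::uint64_t i = 0; i < perThread; ++i)
                total.fetch_add(1, std::memory_order_relaxed);
        });
    }
    for (auto& w : workers) w.join();
    return total.load();
}

// Scalable version: each thread owns a slot aligned to an assumed 64-byte
// cache line, so the hot loop writes no data shared with other threads.
struct alignas(64) PaddedCounter {
    std::uint64_t value = 0;
};

std::uint64_t CountPerThread(int threads, std::uint64_t perThread) {
    std::vector<PaddedCounter> slots(threads);
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&slots, t, perThread] {
            for (std::uint64_t i = 0; i < perThread; ++i)
                ++slots[t].value;  // private to thread t
        });
    }
    for (auto& w : workers) w.join();
    std::uint64_t total = 0;  // combine once, after all threads are done
    for (const auto& s : slots) total += s.value;
    return total;
}
```

Both functions return `threads * perThread`; they differ only in how much cross-core traffic the hot loop generates. That cost is invisible with a single thread and can grow with the thread count, which is the gap between "working" and "scalable" described above.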
At the same time, dynamic binary instrumentation has emerged as an
invaluable mechanism for analyzing and modifying software, and even
for simulating new and existing hardware. Unlike static
instrumentation systems, dynamic instrumentation systems enable
analysis of all executed instructions, including shared libraries,
dynamically generated code, and, perhaps most importantly,
applications for which source code is not available. One such dynamic
binary instrumentation system, widely used due to its user-friendly
API and robust implementation, is the Pin dynamic instrumentation
system.
In this paper, we present and analyze various aspects of our design
for robust support of multithreaded guest application execution on
the Pin dynamic instrumentation system (Luk et al. 2005). After
providing an overview of Pin in Section 2, we introduce the basic
modifications necessary for supporting multithreaded applications in
Section 3. Next, we delve into our approaches for supporting signals
in Section 4. Section 5 focuses on the code cache and presents our
trace construction policy, which balances memory and performance
overheads, and our generational cache flushing policy, which allows
us to avoid synchronizing cache flushes across all threads. Section 6
then evaluates the resulting memory and performance scalability of
the system. Finally, Section 7 presents related work and Section 8
concludes.
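The idea of generational flushing can be made concrete with a small model. The sketch below is our own simplified, single-threaded illustration of the general technique, not Pin's implementation; the class `GenerationalCache` and its methods are hypothetical, and Section 5 describes the actual policy. It assumes a fixed set of threads, a point at which each thread re-enters the VM (for example, on a trace exit), and a per-generation count of threads still executing in it. A flush never stops other threads: it opens a new generation for newly compiled traces, and an older generation is reclaimed once the last thread still running in it has moved on.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical, single-threaded model of generational code-cache flushing.
// Not Pin's data structures; concurrency control is deliberately omitted.
class GenerationalCache {
public:
    // All threads start out executing code from generation 0.
    explicit GenerationalCache(std::size_t numThreads)
        : threadGen_(numThreads, 0) {
        live_[0] = numThreads;
    }

    // Flush: open a fresh generation for newly compiled traces. No thread
    // is stopped; threads still in older generations keep using them.
    void Flush() {
        if (live_[current_] == 0) live_.erase(current_);  // nobody is in it
        ++current_;
        live_[current_] = 0;
    }

    // Called when thread `tid` re-enters the VM (e.g. at a trace exit). The
    // thread migrates to the current generation; if it was the last thread
    // in its old generation, that generation is reclaimed.
    void OnVmEntry(std::size_t tid) {
        const std::uint64_t old = threadGen_[tid];
        if (old == current_) return;
        threadGen_[tid] = current_;
        ++live_[current_];
        if (--live_[old] == 0) live_.erase(old);  // reclaim old generation
    }

    std::size_t LiveGenerations() const { return live_.size(); }
    std::uint64_t Current() const { return current_; }

private:
    std::uint64_t current_ = 0;                  // generation receiving new traces
    std::vector<std::uint64_t> threadGen_;       // generation each thread is in
    std::map<std::uint64_t, std::size_t> live_;  // generation -> thread count
};
```

For example, with two threads, a flush leaves two live generations until both threads have re-entered the VM, after which generation 0 is reclaimed. A real multithreaded implementation would also need to make `Flush` and `OnVmEntry` safe to call concurrently (for example with atomic counts or a lock); that is left out here to keep the bookkeeping visible.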
2. Pin’s High-Level Architecture
Before delving into the design aspects that target Pin’s support for
multithreading, we first provide a high-level view of Pin’s internal
architecture. We highlight the features that are discussed in more
detail when we focus on multithreading support.
At a very high level, Pin is a tool that allows users to modify
existing binary applications with an easy-to-use, cross-platform
instrumentation API. The user simply writes a short plug-in C++
program (called a Pintool) that defines where new code should be
inserted, what code to insert, and which events, such as thread
creation, should trigger notifications to the user (i.e., callbacks).
The rest is handled automatically and transparently by Pin, which
operates on IA32, Intel® 64, Intel® Itanium, and ARM. Pin operates at
runtime, since it is impossible to find and modify all of the instruc-