Allocation Removal by Partial Evaluation in a Tracing JIT
Carl Friedrich Bolz
a
Antonio Cuni
a
Maciej Fijałkowski
b
Michael Leuschel
a
Samuele Pedroni
c
Armin Rigo
a
a
Heinrich-Heine-Universität Düsseldorf, STUPS Group, Germany
b
merlinux GmbH, Hildesheim, Germany
c
Open End, Göteborg, Sweden
cfbolz@gmx.de anto.cuni@gmail.com jal@merlinux.eu leuschel@cs.uni-duesseldorf.de
samuele.pedroni@gmail.com arigo@tunes.org
Abstract
The performance of many dynamic language implementations suf-
fers from high allocation rates and runtime type checks. This makes
dynamic languages less applicable to purely algorithmic problems,
despite their growing popularity. In this paper we present a simple
compiler optimization based on online partial evaluation to remove
object allocations and runtime type checks in the context of a trac-
ing JIT. We evaluate the optimization using a Python VM and find
that it gives good results for all our (real-life) benchmarks.
1
Categories and Subject Descriptors D.3.4 [Programming Lan-
guages]: Processors—code generation, interpreters, run-time envi-
ronments
General Terms Languages, Performance, Experimentation
Keywords Tracing JIT, Partial Evaluation, Optimization
1. Introduction
The objective of a just-in-time (JIT) compiler for a dynamic lan-
guage is to improve the speed of the language over an implementa-
tion of the language that uses interpretation. The first goal of a JIT
is therefore to remove the interpretation overhead, i.e. the overhead
of bytecode (or AST) dispatch and the overhead of the interpreter’s
data structures, such as operand stack etc. The second important
problem that any JIT for a dynamic language needs to solve is how
to deal with the overhead of boxing primitive types and of type dis-
patching. Those are problems that are usually not present or at least
less severe in statically typed languages.
Boxing of primitive types is necessary because dynamic lan-
guages need to be able to handle all objects, even integers, floats,
booleans etc. in the same way as user-defined instances. Thus those
primitive types are usually boxed, i.e., a small heap-structure is al-
located for them that contains the actual value. Boxing primitive
types can be very costly, because a lot of common operations, par-
ticularly all arithmetic operations, have to produce new boxes, in
1
This research is partially supported by the BMBF funded project PyJIT
(nr. 01QE0913B; Eureka Eurostars).
[Copyright notice will appear here once ’preprint’ option is removed.]
addition to the actual computation they do. Because the boxes are
allocated on the heap, producing many of them puts pressure on the
garbage collector.
Type dispatching is the process of finding the concrete imple-
mentation that is applicable to the objects at hand when performing
a generic operation on them. An example would be the addition of
two objects: For addition the types of the concrete objects need to
be checked and the suiting implementation chosen. Type dispatch-
ing is a very common operation in modern
2
dynamic languages be-
cause no types are known at compile time. Therefore all operations
need it.
A recently popular approach to implementing just-in-time com-
pilers for dynamic languages is that of a tracing JIT. A tracing JIT
works by observing the running program and recording its hot spots
into linear execution traces. Those traces are optimized and turned
into machine code.
One reason for the popularity of tracing JITs is their relative
simplicity. They can often be added to an existing interpreter,
reusing a lot of the interpreter’s infrastructure. They give some
important optimizations like inlining and constant-folding for free.
A tracing JIT always produces linear pieces of code, which sim-
plifies many of the hard algorithms in a compiler, such as register
allocation.
The use of a tracing JIT can remove the overhead of bytecode
dispatch and that of the interpreter data structures. In this paper
we want to present a new optimization that can be added to a
tracing JIT that further removes some of the overhead more closely
associated to dynamic languages, such as boxing overhead and
type dispatching. Our experimental platform is the PyPy project,
which is an environment for implementing dynamic programming
languages. PyPy and tracing JITs are described in more detail
in Section 2. Section 3 analyzes the problem to be solved more
closely.
The core of our trace optimization technique can be viewed as
partial evaluation: the partial evaluation performs a form of escape
analysis [4] on the traces and makes some objects that are allocated
in the trace static, which means that they do not occur any more
in the optimized trace. This technique is informally described in
Section 4; a more formal description is given in Section 5. The
introduced techniques are evaluated in Section 6 using PyPy’s
Python interpreter.
The contributions made by this paper are:
2
For languages in the LISP family, basic arithmetic operations are typically
not overloaded; even in Smalltalk, type dispatching is much simpler than in
Python or JavaScript.
1 2010/10/22