exception throws or catches. By abstracting at the level of
execution events, our tracing runtime supports multiple
language runtime systems including the JVM that we describe
in this paper.
Our trace selection algorithm first determines a point to
start a trace (a trace head) and then records the next execution
starting from the trace head as a trace, similar to the well-
known NET (Next Executing Tail) strategy [1]. To focus on the
basic trace-JIT characteristics, we currently collect only linear
traces or cyclic traces (where a cyclic trace has a jump to its
own trace head at the end). Hence we do not have any join
points in our traces except for the trace head of a cyclic trace.
Figure 2 shows examples of a linear trace and a cyclic trace.
To identify a hot trace head, we assign a counter called a
hotness counter for each potential trace head, which includes
the target of a taken backward branch and any bytecode that
immediately follows the exit point of an already formed trace
to achieve sufficiently high coverage for the JIT compiled code.
We manage the information associated with the bytecode
addresses, such as the hotness counter or the compiled code
address, using a globally synchronized hash map called a trace
cache. The trace selection engine increments the hotness
counter when it receives an event for the bytecode address and
it starts recording the execution to form a trace when the
hotness counter reaches a predefined threshold. We used 500 as
the threshold in this paper. For example, a loop head is selected
as a trace head after 500 iterations. We picked the threshold
based on the thresholds used in the baseline method-JIT to start
initial compilation.
In the recording mode, the trace selection engine records all
basic blocks (BBs) executed until one of the trace termination
conditions is satisfied. We terminate a trace when (1) it forms a
cycle in the recording buffer, (2) it executes a backward branch
(even it does not form a cycle), (3) it calls a native (JNI)
method that we cannot include in a trace, (4) it throws an
exception, or (5) the recording buffer becomes full. The default
size of the recording buffer is 128 BBs for one recording. As
we will describe in Section IV, we allow traces to include calls
to a selected set of JNI methods from the Java standard library
to maximize the performance. Calls to other JNI methods will
terminate the recording of a trace. If a trace forms a cycle by
jumping to its trace head, the trace becomes a cyclic trace. We
identify a cyclic execution patterns accurately by checking the
calling context of each BB in the trace [16]. Otherwise, the
trace becomes a linear trace. A trace collected by the selection
engine is sent to a shared waiting queue that is processed by the
compilation thread.
When the compilation thread compiles the trace, it puts the
entry point address of the compiled code in the trace cache.
Once the compiled code address becomes available in the trace
cache, the interpreter transfers control to the entry point of the
compiled code when the execution reaches the head of a trace.
At the exit of the compiled trace, it returns control to the
interpreter or directly dispatches the next compiled trace using
a technique called trace linking [1]. Currently we do not
employ a specialization technique and thus there is at most one
trace starting from the same bytecode address.
B. Trace-based JIT Compiler and Scope Mismatch
We implemented our trace-JIT by enhancing a mature
method-JIT instead of implementing it from scratch. Our trace-
JIT takes a Java bytecode sequence and the originating location
(Java method and bytecode index in the method) for each
bytecode in the trace as input.
In trace-based compilation, a compilation scope probably
does not match the method scope. Thus we need to assume that
local variables and operands in the operand stack may live at
the beginning and the end of the compilation scope, while all
these values must be dead in the method-based compilation.
We call this problem scope mismatch. Scope mismatch is a
large obstacle when implementing a trace-based compiler from
a method-based compiler. For example, the first bytecode in a
trace may require operands on the operand stack, but a
compiler cannot identify the type of the operands because the
value comes from outside the current compilation scope. To
handle problems caused by scope mismatch, we implemented a
helper function that analyzes the bytecode sequence of a
method, regardless of the current compilation scope. The helper
function identifies the type and liveness of operands on stack
and local variables at the specified program location. The
liveness information at compilation scope boundaries obtained
from this helper function is critical for both code generation
and optimization. For example, the IR (Intermediate
Tracing runtime
interpreter
interpreter
trace (Java bytecode)
trace selection
engine
trace selection
engine
IR generator
IR generator
optimizers
optimizers
code generator
code generator
trace dispatcher
trace dispatcher
garbage collector
code cache
code cache
class libraries
trace cache
(hash map)
trace cache
(hash map)
(e.g. hotness counter and
compiled code address)
Java VM
JIT compiler
modified component
modified component
unmodified component
new component
new component
execution events
compiled code
early redundancy
elimination
early redundancy
elimination
Figure 1. Overview of our trace-JIT system architecture.
A
A
B
B
C
C
stub
stub
entry
exit
exit
stub
stub
exit
the program
may exit from
the trace at
conditional
branch,
virtual call
guard, or
return guard
stub blocks
to restore
JVM state
A
A
B
B
entry
stub
stub
exit
stub
stub
exit
stub block
for trace exit
caused by an
exception
the program
branches out
from the trace
if an exception
occurs
(a) linear trace
(b) cyclic trace
(consists of 3 BBs) (consists of 2 BBs)
Figure 2. Example of (a) a linear trace and
(b) a cyclic trace. Each box shows a basic block.