7:6
•
T. Kotzmann et al.
single point in the program where a value is assigned to it. An instruction that
loads or computes a value represents both the operation and its result, so that
operands can be represented as pointers to previous instructions. Both during
and after generation of the HIR, several optimizations are performed, such as
constant folding, value numbering, method inlining, and null check elimination.
They benefit from the simple structure of the HIR and the SSA form.
The back end of the compiler translates the optimized HIR into a low-level
intermediate representation (LIR). The LIR is conceptually similar to machine
code, but still mostly platform-independent. In contrast to HIR instructions,
LIR operations operate on virtual registers instead of references to previous
instructions. The LIR facilitates various low-level optimizations and is the input
for the linear scan register allocator, which maps virtual registers to physical
ones.
After register allocation, machine code can be generated in a rather simple
and straightforward way. The compiler traverses the LIR, operation by opera-
tion, and emits appropriate machine instructions into a code buffer. This process
also yields object maps and debugging information.
2.1 High-Level Intermediate Representation
The high-level intermediate representation (HIR) is a graph-based represen-
tation of the method using SSA form [Cytron et al. 1991]. It is platform-
independent and represents the method at a high level where global optimiza-
tions are easy to apply. We build the SSA form at parse time of the bytecodes,
similarly to Click and Paleczny [1995]. The modeling of instruction types as a
C++ class hierarchy and the representation of operands are other similarities
to this intermediate representation.
The control flow is modeled using an explicit CFG, whose nodes represent
basic blocks, i.e. longest possible sequences of instructions without jumps or
jump targets in the middle. Only the last instruction can be a jump to one or
more successor blocks or represent the end of the method. Because instructions
that can throw exceptions do not terminate a basic block, control can also be
transferred to an exception handler in the middle of a block (see Section 2.5).
The instruction types of the HIR are represented by a class hierarchy with a
subclass for each kind of instruction. The instruction nodes also form the data
flow: Instructions refer to their arguments via pointers to other instructions.
This way, an instruction represents both the computation of a result and the
result itself. Because of this equivalence, an instruction is often referred to as
a value. The argument need not be defined in the same block, but can also be
defined in a dominator, i.e. a common predecessor on all input paths. Instead of
explicit instructions for accessing local variables, instructions reference those
instructions that compute the most recent value of the variables. Figure 3 shows
an example for the control and data flow of a short loop.
The HIR is constructed in two passes over the bytecodes. The first pass de-
tects the boundaries of all basic blocks and performs a simple loop analysis
to mark loop header blocks. Backward-branch targets of irreducible loops are
are also treated as loop headers. The basic blocks are created, but not linked
ACM Transactions on Architecture and Code Optimization, Vol. 5, No. 1, Article 7, Publication date: May 2008.