heuristics for non-loop branches and measures their effec-
tiveness in isolation. Section 5 considers combining these
simple heuristics into a complete heuristic and contains the
results for this heuristic. Section 6 presents results on how
our heuristic performs at finding sequences of instructions
without a mispredicted branch. We compare profile-based
methods for measuring this quantity with trace-based
methods and show why trace-based methods are preferable.
Section 7 examines the performance of our heuristic on dif-
ferent datasets. Section 8 reviews related work and Section
9 concludes the paper.
2. BACKGROUND
We restrict our heuristics to predicting two-way conditional
branches with fixed targets. Throughout the paper, the
word branch refers to such branches. We do not consider
branches whose target is dynamically determined (by
lookup in a jump table, for example). Associated with each
conditional branch instruction is its target successor—the
instruction to which control passes if the branch condition
evaluates to true—and its fall-thru successor—the instruc-
tion to which control passes if the branch condition evalu-
ates to false.
We used our profiling and tracing tool QPT [2] both as a
platform for studying branch behavior and for making
branch predictions. QPT takes as input a MIPS executable
file and produces an instrumented program that generates an
edge profile (i.e., for each branch, a count of how many
times control passes to the target and fall-thru successor)
when run. QPT can also instrument a program to produce
an instruction and address trace. Since QPT operates on an
executable file, all program procedures are analyzed. The
numbers in this paper include DEC Ultrix 4.2 library pro-
cedures as well as application procedures.
In order to instrument an executable file, QPT builds a
control flow graph for each procedure in the executable file.
Each vertex in the control flow graph represents a basic
block of instructions. A basic block ending with a condi-
tional branch corresponds to a vertex in the control flow
graph with two outgoing edges. The root vertex of the con-
trol flow graph is the entry point of the procedure. A basic
block containing a return (procedure exit) has no successors
in the control flow graph.
Some of our heuristics make use of the control flow
graph’s domination and postdomination relations [1]. A
vertex v dominates w if every path from the entry point of
the procedure to w includes v. A vertex w postdominates v
if every path from v to any exit vertex includes w. If the
successor of a branch postdominates the branch, then no
matter which direction the branch takes, the successor even-
tually executes.
We analyzed the programs in the SPEC89 benchmark
suite [4], along with a number of other programs. These
benchmarks (23 of them) are listed in Table 1, along with a
iiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiii
Size
Program Description Lng. (1Kb)
iiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiii
congress Interp. for Prolog-like lang. C++ 856
ghostview X postscript previewer C 831
gcc * GNU C compiler C 688
iiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiii
lcc Fraser & Hanson’s C cmplr. C 254
rn Net news reader C 221
espresso * PLA minimization C 188
iiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiii
qpt Profiling and tracing tool C 143
awk Pattern scanner & processer C 102
xlisp * Lisp interpreter C 78
iiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiii
eqntott * Boolean eqns. to truth table C 45
addalg Integer program solver C 33
compress File compression utility C 25
iiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiii
grep Search file for regular expr. C 20
poly Polydominoes game C 16
iiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiii
iiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiii
spice2g6 * Circuit simulation F 385
doduc * Hydrocode simulation F 184
fpppp * Two-electron integral deriv. F 168
iiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiii
dnasa7 * Floating point kernels F 90
tomcatv * Vectorized mesh generation F 66
matrix300 * Matrix multiply F 61
iiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiii
costScale Solve minimum cost flow C 41
dcg Conjugate gradient C 41
sgefat Gaussian elimination C 33
iiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiic
c
c
c
c
c
c
c
c
c
c
c
c
c
c
c
c
c
c
c
c
c
c
c
c
c
c
c
c
c
c
c
c
c
c
c
c
c
c
c
c
c
c
c
c
c
c
c
c
c
c
c
c
c
c
c
c
c
c
c
c
c
c
c
c
c
c
c
c
c
c
c
c
c
c
c
c
c
c
c
c
c
c
c
c
c
c
c
c
c
c
c
c
c
c
c
c
c
c
c
c
c
c
c
c
c
c
c
c
c
c
c
c
c
c
c
c
c
c
c
c
c
c
c
c
c
c
c
c
c
c
c
c
c
c
c
c
c
c
c
c
c
c
c
c
c
c
c
c
c
Table 1. Benchmarks, sorted by code size. SPEC benchmarks are
marked with *. Fortran benchmarks are marked with an F.
Benchmarks were optimized (-O or -O2).
description of their function. We have broken the bench-
marks into two major groups: programs that perform little
to no floating point computation and programs that perform
many floating point operations. Within each group, pro-
grams are sorted by the size of their object code. All of the
benchmarks were compiled and analyzed on a DECstation
(a MIPS R2000/R3000 processor) with -O optimization.
The results presented in Sections 3 to 6 are for a single
execution of each benchmark. Previous work has shown
that most branches behave similarly over different execu-
tions [7]. That is, if a branch takes one direction with high
probability during one execution of a program, it most
likely takes the same direction with high probability in
other executions. The goal of this work is to show that
static prediction can accurately determine these branch
directions, rather than confirm previous results. However,
we also tested our predictor on multiple datasets per bench-
mark and found similar results to those of [7]. Section 7
summarizes the results of these experiments.
We are concerned with static branch prediction. That is,
for each branch either the target or fall-thru successor is
predicted, and this prediction does not change during the
execution of the program. Predicting a branch corresponds
to choosing one of the two outgoing edges from the vertex
containing the branch in the control flow graph. For an exe-
cution, the standard for how well static branch prediction
can potentially perform is the perfect static predictor, which
—2—