20
Alternating branches
While the C loop in the above example is predictable, and the B loop can be made
predictable by inserting a dummy branch, we still have a big problem with the JNZ X1
branch. This branch is alternately taken and not taken, and it is not correlated with any of
the preceding 16 branch events. Let's study the behavior of the predictors in this case. If the
local predictor starts in state "weakly not taken", then it will alternate between "weakly not
taken" and "strongly not taken" (see figure 3.1). If the entry in the global pattern history table
starts in an agree state, then the branch will be predicted to fall through every time, and we
will have 50% mispredictions (see figure 3.3). If the global predictor happens to start in state
"strongly disagree", then it will be predicted to be taken every time, and we still have 50%
mispredictions. The worst case is if the global predictor starts in state "weakly disagree". It
will then alternate between "weakly agree" and "weakly disagree", and we will have 100%
mispredictions. There is no way to control the starting state of the global predictor, but we
can control the starting state of the local predictor. The local predictor starts in state "weakly
not taken" or "weakly taken", according to the rules of static prediction, explained on page
26 below. If we swap the two branches and replace JNZ with JZ, so that the branch is taken
the first time, then the local predictor will alternate between state "weakly not taken" and
"weakly taken". The global predictor will soon go to state "strongly disagree", and the branch
will be predicted correctly all the time. A backward branch that alternates would have to be
organized so that it is not taken the first time, to obtain the same effect. Instead of swapping
the two branches, we may insert a 3EH prediction hint prefix immediately before the JNZ
X1 to change the static prediction to "taken" (see p. 26). This will have the same effect.
While this method of controlling the initial state of the local predictor solves the problem in
most cases, it is not completely reliable. It may not work if the first time the branch is seen is
after a mispredicted preceding branch. Furthermore, the sequence may be broken by a task
switch or other event that pushes the branch out of the BTB. We have no way of predicting
whether the branch will be taken or not taken the first time it is seen after such an event.
Fortunately, it appears that the designers have been aware of this problem and
implemented a way to solve it. While researching these mechanisms, I discovered an
undocumented prefix, 64H, which does the trick on the P4. This prefix doesn't change the
static prediction, but it controls the state of the local predictor after the first event so that it
will toggle between state "weakly not taken" and "weakly taken", regardless of whether the
branch is taken or not taken the first time. This trick can be summarized in the following rule:
A branch which is taken exactly every second time, and which doesn't correlate with any of
the preceding 16 branch events, can be predicted well on the P4 if it is preceded by a 64H
prefix. This prefix is coded in the following way:
; Example 3.5. P4 alternating branch hint
DB 64H ; Hint prefix for alternating branch
jnz X1 ; Branch instruction
No prefix is needed if the branch can see a previous instance of itself in the 16-bit
prehistory.
The 64H prefix has no effect and causes no harm on any previous microprocessor. It is an
FS segment prefix. The 64H prefix cannot be used together with the 2EH and 3EH static
prediction prefixes.
Pattern recognition for conditional jumps in P4E
Branch prediction in the P4E is simpler than in the P4. There is no agree predictor, but only
a 16-bit global history and a global pattern history table. This means that a loop can be
predicted well on the P4E if the repeat count multiplied by the number of conditional jumps
inside the loop does not exceed 17.