they are very fast as they are completely table-driven. Fortunately, as we will see,
the same ideas can also be applied to drive instruction selection.
In 1978 Glanville and Graham [111] presented a seminal
[Figure 3.2: Linearising a tree — an expression tree is flattened into the Polish prefix string + ∗ a b c.]
paper that describes how the same syntactic techniques used for syntax parsing can also be adapted to pattern matching
and selection. (This was also already hinted at, albeit vaguely,
by Feldman and Gries in 1968 [87, p. 107].) Glanville and Gra-
ham recognised that by linearising the trees using Polish prefix
notation (i.e., 1 + (2 + 3) is expressed as + 1 + 2 3, thus making parentheses obsolete; see also Figure 3.2), the machine
instructions can be expressed as a set of grammatical production
rules. One such set is given in Figure 3.3. With a modified LR(1)
parser, this can be used to drive instruction selection. We will
refer to this technique as the Glanville-Graham approach.
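As a small illustration of the linearisation step, the following Python sketch flattens an expression tree by a preorder walk. The tuple-based tree encoding is an assumption made for this example, not a representation used by Glanville and Graham.

```python
# Hypothetical sketch: linearising an expression tree into Polish
# prefix notation, as in Figure 3.2.

def linearise(node):
    """Flatten a tree into a Polish prefix token list (preorder walk)."""
    if isinstance(node, tuple):          # interior node: (op, child, child, ...)
        op, *children = node
        tokens = [op]
        for c in children:
            tokens += linearise(c)
        return tokens
    return [node]                        # leaf

tree = ("+", ("*", "a", "b"), "c")       # the tree from Figure 3.2
print(" ".join(linearise(tree)))         # → + * a b c
```

Because the operators' arities are fixed, the prefix string can be parsed back unambiguously, which is what makes the parentheses obsolete.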
In a rough sketch, the approach works as follows. The program string is
progressively parsed and converted into tokens, which are pushed onto a stack.
As each token is pushed, the algorithm makes a decision whether to shift (continue)
or to reduce (pop tokens from the stack). Which decision to make is deduced
from a precomputed table that has been generated from the grammar. Figure 3.4
shows the table computed from the grammar in Figure 3.3. A reduction can be
performed if there exists some rule that matches the top-most portion of the stack.
This corresponds to the pattern matching task. During reduction, the matched
items on the stack will be popped and replaced with the left-hand symbol of the
selected rule. Hence this corresponds to the pattern selection task. This process of
shifting and reducing continues until the entire program has been parsed, hopefully
ending up in an accept state via a start symbol. A walk-through of such an execution
is given in Figure 3.5.
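To make the shift-reduce loop concrete, here is a toy sketch in Python. It replaces the precomputed LR(1) table with a naive greedy matcher that reduces whenever some rule's right-hand side matches the top of the stack, so it only illustrates the mechanics; the grammar, rule set, and emitted mnemonics are all invented for this example.

```python
# A toy sketch (not the actual table-driven Glanville-Graham parser):
# a greedy shift-reduce loop over a Polish-prefix program string.

RULES = [                                 # (nonterminal, rhs, instruction)
    ("r", ("+", "r", "r"), "ADD"),
    ("r", ("*", "r", "r"), "MUL"),
    ("r", ("a",), "LOAD a"),
    ("r", ("b",), "LOAD b"),
    ("r", ("c",), "LOAD c"),
]

def select(tokens):
    stack, emitted = [], []
    for tok in tokens:
        stack.append(tok)                 # shift
        reduced = True
        while reduced:                    # reduce while some rule matches
            reduced = False
            for lhs, rhs, instr in RULES:
                if tuple(stack[-len(rhs):]) == rhs:
                    del stack[-len(rhs):]  # pop the matched right-hand side
                    stack.append(lhs)      # replace it with the LHS symbol
                    emitted.append(instr)  # pattern selection: emit code
                    reduced = True
                    break
    assert stack == ["r"], "parse did not end in the start symbol"
    return emitted

print(select("+ * a b c".split()))
# → ['LOAD a', 'LOAD b', 'MUL', 'LOAD c', 'ADD']
```

Each reduction corresponds to matching a pattern (the rule's right-hand side) and selecting it (emitting its instruction and replacing the match with the rule's left-hand symbol), mirroring the two tasks described above.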
There are several key differences between a regular syntax parser and Glanville
and Graham’s algorithm. First, the instruction set grammar for a target machine is
normally highly ambiguous. This incurs many shift-reduce and reduce-reduce con-
flicts which have to be resolved in some manner. Glanville and Graham addressed
shift-reduce conflicts by always opting for a shift in these situations. The effect is that
the instruction selector will attempt to select the largest patterns possible, which is
most often the desired outcome. The idea of always selecting the largest possible
pattern is commonly known as maximum munching, or just maximum munch,
a term coined by Cattell [42] in his PhD thesis. Furthermore, reduce-reduce
conflicts are resolved with a simple heuristic that chooses the rule with the longest
right-hand side production. In equal-length conflicts the heuristic selects the first
rule defined in the grammar.
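The tie-breaking heuristic just described — longest right-hand side first, definition order second — can be sketched as follows. The candidate rule tuples are invented for illustration.

```python
# Hedged sketch of resolving a reduce-reduce conflict: among rules whose
# right-hand sides all match the stack top, pick the one with the longest
# RHS; on equal length, pick the rule defined earliest in the grammar.

def pick_rule(matching_rules):
    """matching_rules: list of (definition_index, rhs) pairs."""
    return min(matching_rules, key=lambda r: (-len(r[1]), r[0]))

candidates = [
    (0, ("+", "r", "r")),        # plain add
    (1, ("+", "r", "const")),    # add-immediate, same RHS length
    (2, ("r",)),                 # chain rule, shorter RHS
]
print(pick_rule(candidates))     # → (0, ('+', 'r', 'r'))
```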
Second, the reduce step is considerably more complicated as it needs to consider
both syntactic and semantic information. For instance, when a shift is performed on an input symbol r, information about the register that r represents is also
pushed on the stack. This is necessary in order to emit the machine instructions. In
addition, Glanville and Graham chose to incorporate register allocation into this
step.
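One way to picture this — a minimal sketch under assumed names, not the original formulation — is to pair each stack symbol with the register it denotes, so that a reduction has the semantic information it needs to emit an instruction:

```python
# Sketch: each stack entry pairs the grammar symbol with the register
# it denotes, so reducing '+ r r' can emit a concrete add instruction.

def reduce_add(stack, fresh_reg):
    """Reduce '+ r r' on top of the stack, emitting an add instruction."""
    (_, r2), (_, r1), (op, _) = stack.pop(), stack.pop(), stack.pop()
    assert op == "+"
    instr = f"add {fresh_reg}, {r1}, {r2}"
    stack.append(("r", fresh_reg))       # LHS symbol plus its register
    return instr

stack = [("+", None), ("r", "r1"), ("r", "r2")]
print(reduce_add(stack, "r3"))           # → add r3, r1, r2
print(stack)                             # → [('r', 'r3')]
```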