that state U has outgoing transitions labeled by the charac-
ters c and d, and that its parent is R, which is the root of a
default transition tree. The content label for transitions en-
tering state V is ab,cd,R. This tells us that state V has outgo-
ing transitions labeled by the characters a and b, and that its
parent (in the default transition tree) has outgoing transi-
tions labeled by the characters c and d, and that its parent’s
parent is R, which is the root of a default transition tree.
Suppose that the current state of the D
2
FA is one of the
predecessors of state V and that the current input character
selects a content label for a transition to state V and that the
next input character is x. While V is the next state, since V
has no labeled transition for x, we would like to avoid visit-
ing state V so that we can skip the associated memory ac-
cess. Similarly, we would like to avoid visiting state U,
since it also has no labeled transition for x. Assume that we
have a hash function h for which h(cd,R)=U and for which
h(ab,U)=V. Given the content label ab,cd,R (which is
stored at the predecessor state), we can determine that nei-
ther our immediate next state (V) nor its parent (U) has an
outgoing transition for x. Hence, we can proceed directly to
R. If on the other hand, the next input character is c or d,
then we can proceed directly to U by computing h(cd,R).
Similarly, if the next input character is a or b, we can pro-
ceed directly to V by computing h(ab,h(cd,R)).
Summarizing, we associate a content label with every
state in a D
2
FA. Each label includes a character set for the
state and each of its ancestors in the default transition tree,
plus a number identifying the state at the root of the tree.
We augment the content label with a bit string that indicates
which of the states on the path from the given state to the
root of its tree are matching states for the automaton. In our
examples, we use underlining of the character set for a
given state to denote that the state is a matching state. So, if
state U in our example matched an input pattern of interest,
we would write the content label for U as cd
,R and the con-
tent label for V as ab,cd
,R. Content labels are stored at
predecessor states, and hashing is used to map the labels to
the next state that we need to visit.
3.2 Complete Example
We now turn to a more complete example. Figure 2a shows
a DFA that matches the patterns a[aA]
+
b
+
, [aA]
+
c
+
, b
+
[aA]
+
,
b
+
[cC] and dd
+
. Part b of the figure shows a corresponding
space reduction graph and part c shows a D
2
FA constructed
using this space reduction graph. The default transitions are
shown as bold edges. Note that states 1 and 8 are roots of
their default transition trees and that the longest sequence of
default transitions that can be followed without consuming
an input character is 2. If we use the D
2
FA to parse an input
string, the number of memory accesses can be as large as
three times the number of characters in the input string.
Consider a parse of the string aAcba. Using the original
DFA, we can write this in the form
956441 →→→→→
abcAa
Here, the underlined state numbers indicate matching states.
Using the D
2
FA, we the parse of the string will be
98156661441 →→→→→
abcAa
Here, we are showing the intermediate states traversed by
the D
2
FA. To specify the CD
2
FA, we first need to write the
content labels for each of the states. These are listed below.
Note that since states 3 and 7 have no labeled outgoing
transitions in the D
2
FA, their content labels include empty
character sets that are indicated by dashes. The dash in the
content label for state 3 is underlined to indicate that state 3
is a matching state.
6. c,1
7. –,1
8. 8
9. cC
,8
1. 1
2. d,1
3. –
, d,1
4. b,c
,1
1
4
5
8
9
6
a,A
b
c
b
a,A
b
c
a,A
b
c
b
a,A
a,A
2
b
d
7
c,C
a,A
b
a,A
b
d
1
4
5
8
9
6
2
7
6
5
6
5
7
6
7
7
7
5
8
7
5
c,C
6
a,A
5
7
d
From
states 4-9
3
d
a,A
b
3
8
5
7
6
7
1
4
5
8
9
6
a,A
b
b
c
a,A
b
c
2
b
d
7
c,C
d
From
state 8
3
d
C
to state 1
Figure 2. a) DFA recognizing patterns [aA]
+
b
+
, [aA]
+
c
+
, b
+
[aA]
+
, b
+
[cC], and dd
+
over alphabet {a, b, c, d, A, B, C, D} (transitions
for characters not shown in the figure leads to state 1). b) Corresponding space reduction graph (only edges of weight greater
than 4 are shown). c) A set of default transition trees (tree diameter bounded to 4 edges) and the resulting D
2
FA.