1:4 H. Zhou et al.
head item and resultant nonterminal. The final probabilities of combined actions are
derived from the set of parse histories resulting from the training phase, by counting
the frequencies of combined actions.
There are many generalized LR parsers that assign probabilities to actions [Lavie
et al. 1996; Kentaro et al. 1998]. Compared to associating probabilities with rules of
the grammar, attaching them to actions enables the LR parser to disambiguate in more
general situations, in which identical rules reapply in different ways across different
derivations or apply with differing probabilities in different contexts.
The action n-gram model and statistical generalized LR parsers both assign proba-
bilities to actions, but in different ways. A statistical generalized LR parser computes
action probabilities by counting actions' co-occurrence frequencies with the current
state, lookahead item, and resultant state. In contrast, the action n-gram model obtains
action probabilities by counting the previous n actions directly. In generalized LR
parsing, an action is combined with the lookahead item and resultant nonterminal in
the LR parsing table, whereas an action in the action n-gram model is based on head-word,
head-POS-tag, and constituent-label information in the parsing stack. The statistical
generalized LR parser focuses on exploiting state transitions during parsing; the action
n-gram model focuses on the action sequence and the syntax tree structures formed by
that sequence.
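The counting scheme of the action n-gram model can be sketched as a relative-frequency estimate over gold action sequences. The following is a minimal illustration, not the paper's implementation; the function name `train_action_ngram`, the `<s>` padding symbol, and the toy action sequences are all assumptions made for the example.

```python
from collections import Counter

def train_action_ngram(action_sequences, n=3):
    """Estimate P(a_i | a_{i-n+1}, ..., a_{i-1}) by counting how often
    each action follows each (n-1)-action history (hypothetical sketch)."""
    ctx_counts = Counter()   # counts of (n-1)-action histories
    full_counts = Counter()  # counts of histories extended by one action
    for seq in action_sequences:
        padded = ["<s>"] * (n - 1) + list(seq)  # pad short histories
        for i in range(n - 1, len(padded)):
            ctx = tuple(padded[i - n + 1:i])
            full_counts[ctx + (padded[i],)] += 1
            ctx_counts[ctx] += 1

    def prob(action, history):
        ctx = tuple(history[-(n - 1):])
        if ctx_counts[ctx] == 0:
            return 0.0
        return full_counts[ctx + (action,)] / ctx_counts[ctx]

    return prob

# Toy gold action sequences (illustrative labels only).
seqs = [["S", "S", "BR-L-NP", "UR-S"],
        ["S", "S", "BR-R-NP", "UR-S"]]
p = train_action_ngram(seqs, n=2)
```

A real model would additionally smooth these estimates, since unseen action histories otherwise receive zero probability.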
The action n-gram model is built upon a data-driven shift-reduce parsing framework
with an n-gram estimation method rather than on a generalized LR parsing framework.
Rather than using the action history alone, we propose incorporating the action n-gram
model into a discriminative parsing model to enhance parsing performance.
3. SHIFT-REDUCE PARSING
Typical shift-reduce parsers parse a sentence by performing a sequence of shift-reduce
actions. The action to be performed is determined by a statistical classifier, and the
parsing result is obtained by searching greedily from left to right through the sentence
[Sagae and Lavie 2005]. Zhang and Clark [2009] applied global discriminative train-
ing and beam search to obtain higher accuracies. Zhu et al. [2013] added a new action
to fill the gap between action sequences of different lengths caused by unary reduce
actions. In this section, we briefly review the shift-reduce constituent parsing frame-
work.
3.1. Actions of Shift-Reduce Parsing
A shift-reduce parser parses a sentence from left to right, and generates the whole
parse tree by performing a sequence of actions. The parser starts with an initial state
and makes transitions from one state to another by performing actions. Every state
consists of a queue Q = {q0, q1, q2, . . .} and a stack S = {. . . , s2, s1, s0}, where Q
contains the word and POS-tag pairs to be processed and S contains partially parsed
subtrees. At each step, one of the following actions is applied:
— SHIFT (S): push the first word and POS-tag pair of the queue onto the stack as the
top node s0.
— UNARY-REDUCE-X (UR·X): pop the top node s0 off the stack; generate a unary-
branching node with constituent label X whose child is s0; push the new node back
onto the stack.
— BINARY-REDUCE-L/R-X (BR-L/R·X): pop the top two nodes s1, s0 off the stack;
generate a binary-branching node with constituent label X whose left child is s1
and right child is s0, with the left (L) / right (R) child as its head; push the new node
back onto the stack.
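The three actions above can be sketched in code as operations on a queue of (word, POS-tag) pairs and a stack of partial subtrees. This is a minimal illustration under assumed representations: the `Node` class, function names, and the example sentence are hypothetical, not the parser described here.

```python
class Node:
    """A partial subtree on the stack (hypothetical representation)."""
    def __init__(self, label, children=None, head=None):
        self.label = label               # POS tag or constituent label X
        self.children = children or []   # empty for a shifted leaf
        self.head = head or label        # head word of the subtree

def shift(stack, queue):
    word, pos = queue.pop(0)             # first (word, POS-tag) pair of Q
    stack.append(Node(pos, head=word))   # becomes the new top node s0

def unary_reduce(stack, x):              # UR·X
    s0 = stack.pop()                     # pop the top node s0
    stack.append(Node(x, [s0], head=s0.head))  # X dominates s0

def binary_reduce(stack, x, head_left):  # BR-L/R·X
    s0 = stack.pop()                     # top two nodes: s0 above s1
    s1 = stack.pop()
    head = s1.head if head_left else s0.head   # L or R child is the head
    stack.append(Node(x, [s1, s0], head=head))

# Parse "the dog barks" into (S (NP the dog) (VP barks)).
stack, queue = [], [("the", "DT"), ("dog", "NN"), ("barks", "VBZ")]
shift(stack, queue); shift(stack, queue)
binary_reduce(stack, "NP", head_left=False)   # BR-R-NP: head is "dog"
shift(stack, queue)
unary_reduce(stack, "VP")                     # UR-VP over "barks"
binary_reduce(stack, "S", head_left=False)    # BR-R-S: head is "barks"
```

After the final reduce, the stack holds a single node spanning the whole sentence, which is the termination condition of a shift-reduce derivation.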
ACM Transactions on Asian Language Information Processing, Vol. 9, No. 4, Article 1, Publication date: September YYYY.