13:4 H. Zhou et al.
accuracy than other parsers, which indicates that the nonlocal features are very helpful
to syntax parsing.
2.2. Using Action for Parsing Disambiguation
Briscoe and Carroll [1993] described work toward the construction of a probabilistic
parsing system for natural language, based on the LR parsing technique. They proposed
to associate probabilities with transitions in the automaton in a generalized LR parsing
framework [Tomita 1987]. They combined parsing action with the current parsing
state, lookahead item, and resultant nonterminal. The final probabilities of combined
actions are derived from the set of parse histories resulting from the training phase,
by counting the frequencies of combined actions and converting these to probabilities.
There are many generalized LR parsers that assign probabilities to actions [Lavie
1996; Kentaro et al. 1998]. Compared to associating probabilities with the rules of
the grammar, these methods allow the probabilistic parser to distinguish situations in
which identical rules reapply in different ways across different derivations, or apply
with differing probabilities in different contexts.
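The training procedure described above amounts to a relative-frequency estimate over combined actions. The following minimal sketch illustrates it; the triple representation, the integer state ids, and the function name are illustrative assumptions, not the authors' implementation:

```python
from collections import Counter, defaultdict

def estimate_action_probs(parse_histories):
    """Estimate P(action | state, lookahead) by relative frequency, as in
    probabilistic GLR training.  Each history is a list of hypothetical
    (state, lookahead, action) triples collected during the training phase."""
    context_counts = Counter()
    joint_counts = Counter()
    for history in parse_histories:
        for state, lookahead, action in history:
            context_counts[(state, lookahead)] += 1
            joint_counts[(state, lookahead, action)] += 1
    probs = defaultdict(float)
    for (state, lookahead, action), n in joint_counts.items():
        probs[(state, lookahead, action)] = n / context_counts[(state, lookahead)]
    return probs

# Toy parse histories: two derivations that share the context (state 0, "NN").
histories = [
    [(0, "NN", "shift"), (1, "VB", "reduce")],
    [(0, "NN", "reduce")],
]
p = estimate_action_probs(histories)
# P(shift | state 0, lookahead NN) = 1/2
```

Conditioning on the (state, lookahead) context rather than on the grammar rule alone is what lets the same rule receive different probabilities in different parsing situations.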
The action n-gram model and statistical generalized LR parsers both assign prob-
abilities to actions for parsing disambiguation. The statistical generalized LR parser
assigns probabilities to the next actions by counting their co-occurrence frequencies
with the current state, lookahead item, and resultant state. In contrast, the action n-
gram model assigns probabilities to the next actions by counting the previous n actions
directly. In generalized LR parsing, the action is combined with the lookahead item and
resultant nonterminal in the LR parsing table. The action of the action n-gram model,
by contrast, is based on head word, head POS-tag, and constituent label information in the parsing
stack. The statistical generalized LR parser focuses on exploiting the state transitions
in parsing. In contrast, the action n-gram model focuses on the action sequence and
syntax tree structures formed by the action sequence.
The action n-gram model is built upon a data-driven shift-reduce parsing framework
with an n-gram estimation method rather than on a generalized LR parsing framework.
Rather than using the action history alone, we propose to incorporate the action n-gram
model into a discriminative parsing model to enhance the parsing performance.
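Under the same counting view, an action n-gram model scores the next action from the previous n-1 actions directly. The class below is a minimal sketch of that idea; the class name, method names, and padding symbol are assumptions for illustration only:

```python
from collections import Counter

class ActionNgramModel:
    """Minimal action n-gram sketch: estimate P(action | previous n-1
    actions) by relative frequency over training action sequences."""

    def __init__(self, n):
        self.n = n
        self.ngrams = Counter()    # counts of (context + action) tuples
        self.contexts = Counter()  # counts of (n-1)-action contexts

    def train(self, action_sequences):
        for seq in action_sequences:
            # Pad the front so the first actions also have a full context.
            padded = ["<s>"] * (self.n - 1) + list(seq)
            for i in range(self.n - 1, len(padded)):
                context = tuple(padded[i - self.n + 1:i])
                self.ngrams[context + (padded[i],)] += 1
                self.contexts[context] += 1

    def prob(self, context, action):
        context = tuple(context)
        if self.contexts[context] == 0:
            return 0.0
        return self.ngrams[context + (action,)] / self.contexts[context]

model = ActionNgramModel(n=2)
model.train([["shift", "shift", "reduce"], ["shift", "reduce"]])
# P(reduce | shift) = 2/3
```

In practice such a model would be combined with smoothing and, as proposed here, used as one feature inside a discriminative parsing model rather than on its own.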
3. SHIFT-REDUCE PARSING
Typical shift-reduce parsers parse a sentence by performing a sequence of shift-reduce
actions. The action to be performed is determined by a statistical classifier, and the
parsing result is obtained by searching greedily from left to right through the sentence
[Sagae and Lavie 2005]. Zhang and Clark [2009] applied global discriminative training
and beam-search to obtain higher accuracies. Zhu et al. [2013] added a new action to
equalize the lengths of action sequences, which differ because of unary reduce actions.
In this section, we briefly review the shift-reduce constituent parsing framework.
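The greedy left-to-right scheme of Sagae and Lavie [2005] can be illustrated with a toy driver loop. The classifier interface, action names, and tuple-based tree representation below are illustrative assumptions, not a specific parser's API:

```python
def greedy_parse(words_pos, classifier):
    """Greedy shift-reduce driver (hypothetical interfaces): the classifier
    maps the current (stack, queue) state to the next action, and parsing
    ends when the queue is empty and a single tree remains on the stack."""
    queue = list(words_pos)  # (word, POS-tag) pairs, left to right
    stack = []               # partially parsed subtrees
    while queue or len(stack) > 1:
        action = classifier(stack, queue)
        if action == "SHIFT":
            stack.append(queue.pop(0))
        else:  # e.g. "REDUCE-NP": combine the top two subtrees under a label
            label = action.split("-", 1)[1]
            right, left = stack.pop(), stack.pop()
            stack.append((label, left, right))
    return stack[0]

# Toy classifier: shift while the queue is non-empty, then reduce to S.
clf = lambda stack, queue: "SHIFT" if queue else "REDUCE-S"
tree = greedy_parse([("I", "PRP"), ("run", "VBP")], clf)
# tree == ("S", ("I", "PRP"), ("run", "VBP"))
```

Replacing the single greedy choice with a beam over scored action sequences gives the globally trained variant of Zhang and Clark [2009].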
3.1. Actions of Shift-Reduce Parsing
A shift-reduce parser parses a sentence from left to right and generates the whole parse
tree by performing a sequence of actions. The parser starts with an initial state and
makes transitions from one to another by performing actions. Every state consists of
a queue Q = {q0, q1, q2, ...} and a stack S = {..., s2, s1, s0}, where Q contains the word and
POS-tag pairs to be processed and S contains partially parsed subtrees. In each step,
one of the following actions is applied:
—SHIFT (S): push the first word and POS-tag pair of the queue onto the stack as the
top node s0.
ACM Trans. Asian Low-Resour. Lang. Inf. Process., Vol. 15, No. 3, Article 13, Publication date: February 2016.