[Figure 2, panel (a): a stack ($\cdots, S_1, S_0$), a queue ($Q_0, Q_1, \cdots$), and the actions SH, AL(l), AR(l), PR.]
(a) Arc-standard dependency parsing model for a single dependency tree
[Figure 2, panel (b): stacks $S^a$, $S^b$ and queues $Q^a$, $Q^b$, the actions SH, AL(l), AR(l), PR for each of the schemes a and b, and the labels Guided$^a$ and Guided$^b$.]
(b) The joint model based on arc-standard dependency parsing for two dependency trees
Figure 2: Illustrations for the baseline dependency parsing model and our proposed joint model.
In the baseline arc-standard transition system, we define four kinds of actions, as shown in Figure 2(a):
shift (SH), arc-left with dependency label l (AL(l)), arc-right with dependency label l (AR(l)),
and pop-root (PR). The shift action moves the first element $Q_0$ of the queue onto the stack.
The arc-left action with dependency label l builds a left arc, labeled l, between the top element $S_0$
and the second top element $S_1$ on the stack. The arc-right action with dependency label l builds a
right arc, labeled l, between $S_0$ and $S_1$. The pop-root action defines the root node of a
dependency tree when there is only one element on the stack and no element in the queue.
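The four actions above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `State`, `legal_actions`, and `apply_action` are hypothetical, words are represented by their integer positions, and the head/dependent convention (arc-left makes $S_0$ the head of $S_1$, arc-right makes $S_1$ the head of $S_0$) as well as the artificial root index -1 and the label "ROOT" are assumptions following the common arc-standard formulation, since the text does not spell them out.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Arc = Tuple[int, int, Optional[str]]  # (head, dependent, label)


@dataclass
class State:
    stack: List[int]
    queue: List[int]
    arcs: List[Arc] = field(default_factory=list)


def legal_actions(state: State) -> List[str]:
    """Return the action names that can be applied in this state."""
    actions = []
    if state.queue:
        actions.append("SH")
    if len(state.stack) >= 2:
        actions.extend(["AL", "AR"])
    if len(state.stack) == 1 and not state.queue:
        actions.append("PR")
    return actions


def apply_action(state: State, action: str, label: Optional[str] = None) -> State:
    """Return a new state; the input state is left unchanged."""
    if action not in legal_actions(state):
        raise ValueError(f"action {action} is not applicable")
    stack, queue, arcs = list(state.stack), list(state.queue), list(state.arcs)
    if action == "SH":
        stack.append(queue.pop(0))       # move Q0 onto the stack
    elif action == "AL":
        s0, s1 = stack.pop(), stack.pop()
        arcs.append((s0, s1, label))     # assumption: S0 is the head of S1
        stack.append(s0)
    elif action == "AR":
        s0, s1 = stack.pop(), stack.pop()
        arcs.append((s1, s0, label))     # assumption: S1 is the head of S0
        stack.append(s1)
    else:  # "PR"
        arcs.append((-1, stack.pop(), "ROOT"))  # assumption: -1 marks the root
    return State(stack, queue, arcs)


# Usage: parse a three-word sentence with a hand-written action sequence.
state = State(stack=[], queue=[0, 1, 2])
for action, label in [("SH", None), ("SH", None), ("AL", "det"),
                      ("SH", None), ("AR", "obj"), ("PR", None)]:
    state = apply_action(state, action, label)
```

After the loop, both the stack and the queue are empty and `state.arcs` holds one arc per word, including the root arc.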
During decoding, each state may have several applicable actions. We employ a fixed-size beam to reduce
the search space: when the beam is full, low-scoring states are pruned. The baseline feature templates
are listed in Table 1. We learn the feature weights with the averaged perceptron algorithm with
early update (Collins and Roark, 2004; Zhang and Clark, 2011).
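A generic sketch of beam-search decoding with an early-update check may help make this concrete. It is an illustration under stated assumptions rather than the paper's decoder: the function `beam_search` and its parameters `expand`, `is_final`, and `gold_actions` are hypothetical, the scoring function is left abstract, all derivations are assumed to have the same length, and the perceptron weight update itself is not shown.

```python
from typing import Any, Callable, Iterable, List, Optional, Sequence, Tuple

BeamItem = Tuple[float, List[str], Any]  # (score, action history, state)


def beam_search(
    init_state: Any,
    expand: Callable[[Any], Iterable[Tuple[str, float, Any]]],
    is_final: Callable[[Any], bool],
    beam_size: int,
    gold_actions: Optional[Sequence[str]] = None,
) -> Tuple[float, List[str], Optional[int]]:
    """Return (best score, best action history, early-update step or None).

    expand(state) yields (action, score_delta, next_state) triples.
    Assumption: all derivations finish after the same number of steps.
    """
    beam: List[BeamItem] = [(0.0, [], init_state)]
    step = 0
    while not is_final(beam[0][2]):
        candidates: List[BeamItem] = []
        for score, history, state in beam:
            for action, delta, nxt in expand(state):
                candidates.append((score + delta, history + [action], nxt))
        if not candidates:
            break  # no applicable action; stop with the current beam
        # Keep only the beam_size highest-scoring states; the rest are pruned.
        candidates.sort(key=lambda item: item[0], reverse=True)
        beam = candidates[:beam_size]
        step += 1
        if gold_actions is not None:
            gold_prefix = list(gold_actions[:step])
            # Early update: the gold prefix has fallen out of the beam, so
            # training would stop here and update on the partial sequences.
            if all(history != gold_prefix for _, history, _ in beam):
                return beam[0][0], beam[0][1], step
    return beam[0][0], beam[0][1], None


# Toy usage: a state is a tuple of actions taken so far; "a" always scores higher.
def toy_expand(state):
    return [("a", 1.0, state + ("a",)), ("b", 0.5, state + ("b",))]


def toy_final(state):
    return len(state) == 3


full = beam_search((), toy_expand, toy_final, 1, gold_actions=["a", "a", "a"])
early = beam_search((), toy_expand, toy_final, 1, gold_actions=["b", "a", "a"])
```

In the toy run, `full` completes the search because the gold sequence stays in the beam, while `early` stops at the first step because the gold prefix is pruned immediately.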
3 The Proposed Joint Model
The baseline model above can handle only a single dependency tree. To parse multiple dependency trees
for a sentence, one usually runs an individual dependency parser for each scheme. This approach cannot
exploit the correlations across different dependency schemes. A joint model that parses multiple
dependency trees with a single model is an elegant way to exploit these correlations fully. Inspired by
this, we make a novel extension to the baseline arc-standard transition system, arriving at a joint model
that parses two heterogeneous dependency trees for a sentence simultaneously.
In the new transition system, we double the original transition state, from one stack and one queue to
two stacks and two queues, as shown in Figure 2(b). We use stacks $S^a$ and $S^b$ and queues $Q^a$ and
$Q^b$ to store the partially parsed dependency trees and the unprocessed words for the two schemes a and
b, respectively. The transition actions are doubled as well: there are eight transition actions, four
for scheme a and four for scheme b. The action definitions are the same as the original ones, with the
additional constraint that each action operates on the stack and queue of its own scheme, a or b.
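The doubled state and action set can be sketched as follows. This is a simplified illustration, not the authors' code: `SchemeState`, `JointState`, `JOINT_ACTIONS`, and `apply_joint` are hypothetical names, the arc direction convention and the root marker -1 with label "ROOT" are assumptions, and the sketch deliberately imposes no rule on how the actions of the two schemes are interleaved.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class SchemeState:
    """Stack, queue, and arcs for one dependency scheme."""
    stack: List[int] = field(default_factory=list)
    queue: List[int] = field(default_factory=list)
    arcs: List[Tuple[int, int, Optional[str]]] = field(default_factory=list)


@dataclass
class JointState:
    a: SchemeState  # holds S^a and Q^a
    b: SchemeState  # holds S^b and Q^b


BASE_ACTIONS = ("SH", "AL", "AR", "PR")
# Eight joint actions: each base action tagged with the scheme it operates on.
JOINT_ACTIONS = [(name, scheme) for scheme in ("a", "b") for name in BASE_ACTIONS]


def apply_joint(state: JointState, name: str, scheme: str,
                label: Optional[str] = None) -> None:
    """Apply one action in place, touching only the chosen scheme's stack and queue."""
    if (name, scheme) not in JOINT_ACTIONS:
        raise ValueError(f"unknown action {name} for scheme {scheme}")
    sub = state.a if scheme == "a" else state.b
    if name == "SH":
        sub.stack.append(sub.queue.pop(0))
    elif name in ("AL", "AR"):
        s0, s1 = sub.stack.pop(), sub.stack.pop()
        # Assumption: AL makes S0 the head of S1, AR makes S1 the head of S0.
        head, dep = (s0, s1) if name == "AL" else (s1, s0)
        sub.arcs.append((head, dep, label))
        sub.stack.append(head)
    else:  # "PR"
        sub.arcs.append((-1, sub.stack.pop(), "ROOT"))  # assumption: -1 is the root


# Usage: both schemes start from the same two-word sentence.
words = [0, 1]
joint = JointState(SchemeState(queue=list(words)), SchemeState(queue=list(words)))
for name, scheme, label in [("SH", "a", None), ("SH", "b", None),
                            ("SH", "a", None), ("AL", "a", "x")]:
    apply_joint(joint, name, scheme, label)
```

Actions tagged with scheme a change only `joint.a`, so `joint.b` still has one word on its stack and one in its queue after the loop.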
We assume that the actions to build a specific tree of scheme a are $A^a_1 A^a_2 \cdots A^a_n$, and the actions to