Abstract
In this paper, we describe our work on
SemEval-2012 Task 5: Chinese Semantic
Dependency Parsing. Our system is based on
MSTParser, and we propose two effective methods:
splitting sentences at punctuation marks and
using the last character of a word as its lemma.
Experiments show that combining the two methods
improves LAS by about one percent, and our system
ranked second among nine participating systems.
We also tried to handle multi-level labels, but
this brought no improvement.
1 Introduction
Task 5 of SemEval-2012 seeks approaches to
improve Chinese semantic dependency parsing
(SDP). SDP is a kind of dependency parsing.
Many dependency parsers are currently available,
such as Eisner’s probabilistic dependency
parser (Eisner, 1996), McDonald’s MSTParser
(McDonald et al., 2005a; McDonald et al., 2005b)
and Nivre’s MaltParser (Nivre, 2006).
Despite these elaborate models, many problems
remain in dependency parsing. For example,
sentence length has been shown to have a great
impact on parsing performance. Li et al. (2010)
used a two-stage approach based on sentence
fragments for high-order graph-based dependency
parsing. A lack of linguistic knowledge is also
blamed.
This paper presents three methods that aim to
improve performance: splitting sentences at
commas and semicolons, using the last character
of a word as its lemma, and handling multi-level
labels. The first two methods yield improvements,
while the third does not.
2 Overview of Our System
Our system is based on MSTParser, which is one of
the state-of-the-art parsers. MSTParser tries to
obtain the maximum spanning tree of a sentence. For
projective parsing, it uses Eisner’s algorithm
(Eisner, 1996) to find the dependency tree in
$O(n^3)$ time. For non-projective parsing, it
applies the Chu-Liu-Edmonds algorithm
(Chu and Liu, 1965), which takes $O(n^2)$ time.
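To make the projective case concrete, the following is a minimal sketch of first-order Eisner decoding over an arc score matrix. It is not MSTParser's code: the function name `eisner`, the convention that token 0 is an artificial root, and the toy scores are all assumptions made for illustration. The three nested loops over span length, span start and split point give the cubic running time.

```python
def eisner(scores):
    """Best projective tree under arc scores (illustrative sketch).

    scores[h][m] is the score of an arc from head h to modifier m.
    Token 0 is an assumed artificial root. Returns (heads, total_score),
    with heads[0] == -1.
    """
    n = len(scores)
    neg = float("-inf")
    # Direction 0: head is the right end of the span; 1: head is the left end.
    comp = [[[0.0, 0.0] for _ in range(n)] for _ in range(n)]
    inc = [[[neg, neg] for _ in range(n)] for _ in range(n)]
    comp_bp = [[[-1, -1] for _ in range(n)] for _ in range(n)]
    inc_bp = [[-1] * n for _ in range(n)]

    for length in range(1, n):
        for s in range(n - length):
            t = s + length
            # Incomplete span: join two complete halves, then add one arc.
            best, arg = neg, -1
            for r in range(s, t):
                v = comp[s][r][1] + comp[r + 1][t][0]
                if v > best:
                    best, arg = v, r
            inc[s][t][0] = best + scores[t][s]  # arc t -> s
            inc[s][t][1] = best + scores[s][t]  # arc s -> t
            inc_bp[s][t] = arg
            # Complete span headed at t.
            best, arg = neg, -1
            for r in range(s, t):
                v = comp[s][r][0] + inc[r][t][0]
                if v > best:
                    best, arg = v, r
            comp[s][t][0], comp_bp[s][t][0] = best, arg
            # Complete span headed at s.
            best, arg = neg, -1
            for r in range(s + 1, t + 1):
                v = inc[s][r][1] + comp[r][t][1]
                if v > best:
                    best, arg = v, r
            comp[s][t][1], comp_bp[s][t][1] = best, arg

    heads = [-1] * n

    def backtrack(s, t, d, complete):
        if s == t:
            return
        if complete:
            r = comp_bp[s][t][d]
            if d == 0:
                backtrack(s, r, 0, True)
                backtrack(r, t, 0, False)
            else:
                backtrack(s, r, 1, False)
                backtrack(r, t, 1, True)
        else:
            r = inc_bp[s][t]
            if d == 0:
                heads[s] = t
            else:
                heads[t] = s
            backtrack(s, r, 1, True)
            backtrack(r + 1, t, 0, True)

    backtrack(0, n - 1, 1, True)
    return heads, comp[0][n - 1][1]


# Toy scores for root + two words (made-up numbers).
toy_scores = [
    [0.0, 1.0, 10.0],
    [0.0, 0.0, 1.0],
    [0.0, 10.0, 0.0],
]
toy_heads, toy_total = eisner(toy_scores)
```

With these toy scores the highest-scoring tree attaches word 2 to the root and word 1 to word 2. In a real parser the scores would come from a trained feature model rather than a hand-written matrix.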
We apply three methods on top of MSTParser in our
system:
1) Sentences are split into sub-sentences at
commas and semicolons, in one of two ways.
Our primary system splits at every comma
and semicolon. Our contrast system uses a
classifier to decide whether a given comma
or semicolon should serve as a split point.
In both systems, the original sentences and
the sub-sentences are trained and tested
separately, and the outputs are merged at
the end.
2) In a Chinese word, the last character usually
carries the main sense or semantic class.
We treat the last character of a word as
its lemma and find that this yields a slight
improvement in our experiments.
3) We also ran an experiment on the problem
of multi-level labels, parsing the different
levels separately and subsequently merging
the outputs.
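The first two methods can be sketched in a few lines. This is an illustrative sketch only, not the submitted system: the function names, the set of split marks (full-width and ASCII commas and semicolons), and the choice to keep each punctuation mark at the end of the sub-sentence it closes are all assumptions. It covers only the primary-system behaviour of splitting at every comma and semicolon, and omits the classifier and the merging step.

```python
# Assumed set of split marks: full-width and ASCII comma and semicolon.
SPLIT_MARKS = {"，", "；", ",", ";"}


def split_by_punctuation(tokens):
    """Split a tokenized sentence into sub-sentences at every split mark.

    Assumption: the punctuation token stays at the end of the
    sub-sentence it terminates.
    """
    sub_sentences, current = [], []
    for token in tokens:
        current.append(token)
        if token in SPLIT_MARKS:
            sub_sentences.append(current)
            current = []
    if current:  # trailing material after the last split mark
        sub_sentences.append(current)
    return sub_sentences


def last_char_lemma(word):
    """Use the last character of a word as its lemma (empty string unchanged)."""
    return word[-1] if word else word


sentence = ["我", "喜欢", "苹果", "，", "他", "喜欢", "梨", "。"]
parts = split_by_punctuation(sentence)
lemmas = [last_char_lemma(w) for w in ["火车", "汽车", "车"]]
```

Here 火车 (train) and 汽车 (car) both map to the lemma 车 (vehicle), which illustrates how the last character can stand in for a shared semantic class.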
The experimental results show that the first
two methods each improve system performance,
and that combining them in our submitted
systems yields further improvement.