An Easier and Efficient Framework to Annotate Semantic Roles:
Evidence from the Chinese AMR Corpus
Li Song¹, Yuan Wen¹, Sijia Ge¹, Bin Li¹, Junsheng Zhou², Weiguang Qu²˒³, Nianwen Xue⁴
1. School of Chinese Language and Literature, Nanjing Normal University, Nanjing, 210024, China
2. School of Computer Science and Technology, Nanjing Normal University, Nanjing, 210023, China
3. Key Laboratory of Information Processing and Intelligent Control, Minjiang University, Fuzhou, 350108, China
4. Computer Science Department, Brandeis University, Waltham, 02453, USA
songli.njnu@gmail.com
Abstract
Semantic role labeling (SRL) is one of the fundamental tasks
in Chinese language processing. At present, the construction
of SRL corpora faces three major problems. First, there is
disagreement over how many semantic roles to define and how
to define their frames. Second, static predicate frames can
hardly cover dynamic predicate usages. Third, dropped
semantic roles cannot be annotated. The newly designed
Abstract Meaning
Representation (AMR) is a novel method of representing
the meaning of sentences, which offers dynamic
mechanisms to provide better solutions to the above three
problems. We use the Chinese AMR corpus of 5,000
sentences to make a detailed comparison between AMR
and other SRL resources. Data analysis shows that the
simplified distinction between core roles and non-core
roles in AMR makes it easier to annotate the semantic roles
of a predicate. In addition, 1,045 tokens of dropped roles
are annotated under this new framework. These results
indicate that AMR offers a better solution for Chinese
SRL and sentence meaning processing.
Keywords: Abstract Meaning Representation, predicate
framework, semantic role, language knowledgebase
1 Introduction
Automatic semantic analysis is one of the core tasks in
Natural Language Processing (NLP), and building semantic
resources is the first step for machine-learning-based NLP
systems. In semantic representation, the semantic relations
between predicates and their semantic roles form the
backbone of the sentence structure. Thus, building the
predicate frames that describe such information has become
an important issue in linguistics and NLP. Many semantic
role labeling (SRL) systems and SRL resources exist for
different languages, but these SRL corpora have several
problems.
First, linguists have not yet agreed on how many semantic
role labels predicates need. VerbNet uses 30
general thematic role labels to represent semantic relations
(Kipper et al., 2000). Sinica Treebank distinguishes
necessary and unnecessary arguments and uses 60 semantic
role labels, 12 of which can represent necessary arguments
(Chen et al., 2003). FrameNet defines semantic roles on a
per-frame basis (Baker et al., 1998), so it avoids
determining how many semantic roles a language needs;
there are 1,224 frames in FrameNet and 323 frames in
Chinese FrameNet (CFN). PropBank (Palmer et
al., 2005) and Chinese Proposition Bank (CPB) (Xue &
Palmer, 2009) both define 5 predicate-specific semantic
roles for the core arguments and 13 semantic roles that are
consistent across predicates for non-core arguments. The
number of role labels thus varies considerably across SRL
resources, mainly because these resources are based on
different theoretical backgrounds.
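
The label counts quoted above can be put side by side in a few lines of Python. This is only an illustrative sketch: the figures are the ones given in this paragraph, while the `Inventory` class and the `spread` helper are hypothetical names introduced here. FrameNet is left out because it defines roles per frame rather than as one inventory.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Inventory:
    resource: str
    labels: int           # total role labels quoted in the text
    core: Optional[int]   # labels for core/necessary arguments, if stated


# Figures as reported in the paragraph above; None means "not stated".
INVENTORIES = [
    Inventory("VerbNet", 30, None),
    Inventory("Sinica Treebank", 60, 12),
    Inventory("PropBank / CPB", 5 + 13, 5),
]


def spread(inventories):
    """Difference between the largest and smallest label inventory."""
    sizes = [inv.labels for inv in inventories]
    return max(sizes) - min(sizes)


for inv in sorted(INVENTORIES, key=lambda inv: inv.labels):
    core = "n/a" if inv.core is None else inv.core
    print(f"{inv.resource:16} labels={inv.labels:3} core={core}")
```

The totals are not strictly comparable, since the five PropBank core labels are predicate-specific while the VerbNet labels are general; the sketch only makes the size differences visible.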
Second, it is hard for static predicate frames to cover
dynamic predicate usages. Predicate frames that do not
distinguish core from non-core roles cannot easily show
whether a semantic role is necessary for the predicate. And
resources that define core roles in a predicate-independent
manner, just like non-core roles, can neither resolve
collisions between core and non-core roles nor
represent multi-functional semantic roles.
Third, limited by their annotation mechanisms, most SRL
systems are unable to annotate the dropped semantic roles
of predicates. For example, it is hard for most SRL
systems to represent correctly the meaning of the nominal
phrase "the injured", whose head word is dropped, or of
"one of which…", which drops the noun that appeared in the
preceding clause.
Abstract Meaning Representation (AMR), a new method
of representing the meaning of sentences, defines semantic
roles differently from other SRL systems (Banarescu
et al., 2013). It handles core and non-core roles in
separate, specialized ways. AMR annotates core arguments
using the same five predicate-specific core role labels as
PropBank, and adopts the predicate frame lexicon extracted
from PropBank. By contrast, its non-core role labels, which
are general to all predicates, number as many as 40. At the
same time, AMR allows dropped semantic roles to be added
back into the representation of the sentence. Through these
dynamic mechanisms, AMR can provide better solutions to the
above three problems. The English AMR Sembank
(https://catalog.ldc.upenn.edu/LDC2017T10) includes
39,260 sentences and has become an important
semantic resource.
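
As a rough illustration of these mechanisms, the nominal phrase "the injured" could be written in AMR-style notation as `(p / person :arg1-of (i / injure-01))`, where the dropped head is restored as the concept `person`. The Python sketch below encodes that graph and separates core from non-core roles. It is a minimal sketch under stated assumptions, not the annotation tool used for the corpus: the frame ID `injure-01`, the word alignment, and the helper names are all assumed for illustration.

```python
import re

# Core roles are predicate-specific and numbered arg0-arg4 (optionally
# inverted with "-of"); any other role label is treated as non-core.
CORE_ROLE = re.compile(r"^:arg[0-4](-of)?$", re.IGNORECASE)

# Assumed AMR-style graph for the nominal phrase "the injured".
# Triples are (source variable, role, target variable).
concepts = {"p": "person", "i": "injure-01"}
aligned_word = {"i": "injured"}   # "p" has no word in the surface string
triples = [("p", ":arg1-of", "i")]


def is_core(role: str) -> bool:
    """True for arg0-arg4 and their inverse forms."""
    return bool(CORE_ROLE.match(role))


def restored_concepts(concepts, aligned_word):
    """Variables whose concept was added back, i.e. has no aligned word."""
    return sorted(var for var in concepts if var not in aligned_word)


for source, role, target in triples:
    kind = "core" if is_core(role) else "non-core"
    print(f"{concepts[source]} {role} {concepts[target]} ({kind})")

print("restored:", [concepts[v] for v in restored_concepts(concepts, aligned_word)])
```

Treating "no aligned word" as the marker of a restored role is a simplification chosen for this sketch; it is enough to show that the graph can hold a concept the sentence itself never spells out.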
Following the guidelines of English AMR, Li et al.
(2016) have developed annotation specifications for Chinese
AMR (CAMR), taking the linguistic characteristics of the
Chinese language into account. CAMR uses the same 5
core role labels (arg0-arg4) as AMR, together with 44
non-core role labels (time, location, cause, etc.), four of
which were added to meet the needs of Chinese annotation.
The predicate frame lexicon of CAMR is extracted from the
corpus (Bai & Xue, 2016) of Chinese Proposition Bank (CPB)
(Xue & Palmer, 2009). In addition, Li et al. (2017) design a
framework for aligning the concepts and relations to word