3 Construction of Emotion Cause Corpus
In this section, we first describe the linguistic
phenomena of emotion expressions, which inspired the
design of the annotated dataset. We then introduce the
details of the annotation scheme, followed by the
construction of the dataset.
3.1 Linguistic Phenomenon of Emotion Causes
Emotion causes play an important role in emotion
expressions: an emotion cause reveals the stimulus of
an emotion. Considering the linguistic phenomena of
emotion causes, we follow three basic principles in
corpus construction: (1) keep the whole context of the
emotion expression; (2) use the clause as the basic
processing unit; and (3) use formal text.
In written text, an emotion keyword, which is used to
express an emotion, appears in the context of the
emotion cause. Thus, finding the appropriate context of
an emotion keyword during annotation is the prerequisite
for identifying its cause. This is why we keep the whole
context of emotion keywords.
Another important kind of cue is the presence of
conjunctions and prepositions. These words indicate the
discourse information between clauses. To make use of
this discourse information, the basic analysis unit
should be the clause rather than the sentence.
For the third principle, we choose formal text for
corpus construction. According to related work, emotion
expressions in informal text can have overlapping
emotion cause and emotion target (Gui et al., 2014).
This is why some studies even combine cause extraction
with target identification to improve performance.
However, our focus is on emotion cause identification,
so we use formal news text to avoid this potential
mix-up.
3.2 Collection and Annotation
We first take three years (2013-2015) of Chinese city
news from NEWS SINA², containing 20,000 articles, as the
raw corpus. Based on a list of 10,259 Chinese primary
emotion keywords (keywords for short) (Xu et al., 2008),
we extract 15,687 instances by keyword matching from the
raw data. Here, we call each occurrence of an emotion
keyword in the corpus an instance. For each matched
keyword, we extract the three preceding clauses and the
three following clauses as the context of the instance.
If a sentence has more than three clauses in a given
direction, the context also includes the rest of the
sentence to make the context complete. For simplicity,
we omit cross-paragraph context.
² http://news.sina.com.cn/society/
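The context extraction described above can be sketched as follows. This is only an illustrative reading of the procedure, not the authors' actual tooling: the punctuation sets used to split clauses and sentences, the function names, and the keyword list passed in are all assumptions made for the example. The window is extended to the nearest sentence boundary on each side, which is one plausible interpretation of "include the rest of the sentence".

```python
import re

# Assumed delimiters (not specified in the paper): clauses end at commas or
# semicolons, sentences end at full stops, question marks or exclamation marks.
CLAUSE_END = ",,;;"
SENT_END = "。!?!?"
_PATTERN = re.compile(r"[^{0}]+[{0}]?".format(re.escape(CLAUSE_END + SENT_END)))


def split_clauses(paragraph):
    """Split one paragraph into (clause_text, ends_sentence) pairs."""
    clauses = []
    for match in _PATTERN.finditer(paragraph):
        text = match.group().strip()
        if text:
            clauses.append((text, text[-1] in SENT_END))
    return clauses


def extract_context(clauses, idx, window=3):
    """Take `window` clauses on each side of clause `idx`, then extend each
    side to the nearest sentence boundary so no sentence is cut in half."""
    start = max(0, idx - window)
    end = min(len(clauses) - 1, idx + window)
    # Move left while the previous clause belongs to the same sentence.
    while start > 0 and not clauses[start - 1][1]:
        start -= 1
    # Move right until the last included clause closes its sentence.
    while end < len(clauses) - 1 and not clauses[end][1]:
        end += 1
    return [text for text, _ in clauses[start:end + 1]]


def find_instances(paragraph, keywords, window=3):
    """Return (keyword, clause_index, context) for every keyword occurrence.
    Working paragraph by paragraph means cross-paragraph context is omitted."""
    clauses = split_clauses(paragraph)
    instances = []
    for idx, (text, _) in enumerate(clauses):
        for keyword in keywords:
            if keyword in text:
                instances.append((keyword, idx, extract_context(clauses, idx, window)))
    return instances
```

Calling `find_instances(paragraph, keywords)` once per paragraph yields one candidate instance per keyword match. As the text notes next, such candidates still need manual filtering, since a matched keyword does not always express an emotion.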
Note that the presence of a keyword does not necessarily
convey emotional information, for reasons such as
negative polarity and sense ambiguity. For example,
“祝愿/wishes” is an emotion word of “happiness”, but it
can also be the name of a song. Likewise, the presence of
an emotion keyword does not guarantee the existence of an
emotion cause. After removing such irrelevant instances,
2,105 instances remain. For each emotional instance, two
annotators manually annotate the emotion categories and
the cause(s) in the W3C Emotion Markup Language (EML)
format. Ex.1 shows an example of an annotated emotional
sentence in the corpus, presented in the original
simplified Chinese, followed by its English translation.
To save space, we remove the XML tags in the annotation.
The original annotated data is in a subsidiary file³.
The basic analysis unit is a clause. The emotion cause is
marked by <cause>, and the emotion keyword is marked by
<keywords>. Emotion type, POS, position and the length of
the annotation are also annotated in EmotionML format.
Ex.1: 朱某今年55岁,1979年参加工作时才19岁,已有36年的手艺。
“我当时被分到丹阳南京理发店工作,这是当时丹阳最大的理发店。
我在那儿获得了好多证书和荣誉。”<cause POS="v"
Dis="-1">说起自己的荣誉</cause>,朱某很是<keywords
type="happiness">自豪</keywords>。
Mr. Zhu is 55 years old. He started working in 1979 as
a barber when he was 19, and has 36 years of experience.
“I was assigned to work at the Barbershop in Danyang,
Nanjing. It is the largest barbershop in Danyang. I won
many awards and honors there.” <cause POS="v"
Dis="-1">Talking about his honors</cause>, Mr. Zhu is so
<keywords type="happiness"> proud </keywords>.
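To make the inline markup in Ex.1 concrete, the following minimal sketch reads the `<cause>` and `<keywords>` spans from a simplified example string. It is an assumption-laden illustration, not a parser for the full EmotionML files released with the corpus: the function names are hypothetical, and the regular expressions only cover the two tags and the attribute styles seen in Ex.1 (straight quotes, curly quotes, or no quotes). The `Dis` attribute is left uninterpreted here. In Ex.1 it appears to record the position of the cause clause relative to the keyword clause, but that reading is inferred from the example only.

```python
import re

# Matches <cause ...>...</cause> and <keywords ...>...</keywords> spans.
TAG_RE = re.compile(r"<(cause|keywords)([^>]*)>(.*?)</\1>", re.S)
# Attribute values may be wrapped in straight quotes, curly quotes, or nothing.
ATTR_RE = re.compile(r'(\w+)\s*=\s*["“”]?([^"“”\s>]+)["“”]?')


def parse_annotation(text):
    """Return the annotated spans in document order as dictionaries."""
    spans = []
    for match in TAG_RE.finditer(text):
        spans.append({
            "tag": match.group(1),
            "attrs": dict(ATTR_RE.findall(match.group(2))),
            "text": match.group(3).strip(),
        })
    return spans


def strip_tags(text):
    """Remove the two annotation tags and keep the plain sentence text."""
    return re.sub(r"</?(?:cause|keywords)[^>]*>", "", text)


sample = (
    'I won many awards and honors there. <cause POS="v" Dis="-1">Talking '
    'about his honors</cause>, Mr. Zhu is so <keywords type="happiness"> '
    'proud </keywords>.'
)
for span in parse_annotation(sample):
    print(span["tag"], span["attrs"], span["text"])
```

Each returned span carries the tag name, its attributes as strings, and the enclosed text, so a keyword can be paired with one or more cause spans from the same instance.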
³ http://hlt.hitsz.edu.cn/?page_id=694
Ex.1 contains only one cause. However, one keyword may
have more than one corresponding emotion cause. In Ex.2,
there are two relevant causes