Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Short Papers), pages 138–143
Melbourne, Australia, July 15 - 20, 2018.
© 2018 Association for Computational Linguistics
Analogical Reasoning on Chinese Morphological and Semantic Relations
Shen Li (1,2,♠)
Zhe Zhao (3,♣)
Renfen Hu (1,2,♠,†)
Wensi Li (1,2)
Tao Liu (3,♣)
Xiaoyong Du (3,♣)
♠ {shen, irishere}@mail.bnu.edu.cn
♣ {helloworld, tliu, duyong}@ruc.edu.cn
zjklws@163.com
1 Institute of Chinese Information Processing, Beijing Normal University
2 UltraPower-BNU Joint Laboratory for Artificial Intelligence, Beijing Normal University
3 School of Information, Renmin University of China
Abstract
Analogical reasoning is effective in capturing linguistic regularities.
This paper proposes an analogical reasoning task on Chinese. After
delving into Chinese lexical knowledge, we sketch 68 implicit
morphological relations and 28 explicit semantic relations. A large and
balanced dataset, CA8, is then built for this task, comprising 17,813
questions. Furthermore, we systematically explore the influences of
vector representations, context features, and corpora on analogical
reasoning. The experiments show CA8 to be a reliable benchmark for
evaluating Chinese word embeddings.
1 Introduction
Recently, the boom in word embeddings has drawn attention to analogical
reasoning on linguistic regularities. Given word representations,
analogy questions can be solved automatically via vector computation,
e.g. “apples - apple + car ≈ cars” for morphological regularities and
“king - man + woman ≈ queen” for semantic regularities (Mikolov et al.,
2013). Analogical reasoning has become a reliable evaluation method for
word embeddings. In addition, it can be used to induce morphological
transformations (Soricut and Och, 2015), detect semantic relations
(Herdagdelen and Baroni, 2009), and translate unknown words (Langlais
and Patry, 2007).
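As an illustration of the vector computation mentioned above, the following is a minimal sketch of solving “king - man + woman ≈ ?” by ranking candidate words by cosine similarity to the offset vector. The three-dimensional vectors are hand-made toy values, the function names are hypothetical, and excluding the three question words from the candidates is an assumed (though common) convention; none of this is taken from the paper's experiments.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def solve_analogy(embeddings, a, b, c):
    """Answer 'a is to b as c is to ?' via the offset vector b - a + c."""
    target = [
        vb - va + vc
        for va, vb, vc in zip(embeddings[a], embeddings[b], embeddings[c])
    ]
    # Assumption: the three question words are excluded from the candidates.
    candidates = [w for w in embeddings if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(embeddings[w], target))


# Toy vectors invented for illustration only
# (rough dimensions: royalty, maleness, femaleness).
toy = {
    "king": [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man": [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
    "apple": [0.0, 0.2, 0.2],
}

print(solve_analogy(toy, "man", "king", "woman"))  # prints "queen"
```

With real embeddings the candidate set would be the whole vocabulary, and accuracy on an analogy dataset would be the fraction of questions whose top-ranked word matches the gold answer.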
It is well known that linguistic regularities vary considerably across
languages. For example, Chinese is a typical analytic language that
lacks inflection. Figure 1 shows that function words and reduplication
are used to denote grammatical and semantic information. In addition,
many semantic relations are closely tied to social and cultural factors,
e.g. in Chinese “shī-xiān” (god of poetry) refers to the poet Li-bai and
“shī-shèng” (saint of poetry) refers to the poet Du-fu.
[Figure 1, panel (a): 简单 (jiǎn dān, “easy”); 更简单 (gèng jiǎn dān) and
简单些 (jiǎn dān xiē), “easier”; 最简单 (zuì jiǎn dān), “easiest”.
Panel (b): 人 (rén, “person”), 人人 (rén rén, “every person”); 天 (tiān,
“day”), 天天 (tiān tiān, “every day”).]
Figure 1: Examples of Chinese lexical knowledge:
(a) function words (in orange boxes) are used to
indicate the comparative and superlative degrees;
(b) reduplication yields the meaning of “every”.
† Corresponding author.
However, few attempts have been made at Chinese analogical reasoning.
The only Chinese analogy dataset is translated from part of an English
dataset (Chen et al., 2015) (denoted as CA_translated). Although it has
been widely used in the evaluation of word embeddings (Yang and Sun,
2015; Yin et al., 2016; Su and Lee, 2017), it cannot serve as a reliable
benchmark, since it includes only 134 unique Chinese words in three
semantic relations (capital, state, and family), and morphological
knowledge is not considered at all.
Therefore, we would like to investigate the linguistic regularities
underlying Chinese. By modeling them as an analogical reasoning task, we
can further examine the effectiveness of vector offset methods in
detecting Chinese morphological and semantic relations. As far as we
know, this is the first study focusing on Chinese analogical reasoning.
Moreover, we release a standard benchmark for the evaluation of Chinese
word embeddings, together with 36 open-source pre-trained embeddings at