chairwoman, chair, chairperson} expresses a job title ‘‘the
officer who presides at the meetings of an organization’’.
There exist various kinds of semantic relations between
concepts in WordNet, such as hyponym/hypernym (is-a),
meronymy/holonymy (part-of, member-of, substance-of),
and antonymy. The inherited ‘‘is-a’’ relation accounts for
nearly 80% of all relationship types. Consequently, we
employ the ‘‘is-a’’ relationship in this work to augment the
semantic information of a given word.
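As an illustrative sketch (not part of the original study), the ‘‘is-a’’ chains of a word can be inspected with NLTK's WordNet interface; the query word ‘‘chairman’’ is chosen only as an example:

from nltk.corpus import wordnet as wn

# List every synset of "chairman" and its direct hypernyms,
# i.e., the "is-a" relation employed in this work.
for synset in wn.synsets('chairman'):
    print(synset.name(), '-', synset.definition())
    for hypernym in synset.hypernyms():
        print('  is-a:', hypernym.name())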
In terms of the semantic properties used for similarity computation, WordNet-based measures roughly fall into four categories: distance-based, information content-based, feature-based, and hybrid. Distance-based measures [17] evaluate the semantic similarity of concepts by means of different structural properties, such as the path distance between two concepts based on edge length,
depth, and density. Measures based on information content (IC) evaluate how specific and informative a concept is from the perspective of information theory [35], where a higher IC value is assigned to a more concrete concept [18]. In IC-based measures, either the frequency
counts of words in synsets are derived from additional
corpora or the intrinsic hierarchical structure of WordNet is
used to model the IC of concepts. Feature-based measures employ the intrinsic attribute information in WordNet, such as synsets, glosses, and taxonomic relations, to construct feature sets or vectors. Patwardhan et al. proposed two similarity measures based on gloss overlaps [3] and the cosine similarity between gloss vectors [31], respectively. For semantic similarity measurement, Liu et al. [19]
took local densities as the intrinsic properties of concepts
for constructing concept vectors. Hybrid measures [12, 33] commonly take advantage of different computing methods by combining path distance, IC of concepts, and features of concepts.
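To make these categories concrete, the following sketch uses NLTK's off-the-shelf WordNet measures (purely illustrative; the cited works define their own formulations): path_similarity is a distance-based measure over the ‘‘is-a’’ hierarchy, while res_similarity is Resnik's IC-based measure with frequency counts taken from the Brown corpus.

from nltk.corpus import wordnet as wn, wordnet_ic

dog = wn.synset('dog.n.01')
cat = wn.synset('cat.n.01')

# Distance-based: similarity derived from the shortest is-a path
# between the two concepts.
print(dog.path_similarity(cat))

# IC-based (Resnik): information content of the least common subsumer,
# estimated from Brown corpus frequency counts.
brown_ic = wordnet_ic.ic('ic-brown.dat')
print(dog.res_similarity(cat, brown_ic))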
2.2 Distributed vector representation
In corpus-based measures, lexical vectors are used for
estimating semantic similarity between words. As an
alternative to the traditional distributional vector [40], the distributed vector representation (namely, word embedding) derived from deep learning techniques has significantly improved semantic similarity evaluation, semantic disambiguation [15], and analogy relationship reasoning [27]. In this distributed vector space, the semantic and
syntactic information [26] as well as morphology [20] are
implicitly encoded into low-dimensional continuous vec-
tors by unsupervised neural network learning.
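In such a space, the similarity of two words is typically scored as the cosine of the angle between their embeddings; a minimal sketch with toy vectors (the values below are placeholders, not learned embeddings):

import numpy as np

def cosine_similarity(u, v):
    # Cosine of the angle between two embedding vectors.
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy low-dimensional vectors standing in for learned word embeddings.
v_king = np.array([0.8, 0.1, 0.3])
v_queen = np.array([0.7, 0.2, 0.4])
print(cosine_similarity(v_king, v_queen))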
In terms of neural network models, promising distributed representations include vectors from recurrent neural networks [24], recursive neural networks [38], context-aware models [15], and log-linear models [25]. Two log-linear models, i.e., CBOW and Skip-gram proposed by Mikolov et al. [25], reduce the training complexity caused by the nonlinear hidden layer in other models. CBOW leverages the sum of the continuous bag-of-words context vectors to learn a target word representation, while the training objective of Skip-gram is to predict the representations of the context words given a target word. CBOW is relatively faster than the Skip-gram model; however, the latter is more discriminative for rare words. For semantic disambiguation, Huang et al.
employed both local and global context to generate mul-
tiple prototypes of word embedding when measuring
semantic similarity [15]. Chen et al. leveraged the concept
paraphrases in WordNet to produce multiple sense vectors
for each word [7].
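A minimal sketch of training both models with the gensim library (an assumption for illustration; parameter names follow gensim 4.x, and the toy corpus is far too small to yield meaningful vectors): sg=0 selects CBOW and sg=1 selects Skip-gram.

from gensim.models import Word2Vec

sentences = [
    ['the', 'chairman', 'presides', 'at', 'the', 'meeting'],
    ['the', 'chairperson', 'opens', 'the', 'meeting'],
]

# sg=0: CBOW predicts the target word from its summed context;
# sg=1: Skip-gram predicts the context words from the target word.
cbow = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=0)
skipgram = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1)

print(cbow.wv.similarity('chairman', 'chairperson'))
print(skipgram.wv.similarity('chairman', 'chairperson'))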
2.3 Semantic fusion on different levels
A structured ontology is considered more effective than a corpus, which may suffer from sparseness and imbalance of semantic information [1]. Hence, a number of studies
focus on incorporating the semantic information from
ontology into the corpus-based measures. The relevant
works conduct semantic fusion from different aspects. We
define their classifications as vector-level [4, 7], metric-
level [1, 2, 6, 42], and model-level [10, 41, 43] according
to the increasing granularity of semantic fusion between
corpus and ontology.
Vector-based methods directly fuse the semantic information from the ontology into the corpus through vector operations or vector extension. Bian et al. extended the original 1-of-v word vector with additional features extracted from WordNet, such as concept and part of speech [4]. To combine the semantic features from WordNet and the corpus, Chen et al. replaced the distributed vector of a word with the averaged vector of the words in its gloss whose cosine similarities with the target word are larger than a threshold [7].
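One simplified reading of the Chen et al. scheme [7] can be sketched as follows (an interpretation, not the authors' implementation; the threshold value is a free parameter, not one reported in [7]):

import numpy as np

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def gloss_averaged_vector(word, gloss_words, embeddings, threshold=0.3):
    # Replace the word's vector with the average of the embeddings of
    # gloss words whose cosine similarity to the word exceeds the threshold.
    target = embeddings[word]
    selected = [embeddings[g] for g in gloss_words
                if g in embeddings and cosine(embeddings[g], target) > threshold]
    # Fall back to the original vector if no gloss word qualifies.
    return np.mean(selected, axis=0) if selected else target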
Metric-based methods mainly combine various semantic similarity measures based on unsupervised or supervised learning. Agirre [1] verified that the supervised combination of multiple methods can produce better results by implementing 10-fold cross-validation on a ranking classification task. Alves et al. proposed a regression function that takes lexical similarity, syntactic similarity, semantic similarity, and distributional similarity as input factors [2]. Chaves-González and Martínez-Gil used an evolutionary algorithm to optimize the unsupervised combination of various WordNet-based similarity metrics [6]. Yih and Qazvinian averaged the similarity results derived from heterogeneous vector space models on Wikipedia, web search, thesaurus, and WordNet, respectively [42].
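In the spirit of such metric-level fusion, a supervised combiner can be sketched as a simple regression over the scores of individual measures (an illustration with placeholder numbers, not data from the cited studies):

import numpy as np
from sklearn.linear_model import LinearRegression

# Each row holds the scores of several individual similarity measures
# (e.g., distance-based, IC-based, distributional) for one word pair;
# y holds gold-standard human similarity ratings.
X = np.array([[0.8, 0.7, 0.9],
              [0.2, 0.3, 0.1],
              [0.5, 0.6, 0.4]])
y = np.array([0.85, 0.15, 0.55])

combiner = LinearRegression().fit(X, y)
print(combiner.predict(np.array([[0.6, 0.5, 0.7]])))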