Supervised word sense disambiguation using semantic diffusion kernel
Tinghua Wang a,b,*, Junyang Rao b, Qi Hu c

a School of Mathematics and Computer Science, Gannan Normal University, Ganzhou 341000, China
b Institute of Computer Science and Technology, Peking University, Beijing 100871, China
c School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, China
Article info
Article history:
Received 10 March 2013
Received in revised form
20 July 2013
Accepted 19 August 2013
Available online 17 September 2013
Keywords:
Word sense disambiguation (WSD)
Semantic diffusion kernel
Support vector machine (SVM)
Kernel method
Natural language processing
Abstract
The success of machine learning approaches to word sense disambiguation (WSD) depends largely on the representation of the context in which an ambiguous word occurs. Typically, contexts are represented as vectors in a vector space using the "Bag of Words (BoW)" technique. Despite its ease of use, the BoW representation suffers from well-known limitations, mostly due to its inability to exploit semantic similarity between terms. In this paper, we apply the semantic diffusion kernel, which models semantic similarity by means of a diffusion process on a graph defined by lexicon and co-occurrence information, to smooth the BoW representation for WSD systems. The semantic diffusion kernel can be obtained through a matrix exponentiation transformation on the given kernel matrix, and it implicitly exploits higher-order co-occurrences to infer semantic similarity between terms. The superiority of the proposed method is demonstrated experimentally on several SensEval disambiguation tasks.
© 2013 Elsevier Ltd. All rights reserved.
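The construction described in the abstract — exponentiating a co-occurrence matrix to smooth the BoW kernel — can be sketched in a few lines. This is only an illustrative sketch, not the authors' exact formulation: the function name, the decay parameter `lam`, and the toy data are assumptions, and the diffusion is applied here to the term-term co-occurrence graph G = XᵀX with the matrix exponential computed via an eigendecomposition (valid because G is symmetric).

```python
import numpy as np

def semantic_diffusion_kernel(X, lam=0.05):
    """Smooth the linear bag-of-words kernel by diffusion on the term graph.

    X   : (n_contexts, n_terms) bag-of-words count matrix.
    lam : diffusion decay parameter (lam = 0 recovers the plain BoW kernel).
    """
    G = X.T @ X                       # term-term co-occurrence graph
    # Matrix exponential exp(lam * G) via eigendecomposition (G is symmetric).
    w, V = np.linalg.eigh(G)
    S = (V * np.exp(lam * w)) @ V.T   # semantic smoothing matrix
    return X @ S @ X.T                # smoothed context-context kernel

# Toy example: three contexts over a three-term vocabulary.
X = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [1.0, 0.0, 0.0]])
K_bow = X @ X.T                        # plain BoW kernel
K_dif = semantic_diffusion_kernel(X)   # diffusion-smoothed kernel
```

Note that contexts 1 and 2 in the toy data share no terms, so their plain BoW similarity is zero, yet their diffusion-smoothed similarity is positive: the expansion of exp(λG) accumulates powers of G, which is exactly the "higher-order co-occurrence" effect the abstract refers to.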
1. Introduction
Word sense disambiguation (WSD) refers to the task of identi-
fying the correct sense of an ambiguous word in a given context
(Navigli, 2009). The ambiguity results from homonymy, i.e., words
having the same spelling and pronunciation but different senses,
and polysemy, i.e., words having multiple senses, usually with
subtle differences (Nguyen and Ock, 2011). Homonymy is relatively
easy to disambiguate because the domains of different senses are
distinct, e.g., the noun bank could be defined as “sloping raised
land, especially along the sides of a river” or alternately as “an
organization where people and businesses can invest or borrow
money, convert to foreign money, etc. or a building where these
services are offered” (Cambridge Advanced Learner's Dictionary).
Polysemy is far more difficult because of the subtle differences and
the common origin of the senses, e.g., the noun cold could refer to
“a mild viral infection involving the nose and respiratory passages” or
“the absence of heat, or the sensation produced by low temperatures”
(WordNet 3.1). As a fundamental semantic understanding task at
the lexical level in natural language processing, WSD can benefit
many applications such as information retrieval (Stokoe et al.,
2003; Zhong and Ng, 2012) and machine translation (Carpuat and
Wu, 2007; Chan et al., 2007). In practical applications, WSD is often fully integrated into the system and cannot be separated out (for instance, in information retrieval, WSD is often not performed explicitly but is merely a by-product of query-to-document matching). However, it has proved very difficult to formalize the process of disambiguation, which humans perform so effortlessly.
There are two main kinds of methods to perform the task of WSD: knowledge-based approaches and corpus-based approaches. The former disambiguate words by comparing their context against information from predefined lexical resources such as WordNet, whereas the latter make no use of such resources for disambiguation (Navigli, 2009). Most corpus-based approaches stem from the machine learning community, ranging from supervised learning, in which a classifier is trained for each distinct word on a corpus of manually sense-annotated examples, to completely unsupervised methods that cluster occurrences of words, thereby inducing senses. Among these, supervised learning approaches have been the most successful algorithms to date. Moreover, in recent years it has proved very promising to apply kernel methods (Shawe-Taylor and Cristianini, 2004; Simek et al., 2004) such as the support vector machine (SVM) (Giuliano et al., 2009; Jin et al., 2008; Joshi et al., 2006; Lee et al., 2004; Pahikkala et al., 2009), kernel principal component analysis (KPCA) (Su et al., 2004; Wu et al., 2004) and the regularized least-squares classifier (RLSC) (Popescu, 2004) to the WSD task. Kernel methods in general, and SVM in particular, have delivered extremely high performance in a wide variety of learning tasks. The advantage of using kernel methods for WSD is that they offer a flexible and efficient way of defining application-specific kernels for introducing background knowledge and explicitly modeling linguistic insights.
For the machine learning-based WSD, one of the key steps is
the representation of the context in which an ambiguous word
http://dx.doi.org/10.1016/j.engappai.2013.08.007
* Corresponding author at: School of Mathematics and Computer Science, Gannan Normal University, Ganzhou 341000, China. Tel.: +86 18810358076.
E-mail addresses: wthpku@163.com, wthgnnu@163.com (T. Wang).
Engineering Applications of Artificial Intelligence 27 (2014) 167–174