Retrofitting Word Vectors to Semantic Lexicons
Manaal Faruqui Jesse Dodge Sujay K. Jauhar
Chris Dyer Eduard Hovy Noah A. Smith
Language Technologies Institute
Carnegie Mellon University
Pittsburgh, PA, 15213, USA
{mfaruqui,jessed,sjauhar,cdyer,ehovy,nasmith}@cs.cmu.edu
Abstract
Vector space word representations are learned from distributional information of words in large corpora. Although such statistics are semantically informative, they disregard the valuable information that is contained in semantic lexicons such as WordNet, FrameNet, and the Paraphrase Database. This paper proposes a method for refining vector space representations using relational information from semantic lexicons by encouraging linked words to have similar vector representations, and it makes no assumptions about how the input vectors were constructed. Evaluated on a battery of standard lexical semantic evaluation tasks in several languages, we obtain substantial improvements starting with a variety of word vector models. Our refinement method outperforms prior techniques for incorporating semantic lexicons into word vector training algorithms.
1 Introduction
Data-driven learning of word vectors that capture lexico-semantic information is a technique of central importance in NLP. These word vectors can in turn be used for identifying semantically related word pairs (Turney, 2006; Agirre et al., 2009) or as features in downstream text processing applications (Turian et al., 2010; Guo et al., 2014). A variety of approaches for constructing vector space embeddings of vocabularies are in use, notably including taking low-rank approximations of cooccurrence statistics (Deerwester et al., 1990) and using internal representations from neural network models of word sequences (Collobert and Weston, 2008).
Because of their value as lexical semantic representations, there has been much research on improving the quality of such vectors. Semantic lexicons, which provide type-level information about the semantics of words, typically by identifying synonymy, hypernymy, hyponymy, and paraphrase relations, should be a valuable resource for improving the quality of word vectors that are trained solely on unlabeled corpora. Examples of such resources include WordNet (Miller, 1995), FrameNet (Baker et al., 1998), and the Paraphrase Database (Ganitkevitch et al., 2013).
Recent work has shown that the quality of word vectors can be improved by incorporating semantic knowledge, either by changing the objective of the word vector training algorithm in neural language models (Yu and Dredze, 2014; Xu et al., 2014; Bian et al., 2014; Fried and Duh, 2014) or by relation-specific augmentation of the cooccurrence matrix in spectral word vector models (Yih et al., 2012; Chang et al., 2013). However, these approaches are tied to particular methods for constructing vectors.
The contribution of this paper is a graph-based learning technique for using lexical relational resources to obtain higher quality semantic vectors, which we call "retrofitting." In contrast to previous work, retrofitting is applied as a post-processing step by running belief propagation on a graph constructed from lexicon-derived relational information to update word vectors (§2). This allows retrofitting to be used on pre-trained word vectors obtained using any vector training model. Intuitively, our method encourages the new vectors to be (i) similar to the vectors of related word types and (ii) similar to their purely distributional representations. The retrofitting process is fast, taking about 5 seconds for a graph of 100,000 words and vector length 300, and its runtime is independent of the original word vector training model.
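To make this intuition concrete, the sketch below shows one plausible instantiation of such a post-processing update: a simple iterative averaging over the lexicon graph, in which each word's vector is repeatedly pulled toward the mean of its neighbours' vectors and toward its original vector. The function name retrofit, the uniform neighbour weighting, and the equal weight given to the original vector are illustrative assumptions; the actual objective and relation-specific weights are defined in §2.

```python
import numpy as np

def retrofit(word_vecs, lexicon, n_iters=10):
    """Nudge each word vector toward the average of its lexicon
    neighbours while staying close to its original (distributional)
    vector. Uniform weights are assumed here purely for illustration."""
    new_vecs = {w: v.copy() for w, v in word_vecs.items()}
    for _ in range(n_iters):
        for word, neighbours in lexicon.items():
            # only update words that have a pre-trained vector and
            # at least one in-vocabulary lexicon neighbour
            neighbours = [n for n in neighbours if n in new_vecs]
            if word not in new_vecs or not neighbours:
                continue
            # average of the current neighbour vectors
            neighbour_avg = sum(new_vecs[n] for n in neighbours) / len(neighbours)
            # equal pull toward the neighbours and toward the original vector
            new_vecs[word] = (neighbour_avg + word_vecs[word]) / 2.0
    return new_vecs

# toy usage: two synonyms are drawn together; "bank" has no neighbours
vecs = {"happy": np.array([1.0, 0.0]),
        "glad": np.array([0.0, 1.0]),
        "bank": np.array([5.0, 5.0])}
synonyms = {"happy": ["glad"], "glad": ["happy"]}
retrofitted = retrofit(vecs, synonyms)
```

In this toy example, the vectors for "happy" and "glad" move toward each other over the iterations, while "bank", which has no lexicon neighbours, retains its original vector.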