Learning Cross-lingual Word Embeddings via Matrix Co-factorization
Tianze Shi Zhiyuan Liu Yang Liu Maosong Sun
State Key Laboratory of Intelligent Technology and Systems
Tsinghua National Laboratory for Information Science and Technology
Department of Computer Science and Technology
Tsinghua University, Beijing 100084, China
stz11@mails.tsinghua.edu.cn
{liuzy, liuyang2011, sms}@tsinghua.edu.cn
Abstract
A joint-space model for cross-lingual
distributed representations generalizes
language-invariant semantic features.
In this paper, we present a matrix co-
factorization framework for learning
cross-lingual word embeddings. We
explicitly define monolingual training
objectives in the form of matrix de-
composition, and induce cross-lingual
constraints for simultaneously factorizing
monolingual matrices. The cross-lingual
constraints can be derived from parallel
corpora, with or without word alignments.
Empirical results on a cross-lingual document
classification task show that our method
effectively encodes cross-lingual knowledge
as constraints for learning cross-lingual
word embeddings.
1 Introduction
Word embeddings allow one to represent words in
a continuous vector space that characterizes the
lexico-semantic relations among words. In many
NLP tasks they have proved to be high-quality features;
successful applications include language
modelling (Bengio et al., 2003), sentiment analysis
(Socher et al., 2011) and word sense discrimination
(Huang et al., 2012).
Just as words have synonyms within the same
language, there are also word pairs across
languages that share similar semantic
properties. Mikolov et al. (2013a) observed a strong
similarity of the geometric arrangements of cor-
responding concepts between the vector spaces of
different languages, and suggested that a cross-
lingual mapping between the two vector spaces is
technically plausible. Meanwhile, joint-space
models for cross-lingual word embeddings
are highly desirable, because language-invariant
semantic features make it easy to transfer
models across languages. This is especially
important for low-resource languages, where it
allows one to develop accurate word representations
by exploiting the abundant textual resources
of a resource-rich language such as English.
Joint-space models are thus not only technically
plausible, but also useful for cross-lingual model
transfer. Further, studies have shown that exploiting
cross-lingual correlations can improve the quality
of word representations trained solely on
monolingual corpora (Faruqui and Dyer, 2014).
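As a concrete illustration of the linear mapping suggested by Mikolov et al. (2013a), the minimal sketch below fits a matrix W by least squares over a seed dictionary of translation pairs, so that Wx_i ≈ z_i for each pair. The toy data, dimensionality, and the nearest_target helper are illustrative assumptions, not the authors' actual setup.

```python
import numpy as np

rng = np.random.default_rng(0)
n_pairs, d = 1000, 100
X = rng.standard_normal((n_pairs, d))  # source vectors of seed dictionary pairs (toy data)
Z = rng.standard_normal((n_pairs, d))  # their target-language translations (toy data)

# Ordinary least squares: find W^T minimizing ||X W^T - Z||_F^2.
W_T, _, _, _ = np.linalg.lstsq(X, Z, rcond=None)

def nearest_target(x_vec, W_T, Z_vocab):
    """Project a source vector into the target space and return the
    index of the most cosine-similar target vector."""
    z_hat = x_vec @ W_T
    sims = (Z_vocab @ z_hat) / (np.linalg.norm(Z_vocab, axis=1)
                                * np.linalg.norm(z_hat))
    return int(np.argmax(sims))
```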
Defining a cross-lingual learning objective lies
at the core of a joint-space model. Hermann
and Blunsom (2014) and Chandar A P et
al. (2014) computed parallel sentence (or
document) representations and minimized the
differences between semantically equivalent
pairs. These methods are useful in capturing
semantic information carried by high-level units
(such as phrases and beyond) and usually do not
rely on word alignments. However, they suffer
from reduced accuracy for representing rare to-
kens, whose semantic information may not be well
generalized. In these cases, finer-grained information
at the lexical level, such as aligned word pairs,
dictionaries, and word translation probabilities, is
considered helpful.
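To make the sentence-level objective concrete, the sketch below composes each sentence as the sum of its word vectors and takes one stochastic gradient step on the squared distance between an aligned pair. The additive composition, vocabulary sizes, and learning rate are simplifying assumptions rather than the exact models of Hermann and Blunsom (2014) or Chandar A P et al. (2014); in practice a noise-contrastive term is added to rule out the degenerate all-zero solution.

```python
import numpy as np

rng = np.random.default_rng(0)
d, V_src, V_tgt = 50, 5000, 6000
E_src = 0.1 * rng.standard_normal((V_src, d))  # source embedding table
E_tgt = 0.1 * rng.standard_normal((V_tgt, d))  # target embedding table

def sgd_step(src_ids, tgt_ids, lr=0.01):
    """One SGD step on L = ||sum_i e_i - sum_j f_j||^2 for a single
    aligned sentence pair; note that no word alignments are needed."""
    diff = E_src[src_ids].sum(axis=0) - E_tgt[tgt_ids].sum(axis=0)
    for i in src_ids:                 # dL/de_i =  2 * diff
        E_src[i] -= lr * 2.0 * diff
    for j in tgt_ids:                 # dL/df_j = -2 * diff
        E_tgt[j] += lr * 2.0 * diff
    return float(diff @ diff)         # current loss, for monitoring

# An aligned pair given as lists of word indices:
loss = sgd_step([3, 17, 42], [5, 99, 4, 12])
```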
Kočiský et al. (2014) integrated the word alignment
process and word embedding into machine
translation models. This method makes full use
of parallel corpora and produces high-quality
word alignments. However, it is unable to exploit the richer
monolingual corpora. On the other hand, Zou et al.
(2013) and Faruqui and Dyer (2014) learnt word
embeddings of different languages in separate
spaces from monolingual corpora and projected the
embeddings into a joint space, but such projections
can only capture linear transformations.
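The projection step can be illustrated with canonical correlation analysis, which Faruqui and Dyer (2014) used to map independently trained embeddings into a shared space; the toy matrices and component count below are assumptions for illustration. Because each language receives a single linear map, non-linear correspondences between the spaces remain out of reach.

```python
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)
# Rows are vectors of translation-dictionary pairs taken from two
# independently trained monolingual embedding spaces (toy data).
X = rng.standard_normal((500, 80))  # source-language vectors
Z = rng.standard_normal((500, 80))  # target-language vectors

cca = CCA(n_components=40, max_iter=2000)
cca.fit(X, Z)
X_joint, Z_joint = cca.transform(X, Z)  # linearly projected joint space
```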
In this paper, we address the above challenges
with a framework of matrix co-factorization. We