Unsupervised Word and Dependency Path Embeddings for Aspect Term Extraction

Yichun Yin¹, Furu Wei², Li Dong³, Kaimeng Xu¹, Ming Zhang¹*, Ming Zhou²
¹School of EECS, Peking University
²Microsoft Research
³Institute for Language, Cognition and Computation, University of Edinburgh
{yichunyin,1300012834,mzhang_cs}@pku.edu.cn, {fuwei,mingzhou}@microsoft.com, li.dong@ed.ac.uk
Abstract

In this paper, we develop a novel approach to aspect term extraction based on unsupervised learning of distributed representations of words and dependency paths. The basic idea is to connect two words (w1 and w2) with the dependency path (r) between them in the embedding space. Specifically, our method optimizes the objective w1 + r ≈ w2 in the low-dimensional space, where multi-hop dependency paths are treated as sequences of grammatical relations and modeled by a recurrent neural network. We then design embedding features that capture linear context and dependency context information for conditional random field (CRF) based aspect term extraction. Experimental results on the SemEval datasets show that (1) with only embedding features, we achieve state-of-the-art results, and (2) our embedding method, which incorporates syntactic information among words, yields better performance in aspect term extraction than other representative methods.
1 Introduction

Aspect term extraction [Hu and Liu, 2004; Pontiki et al., 2014; 2015] aims to identify, from a review sentence, the aspect expressions that refer to a product's or service's properties (or attributes). It is a fundamental step toward obtaining the fine-grained sentiment of specific aspects of a product, beyond the coarse-grained overall sentiment. To date, there have been two major approaches to aspect term extraction. Unsupervised (or rule-based) methods [Qiu et al., 2011] rely on a set of manually defined opinion words as seeds, together with rules derived from syntactic parse trees, to iteratively extract aspect terms. Supervised methods [Jakob and Gurevych, 2010; Li et al., 2010; Chernyshevich, 2014; Toh and Wang, 2014; San Vicente et al., 2015] usually treat aspect term extraction as a sequence labeling problem, and the conditional random field (CRF) has been the mainstream method in the SemEval aspect term extraction task.
Representation learning has been introduced to and has achieved success in natural language processing (NLP) [Bengio et al., 2013]; examples include word embeddings [Mikolov et al., 2013b] and structured embeddings of knowledge bases [Bordes et al., 2011]. It learns distributed representations of text at different granularities, such as words, phrases and sentences, and reduces data sparsity compared with the conventional one-hot representation. Distributed representations have been reported to be useful in many NLP tasks [Turian et al., 2010; Collobert et al., 2011].

* Corresponding author: Ming Zhang
In this paper, we focus on representation learning for aspect term extraction under an unsupervised framework. Besides words, we also take dependency paths into consideration, as they have been shown to be important clues in aspect term extraction [Qiu et al., 2011]. Inspired by the representation learning of knowledge bases [Bordes et al., 2011; Neelakantan et al., 2015; Lin et al., 2015], which embeds both entities and relations into a low-dimensional space, we learn distributed representations of words and dependency paths from the text corpus. Specifically, the optimization objective is formalized as w1 + r ≈ w2. In the triple (w1, w2, r), w1 and w2 are words, and r is the corresponding dependency path, consisting of a sequence of grammatical relations. A recurrent neural network [Mikolov et al., 2010] is used to learn the distributed representations of dependency paths. Furthermore, the word embeddings are enhanced with linear context information in a multi-task learning manner.
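The idea of composing a multi-hop dependency path with a recurrent network and scoring the triple under w1 + r ≈ w2 can be sketched as follows. All names, dimensions, vocabularies and the recurrent weight matrix W here are illustrative placeholders, not the paper's trained parameters or exact architecture:

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 8  # embedding dimensionality (illustrative)

# Hypothetical vocabularies of words and grammatical relations.
word_emb = {w: rng.normal(size=DIM) for w in ["I", "love", "screen"]}
rel_emb = {r: rng.normal(size=DIM) for r in ["nsubj", "dobj", "amod"]}

# Recurrent composition of a multi-hop path: the path is a sequence of
# grammatical relations, consumed one relation per step.
W = rng.normal(scale=0.1, size=(DIM, DIM))

def compose_path(relations):
    """h_t = tanh(W h_{t-1} + rel_emb[r_t]); the final state represents r."""
    h = np.zeros(DIM)
    for rel in relations:
        h = np.tanh(W @ h + rel_emb[rel])
    return h

def triple_distance(w1, path, w2):
    """Distance || w1 + r - w2 ||; smaller means the triple (w1, r, w2)
    better satisfies the objective w1 + r ≈ w2."""
    r = compose_path(path)
    return float(np.linalg.norm(word_emb[w1] + r - word_emb[w2]))

# A one-hop and a two-hop path between the same word pair.
d1 = triple_distance("love", ["dobj"], "screen")
d2 = triple_distance("love", ["dobj", "amod"], "screen")
```

In training, such distances would be pushed down for observed triples and up for corrupted ones; this sketch only shows the compositional scoring, not the optimization.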
The learned embeddings of words and dependency paths are utilized as features in a CRF for aspect term extraction. Because embedding values are real numbers that are not necessarily in a bounded range [Turian et al., 2010], we first map the continuous embeddings to discrete embeddings to make them more appropriate for the CRF model. We then construct the embedding features, which include the target word embedding, the linear context embedding and the dependency context embedding, for aspect term extraction. We conduct experiments on the SemEval datasets and obtain performance comparable with the top systems. To demonstrate the effectiveness of the proposed embedding method, we also compare it with other state-of-the-art models; with the same feature settings, our approach achieves better results. Moreover, we perform a qualitative analysis to show the effectiveness of the learned word and dependency path embeddings.
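The exact continuous-to-discrete mapping is not specified in this excerpt; the following is one plausible sketch, assuming per-dimension equal-frequency binning, where each embedding dimension becomes a categorical feature such as "dim3=2" for the CRF (the function name, bin count and feature format are assumptions):

```python
import numpy as np

def discretize(embeddings, n_bins=10):
    """Map each continuous embedding dimension to a bin index so the
    values can serve as categorical CRF features.  Bin edges are the
    empirical quantiles of that dimension (equal-frequency binning)."""
    emb = np.asarray(embeddings, dtype=float)
    discrete = np.empty(emb.shape, dtype=int)
    for d in range(emb.shape[1]):
        # Interior quantile edges only: n_bins - 1 cut points.
        edges = np.quantile(emb[:, d], np.linspace(0, 1, n_bins + 1)[1:-1])
        discrete[:, d] = np.searchsorted(edges, emb[:, d])
    return discrete

# 100 hypothetical 5-dimensional word embeddings.
emb = np.random.default_rng(1).normal(size=(100, 5))
disc = discretize(emb, n_bins=4)

# Categorical features for the first word, one per embedding dimension.
features = [f"dim{d}={disc[0, d]}" for d in range(emb.shape[1])]
```

Equal-frequency bins keep every discrete value roughly equally populated, which avoids the sparsity that fixed-width bins would produce on heavy-tailed embedding dimensions.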
The contributions of this paper are two-fold. First, we use
the dependency path to link words in the embedding space
for distributed representation learning of words and depen-
Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence (IJCAI-16)