Learning to Attend via Word-Aspect Associative Fusion
for Aspect-based Sentiment Analysis
Yi Tay∗1, Luu Anh Tuan∗2 and Siu Cheung Hui3
1,3 Nanyang Technological University, School of Computer Science and Engineering, Singapore
2 Institute for Infocomm Research, Singapore
∗ Denotes equal contribution
Abstract
Aspect-based sentiment analysis (ABSA) tries to predict the
polarity of a given document with respect to a given aspect
entity. While neural network architectures have been suc-
cessful in predicting the overall polarity of sentences, aspect-
specific sentiment analysis remains an open problem.
In this paper, we propose a novel method for integrating
aspect information into the neural model by explicitly
modeling word-aspect relationships. Our novel model,
Aspect Fusion LSTM (AF-LSTM), learns to attend based on
associative relationships between sentence words and the
aspect, which allows our model to adaptively focus on the
correct words given an aspect term. This ameliorates the
flaws of other
state-of-the-art models that utilize naive concatenations to
model word-aspect similarity. Instead, our model adopts cir-
cular convolution and circular correlation to model the simi-
larity between aspect and words and elegantly incorporates
this within a differentiable neural attention framework. Fi-
nally, our model is end-to-end differentiable and highly re-
lated to convolution-correlation (holographic-like) memories.
Our proposed neural model achieves state-of-the-art perfor-
mance on benchmark datasets, outperforming ATAE-LSTM
by 4%–5% on average across multiple datasets.
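For reference, the two associative operators named above have standard definitions. The sketch below states them for $d$-dimensional vectors $a$ (an aspect embedding) and $w$ (a word embedding); the symbols are chosen here for illustration and are not necessarily the exact formulation developed later in the paper.
\[
[a \ast w]_k = \sum_{i=0}^{d-1} a_i \, w_{(k-i) \bmod d} \quad \text{(circular convolution)}
\]
\[
[a \star w]_k = \sum_{i=0}^{d-1} a_i \, w_{(k+i) \bmod d} \quad \text{(circular correlation)}
\]
\[
a \star w = \mathcal{F}^{-1}\!\left(\overline{\mathcal{F}(a)} \odot \mathcal{F}(w)\right)
\]
where $\mathcal{F}$ denotes the discrete Fourier transform, $\odot$ is the element-wise product and the overline denotes complex conjugation; the last identity allows circular correlation to be computed in $O(d \log d)$ time.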
Introduction
Sentiment analysis lies at the heart of many business and
social applications, which explains its wide popularity in
NLP research. Aspect-based sentiment analysis (ABSA)
goes deeper by trying to predict polarity with respect to a
specific aspect term. For example, consider the following re-
view, ‘I love the user interface but this app is practically use-
less!’. Clearly, we observe that there are two aspects (user
interface and functionality) with completely opposite polar-
ities. As such, techniques that can incorporate aspect
information when making predictions are not only highly
desirable but also
significantly more realistic compared to coarse-grained sen-
timent analysis. Recently, end-to-end neural networks (or
deep learning) (Wang et al. 2016; Li, Guo, and Mei 2017)
such as long short-term memory (LSTM) networks (Hochreiter
and Schmidhuber 1997) and memory networks (Sukhbaatar
et al. 2015) have demonstrated promising performance on
ABSA tasks without requiring any laborious feature engi-
neering.
The task of ABSA introduces the challenging problem of
incorporating aspect information into neural architectures.
As such, models that elegantly fuse aspect information with
sentence modeling are highly desirable. Recently, there have been a
myriad of models proposed for this purpose. For example,
ATAE-LSTM (Wang et al. 2016) is a recently introduced
attention-based model that learns to attend to different parts
of the sentence given the aspect information. ATAE-LSTM
incorporates aspect information by simply concatenating the
aspect embedding with the context word embeddings, both
at the attention layer and at the sentence modeling layer
(the inputs to the LSTM); an illustrative sketch of this
scheme is given after the list below. Consequently, the
ATAE-LSTM model suffers from the following drawbacks:
• Instead of allowing the attention layer to focus on learning
the relative importance of context words, it is given the extra
burden of modeling the relationship between the aspect and
context words.
• The LSTM parameters are burdened not only with
modeling sequential information but also with learning
relationships between the aspect and context words.
Moreover, the LSTM layer in ATAE-LSTM is trained on a
sequence that is dominated by the aspect embedding. As
such, the model becomes significantly harder to train.
• Naive concatenation doubles the input dimension of the
LSTM layer in ATAE-LSTM, which incurs additional
parameter costs. This has implications for memory
footprint, computational complexity and the risk of
overfitting.
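To make the last point concrete, below is a minimal, hypothetical sketch (written in PyTorch with made-up dimensions; it is not the authors' implementation) of the naive concatenation scheme at the LSTM input and the extra parameters it incurs:

```python
# Illustrative sketch only (hypothetical dimensions, not the authors' code):
# naive word-aspect concatenation at the LSTM input, as in the ATAE-LSTM
# input scheme described above, and the parameter cost it incurs.
import torch
import torch.nn as nn

d_word, d_aspect, d_hidden, seq_len = 300, 300, 300, 20

words  = torch.randn(1, seq_len, d_word)   # context word embeddings
aspect = torch.randn(1, 1, d_aspect)       # one aspect embedding

# Append the same aspect vector to every word embedding,
# doubling the size of each LSTM input step.
fused = torch.cat([words, aspect.expand(-1, seq_len, -1)], dim=-1)

plain_lstm  = nn.LSTM(d_word, d_hidden, batch_first=True)
concat_lstm = nn.LSTM(d_word + d_aspect, d_hidden, batch_first=True)
outputs, _  = concat_lstm(fused)           # sequence modeling on the fused input

count = lambda m: sum(p.numel() for p in m.parameters())
print(count(plain_lstm))    # 4 * (d_word * d_hidden + d_hidden**2 + 2 * d_hidden)
print(count(concat_lstm))   # adds 4 * d_aspect * d_hidden input-to-hidden weights
```

With the 300-dimensional embeddings and hidden states assumed here, the concatenated variant carries roughly 50% more LSTM parameters (1,082,400 vs. 722,400), illustrating the memory-footprint and overfitting concerns above.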
In summary, the important question here is whether the
naive concatenation of aspect and words at both the LSTM
layer and attention layer is necessary or even desirable. In
fact, our early empirical experiments showed that the ATAE-
LSTM does not always outperform the baseline LSTM
model. We believe that this is caused by the word-aspect
concatenation making the model difficult to train. As such,
this paper aims to tackle the weaknesses of ATAE-LSTM
while maintaining the advantages of aspect-aware atten-
tions. Our model cleverly separates the responsibilities of
layers by incorporating a dedicated association layer for