Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pages 2503–2508,
Lisbon, Portugal, 17-21 September 2015.
© 2015 Association for Computational Linguistics.
JEAM: A Novel Model for Cross-Domain Sentiment Classification
Based on Emotion Analysis
Kun-Hu Luo, Zhi-Hong Deng, Liang-Chen Wei, Hongliang Yu
School of Electronic Engineering and Computer Science
Peking University, Beijing, China
{dr.tiger126@gmail.com, zhdeng@cis.pku.edu.cn, pkuhaywire@gmail.com,
yuhongliang324@gmail.com}
Abstract
Cross-domain sentiment classification
(CSC) aims at learning a sentiment
classifier for unlabeled data in the target
domain based on the labeled data from a
different source domain. Because the two
domains differ in the distribution of
their raw features, the CSC problem is
challenging. Previous research has mainly
focused on mining concepts by clustering
words across domains, ignoring the
importance of the authors’ emotion
contained in the data and the different
representations of that emotion between
domains. In this
paper, we propose a novel framework to
solve the CSC problem, by modelling the
emotion across domains. We first develop
a probabilistic model named JEAM to
model an author’s emotional state when
writing. Then, an EM algorithm is
introduced to solve the likelihood
maximization problem and to obtain the
latent emotion distribution of the author. Finally,
a supervised learning method is utilized to
assign the sentiment polarity to a given
online review. Experiments show that our
approach is effective and outperforms
state-of-the-art approaches.
1 Introduction
Cross-domain sentiment classification (CSC) is
the task of learning a sentiment classifier for
unlabeled data in the target domain based on the
labeled data from the source domain. With the
increasing amount of opinion information
available on the Internet, CSC has become an
active research topic in recent years. Traditional
machine learning algorithms often train a
classifier on the labeled data for CSC. However,
in some practical cases, we may have abundant
labeled data for some domains (source domains)
but very little or no labeled data for other
domains (target domains). Because the two domains
differ in the distribution of raw features, e.g.
raw term frequency, a classifier trained on the
source domain often performs poorly on the target
domain. To overcome this issue, several
feature-based methods have been proposed to
improve domain adaptation for sentiment
classification [Zhuang et al., 2013; He et al.,
2011; Gao and Li, 2011; Li et al., 2012; Dai et
al., 2007; Zhuang et al., 2010; Pan et al., 2010;
Wang et al., 2011; Long et al., 2012; Lin and He,
2009].
Corresponding author
Existing studies build various generative
models to solve the domain adaptation problems
for CSC. In most cases, the models are trained on
the whole corpora without focusing specifically on
the sentiment of the texts. For example, [Zhuang et
al., 2013] propose a general framework, HIDC, to
mine high-level concepts (e.g. word clusters)
across various domains. However, their learned
concepts contain many topics that are not
restricted to sentiment.
On the other hand, some researchers focus on the
use of sentiment in CSC [Mitra et al., 2013a;
Mitra et al., 2013b; He et al., 2011]. [He et al.,
2011] modify the JST model [Lin and He, 2009] by
incorporating word polarity priors through
adjusting the topic-word Dirichlet priors.
However, they fail to consider the expression
differences among various domains.
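The mismatch in raw term frequency between domains, and the way
sentiment is expressed with different words in each domain, can be
made concrete with a small sketch. The reviews below are hypothetical
toy data (not taken from any corpus used in this paper), and the
Jensen-Shannon divergence is our own choice of illustrative measure,
not part of the proposed model. The helper names term_distribution
and js_divergence are likewise assumptions made for this example.

```python
import math
from collections import Counter

# Hypothetical toy reviews; not data from the paper.
SOURCE_BOOKS = [
    "a gripping and engaging story",
    "the plot is boring and predictable",
    "engaging characters and a gripping plot",
]
TARGET_KITCHEN = [
    "the blender is sturdy and reliable",
    "the kettle is leaky and flimsy",
    "sturdy and reliable toaster",
]


def term_distribution(docs):
    """Return the normalized raw term frequency over a list of documents."""
    counts = Counter(word for doc in docs for word in doc.split())
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two term distributions."""
    vocab = set(p) | set(q)
    # Mixture distribution over the union vocabulary.
    m = {w: 0.5 * (p.get(w, 0.0) + q.get(w, 0.0)) for w in vocab}

    def kl(a):
        # Terms with zero probability contribute nothing to the sum.
        return sum(a[w] * math.log2(a[w] / m[w]) for w in a if a[w] > 0)

    return 0.5 * kl(p) + 0.5 * kl(q)


source = term_distribution(SOURCE_BOOKS)
target = term_distribution(TARGET_KITCHEN)

# Terms that occur in both domains.
shared = set(source) & set(target)
divergence = js_divergence(source, target)

print("shared terms:", sorted(shared))
print("JS divergence: %.3f" % divergence)
```

In this toy setting the two domains share only function words
("and", "is", "the"), so a classifier that relies on source-domain
sentiment words such as "gripping" or "boring" would find none of
them in the target reviews.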
To overcome the above issues, we employ
“emotion”, because it is ubiquitous across domains. The
sentiment words in different domains might vary