A Joint Model for Sentiment-Aware Topic Detection on
Social Media
Kang Xu, Guilin Qi, Junheng Huang and Tianxing Wu¹
Abstract. Joint sentiment/topic models are widely used to detect
sentiment-aware topics in lengthy review data, and they are usually
built on Latent Dirichlet Allocation (LDA). Nowadays plenty of
user-generated posts, e.g., tweets and E-commerce short reviews, are
published on social media, and these posts imply the public’s
sentiments (i.e., positive and negative) towards various topics.
However, existing sentiment/topic models are not suited to detecting
sentiment-aware topics in such posts, i.e., short texts, because
applying the models to short texts directly suffers from the context
sparsity problem. In this paper, we propose Time-User Sentiment/Topic
Latent Dirichlet Allocation (TUS-LDA), which aggregates posts in the
same timeslice or by the same user into a pseudo-document to alleviate
the context sparsity problem. Moreover, we design approaches for
parameter inference and for incorporating prior knowledge into
TUS-LDA. Experiments on Sentiment140 and on tweets about electronic
products from Twitter7 show that TUS-LDA outperforms previous models
in the tasks of sentiment classification and sentiment-aware topic
extraction. Finally, we visualize the sentiment-aware topics
discovered by TUS-LDA.
1 Introduction
With the rapid growth of Web 2.0, a mass of user-generated posts,
e.g., tweets and E-commerce short reviews, capture people’s
interests, thoughts, sentiments and actions. These posts have been
accumulating on social media with each passing day. Sentiment
analysis attempts to find user preferences, likes and dislikes from
posts on social media, such as reviews, blogs and microblogs [21], and
topic modeling attempts to discover the topics or aspects from
reviews, blogs and microblogs etc. [3]. Topic modeling and sentiment
analysis on the posts are two significant tasks which can benefit many
people. For example, we can discover a topic about “Apple Inc.” and
the overall sentiment of the topic. The sentiment of the topic about
“Apple Inc.” is implicitly associated with the stock trading of “Apple
Inc.”, because negative sentiments towards the company on social
media can reduce sales and financial gains, whereas positive
sentiments can improve sales [2]. Topic modeling [1] focuses on
extracting word-level or document-level topics, while sentiment
analysis [23] aims to analyze the sentiments of words or documents.
Topic modeling and sentiment analysis on social media are
complementary: sentiments on social media often change across
topics, and topics on social media are always related to public
sentiments. Jointly modeling topics and sentiments on social media is
therefore a feasible and meaningful task, and it can reflect people’s
sentiments on different topics. However, unlike normal documents
(e.g., news and long reviews), the short and informal character of
posts on social media, e.g., tweets and short reviews, makes the tasks
of topic modeling and sentiment analysis more challenging.
¹ Southeast University, Nanjing, China
Email: {kxu,gqi,jhhuang,wutianxing}@seu.edu.cn
By jointly modeling topics and sentiments on social media, we
want to obtain sentiment-aware topics from the posts, e.g., a topic
about “Apple Inc.” (‘ipad’, ‘iphone’, ‘itouch’, ‘imac’, ‘beautiful’ and
‘popular’) with the overall sentiment polarity “positive”. Topic
models, e.g., LDA [1] and pLSA [10], originally focus on mining topics
from texts, but the models can also be extended to extract an
extra aspect of texts, i.e., sentiment. Conventional sentiment-aware
topic models, like the Joint Sentiment/Topic Model (JST) [15] and the
Aspect/Sentiment Unification Model (ASUM) [11], are used to uncover
the hidden topics and sentiments in a text corpus, where each
document is a mixture of sentiment/topics and each sentiment/topic is
a mixture of words. In these models, each sentiment label is viewed as
a special kind of topic: topics are unknown and data-driven, whereas
sentiments are known and specified. However, because posts are short
and informal, applying the models directly to short posts on social
media always suffers from the context sparsity problem, so the models
fail to recognize the accurate sentiments and senses of words in the
posts.
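The mixture structure described above can be made concrete with a generative sketch. The notation follows the way JST-style models are commonly presented; every symbol below ($\pi_d$, $\theta_{d,l}$, $\varphi_{l,z}$ and the hyperparameters $\gamma$, $\alpha$, $\beta$) is an assumption introduced for illustration and is not taken from this text.

```latex
% Assumed JST-style generative story; all symbols are illustrative.
\begin{align*}
\varphi_{l,z} &\sim \mathrm{Dirichlet}(\beta) && \text{word distribution for sentiment } l,\ \text{topic } z \\
\pi_d &\sim \mathrm{Dirichlet}(\gamma) && \text{sentiment distribution of document } d \\
\theta_{d,l} &\sim \mathrm{Dirichlet}(\alpha) && \text{topic distribution of } d \text{ under sentiment } l \\
l_{d,i} &\sim \mathrm{Multinomial}(\pi_d) && \text{sentiment label of the } i\text{-th word} \\
z_{d,i} &\sim \mathrm{Multinomial}(\theta_{d,\,l_{d,i}}) && \text{topic of the } i\text{-th word} \\
w_{d,i} &\sim \mathrm{Multinomial}(\varphi_{l_{d,i},\,z_{d,i}}) && \text{observed word}
\end{align*}
```

Under this reading, a post with only a handful of words gives very few observations from which to estimate $\pi_d$ and $\theta_{d,l}$, which is one way to understand the context sparsity problem.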
One simple and effective way to alleviate the sparsity problem is
to aggregate short posts into lengthy pseudo-documents [5, 31]. Here
we assume that the posts on social media are a mixture of two
kinds of topics: temporal topics, which are related to current events
(e.g., tweets about a topic “Announcement of iphone SE” in Fig. 1(a),
which are produced in a timeslice), and stable topics, which are
related to personal interests (e.g., tweets about a topic “Apple
products” in Fig. 1(b), which are produced by a user). Temporal topics
are sensitive to time. If posts belong to temporal topics, we aggregate
the posts in the same timeslice into a single document. We assume each
timeslice is a mixture of sentiment-aware topics, i.e., each sentiment
in the timeslice corresponds to several topics. Similarly, stable
topics are related to specific users, and each user is a mixture of
sentiment-aware topics. If a post belongs to a temporal topic, the
post is assigned to a sentiment-aware topic of the timeslice in which
it was published; otherwise, it is assigned to a sentiment-aware topic
of the user who published it.
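The aggregation step described above can be sketched as follows. This is a minimal illustration, not the TUS-LDA implementation: the post fields (`user`, `time`, `text`), the helper `aggregate_posts`, the toy posts, the whitespace tokenization and the choice of one calendar day per timeslice are all assumptions made for the example.

```python
from collections import defaultdict
from datetime import datetime


def aggregate_posts(posts, key):
    """Concatenate the tokens of all posts that share the same key
    (e.g. a timeslice or a user) into one pseudo-document."""
    pseudo_docs = defaultdict(list)
    for post in posts:
        # Naive whitespace tokenization; a real pipeline would do more.
        pseudo_docs[key(post)].extend(post["text"].lower().split())
    return dict(pseudo_docs)


# Hypothetical toy posts, invented for illustration only.
posts = [
    {"user": "alice", "time": datetime(2016, 3, 21, 10),
     "text": "iphone SE announced today"},
    {"user": "bob", "time": datetime(2016, 3, 21, 18),
     "text": "iphone SE looks small"},
    {"user": "alice", "time": datetime(2016, 4, 2, 9),
     "text": "love my imac"},
]

# Assumption: one timeslice corresponds to one calendar day.
by_timeslice = aggregate_posts(posts, key=lambda p: p["time"].date())
by_user = aggregate_posts(posts, key=lambda p: p["user"])
```

The sketch builds both groupings for every post. In the model described above, whether a given post belongs to a temporal topic (and so to its timeslice) or to a stable topic (and so to its user) is not known in advance, so a fixed preprocessing step like this one only illustrates what each kind of pseudo-document would contain.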
Moreover, based on an analysis of the characteristics of topics and
sentiments, we exploit an important observation about topics: a single
post usually talks about a single topic [31]. Even so, a post may talk
about multiple aspects of the topic with different sentiment
polarities [12, 18].
For example, while the following short review of a Canon camera
from Amazon.com expresses the overall sentiment polarity of
Camera, which corresponds to the part in italics, as positive, it addi-
ECAI 2016
G.A. Kaminka et al. (Eds.)
© 2016 The Authors and IOS Press.
This article is published online with Open Access by IOS Press and distributed under the terms
of the Creative Commons Attribution Non-Commercial License 4.0 (CC BY-NC 4.0).
doi:10.3233/978-1-61499-672-9-338