Dynamic Topic Detection Model by Fusing Sentiment Polarity
Xi Ding¹, Lanshan Zhang², Ye Tian¹, Xiangyang Gong¹ and Wendong Wang¹
¹ State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications
Beijing, 100876, China
² School of Digital Media and Design Arts, Beijing University of Posts and Telecommunications
Beijing, 100876, China
Email: dingxi515@163.com, zls326@sina.com, {yetian,xygong,wdwang}@bupt.edu.cn
Abstract
Traditional static topic models focus mainly on the
statistical correlation between words and ignore
sentiment tendency and temporal properties, both of which
may strongly affect topic detection results. This paper
proposes an LDA-based dynamic sentiment-topic (DST)
model that not only detects and tracks topics but also
analyses how the public’s sentiment towards a given topic
shifts over time. The model incorporates the sentiment
and temporal dynamics of the data through maximum
likelihood estimation and a sliding window.
We use Gibbs sampling to estimate and update the
model parameters, and a stochastic EM algorithm for
model inference. Experiments on a real dataset demonstrate
that the DST model outperforms existing algorithms.
Keywords: dynamic sentiment topic model, sentiment
analysis, Gibbs sampling.
1 Introduction
With the rapid development of Web 2.0 technology,
social media such as Facebook, Twitter, blogs and
Weibo have taken off in the last few years. More and
more people choose social media as their main platform
for publishing and accessing real-time
information. Beyond everyday communication,
the huge amount of user-generated content (UGC) often
carries people’s attitudes towards public
events, products or services. Existing research shows that,
compared with traditional media channels, social media
data that conveys public sentiment has greater
social and economic value for community management
agencies, enterprises and the general public. As a result,
sentiment analysis and topic detection on unstructured
social media data have become active research topics
in social network analysis.
Researchers in sentiment analysis mainly study methods
for classifying the sentiment polarity (positive, neutral,
negative) of social media data. Although many results
have been published [Pang, B. and Lee, L.][Aue, A. and
Gamon, M.][Turney, P. D.], most of them rely on
supervised learning and share two limitations. First, a
large number of manually annotated samples are needed
to tune the parameters. Second, a sentiment classifier
trained on one topic area often does not transfer to other
topic areas, since the sentiment distribution is closely
tied to the topic content. In addition, a sentiment lexicon
built from a traditional corpus cannot be applied directly
to social media data, whose language style differs greatly
from that of traditional media: it is expressed more
freely and makes heavy use of emoticons and disjunctive
questions.
Copyright © 2015, Australian Computer Society, Inc. This paper
appeared at the Thirty-Eighth Australasian Computer Science
Conference, ACSC 2015, Sydney, Australia, January 2015.
Conferences in Research and Practice in Information Technology
(CRPIT), Vol. 159. David Parry, Ed. Reproduction for academic,
not-for-profit purposes permitted provided this text is included.
Topic detection derives from TDT
(Topic Detection and Tracking) technology, which mainly
focuses on detecting and organizing previously unknown
topics in streams of formally written text. The topic
detection task has two branches: historical topic detection
and online topic detection. The former aims
to uncover the hidden topics in a given corpus by
unsupervised clustering, where each cluster corresponds
to one topic. The latter decides, based on historical
information, whether newly arrived text belongs to an
existing topic or starts a new one.
Compared with newspapers, periodicals and academic
reports, social media content consists mostly of
unstructured, informally written short texts produced in
real time. These features make topic detection on
social media more challenging.
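The online branch described above can be illustrated with a minimal single-pass clustering sketch. This is only an illustration of the general idea of assigning new text to an existing topic or opening a new one; it is not the DST model proposed in this paper. The function names, the bag-of-words representation, the cosine similarity measure and the threshold value are all assumptions made for the example.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def detect_online(stream, threshold=0.3):
    """Assign each incoming text to an existing topic or open a new one.

    `threshold` is a hypothetical tuning parameter, not a value from the
    paper. Returns one topic id per text, in arrival order.
    """
    centroids = []  # one accumulated term-count vector per topic
    labels = []
    for text in stream:
        vec = Counter(text.lower().split())
        scores = [cosine(vec, c) for c in centroids]
        # Index of the most similar known topic, or None if none exist yet.
        best = max(range(len(scores)), key=scores.__getitem__, default=None)
        if best is not None and scores[best] >= threshold:
            centroids[best].update(vec)  # fold the text into that topic
            labels.append(best)
        else:
            centroids.append(vec)  # no close match: start a new topic
            labels.append(len(centroids) - 1)
    return labels


posts = [
    "flight search rescue ocean",
    "new phone battery review",
    "rescue teams search ocean for flight",
    "phone battery life review",
]
print(detect_online(posts))  # [0, 1, 0, 1]
```

In this toy stream the third and fourth posts share enough vocabulary with the first and second to join their topics. Real social media text is short and informally written, so such word-overlap measures are often unreliable, which is part of what makes the task challenging.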
Sentiment analysis and topic detection are currently
treated as two independent research tasks in social
network analysis. However, “sentiment” and “topic” are two
closely associated concepts. On the one hand, sentiment
is generated and spread with respect to some
object, i.e., a specific topic. On the other hand, a change
of sentiment acts back on its carrier, the
topic, and consequently affects how the topic evolves.
Take the event “the loss of contact with flight
MH370” as an example. In the early stage, anxiety and
trepidation circulated widely among the people
concerned about the event. As time went on, the
mood gradually evolved into sadness and discontent.
Under the influence of this mood, the MH370 event evolved
from “the loss of the aircraft” and “the search and rescue”
into a new stage, “relatives’ doubts about Malaysia
Airlines’ emergency response”. Thus the probability
distribution over topics is affected by sentiment, and the
two concepts are highly correlated.
This paper proposes a dynamic sentiment-topic (DST)
model that is based on Latent Dirichlet Allocation (LDA)
and fuses sentiment polarity. The LDA model
assumes that a document is a mixture of topics
with different proportions, and that
each topic is in turn a probability distribution over
words. For a given corpus, LDA detects all hidden