Research on Sentiment Tendency Analysis of Microtext Based on Sense Group
Bin Gui
School of Information
Renmin University of China
Beijing, China
guibin_163@163.com
Xiaoping Yang
School of Information
Renmin University of China
Beijing, China
yang@ruc.edu.cn
Abstract—With the development of the Internet in China,
microblogging provides a new platform for communicating and
sharing information among Web users. Users can express
opinions and record daily life using microblogs, and the
microblogs that users post indicate their interests to some
extent. However, the sentiment hidden in Chinese microtext is
very hard to analyze because of its complexity. This paper
proposes a new way to determine the sentiment tendency of
Chinese microtext based on partitioned sense groups (STDSG).
To judge the sentiment tendency of a microtext, we first
partition it into separate sense groups and then determine
its sentiment tendency based on an emotional dictionary. We
also consider various factors, including negations, degree
adverbs, and punctuation. The results of our experiments
strongly support the effectiveness of STDSG.
Keywords-sense group, Sina Weibo, sentiment tendency,
degree word, negation word
I. INTRODUCTION
In the past few years, there has been a huge growth in the
use of microblogging platforms such as Sina Weibo and Twitter.
On a microblogging website, users can post short
messages of limited length, e.g., 140 English or Chinese
characters, to communicate and share information with each
other [1]. Web users usually use microblogs to express
opinions and record daily life. Therefore, the messages
posted by microblog users indicate, to some extent, their
sentiment.
Sentiment analysis, or opinion mining, has been a hot area of
academic research because of the challenges that it poses.
It is also of vital interest to industry, as it gives
insight into consumers' minds and their decision-making
processes, besides providing explicit feedback about the
performance of any widely used and discussed product,
service, event, or phenomenon. For governments, automated
sentiment analysis of microblog posts is also of interest,
because it allows public sentiment towards people and
events to be monitored as they happen.
While there has been a fair amount of research on how
sentiments are expressed in genres such as online reviews
and news articles, how sentiments are expressed given the
informal language and message-length constraints of
microblogging has been much less studied. Features such as
automatic part-of-speech tags and resources such as
sentiment lexicons have proved useful for sentiment analysis
on Twitter, but will they also prove useful for sentiment
analysis in Chinese microblogging? How can the accuracy of
sentiment analysis be improved? We examine these issues.
The rest of the paper is organized as follows. Related
work on sentiment analysis in general and on sentiment
analysis in microblogging is discussed in Section 2. Section 3
describes the algorithm for dividing a sentence into sense
groups. Section 4 illustrates sentiment tendency analysis on
Chinese microblogging. Section 5 describes the experimental
results. Section 6 concludes.
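As a rough illustration of the dictionary-based pipeline summarized in the abstract (partition a text into sense groups, score each group against an emotional dictionary, and adjust for negations, degree adverbs, and punctuation), a minimal Python sketch is given here. It is not the STDSG algorithm itself: the punctuation-based partition rule, the toy English lexicons, the weights, and all function names are assumptions chosen for illustration only.

```python
import re

# Hypothetical toy lexicons; they stand in for the paper's emotional dictionary.
SENTIMENT = {"good": 1.0, "happy": 1.0, "bad": -1.0, "terrible": -2.0}
NEGATIONS = {"not", "never"}
DEGREES = {"very": 2.0, "slightly": 0.5}


def split_sense_groups(text):
    """Assumed partition rule: cut at punctuation, keeping the mark with its group."""
    groups = re.findall(r"[^,.;!?]+[,.;!?]*", text)
    return [g.strip() for g in groups if g.strip()]


def score_group(group):
    """Score one group: lexicon polarity, flipped by negation, scaled by degree words."""
    score, sign, weight = 0.0, 1.0, 1.0
    for token in re.findall(r"[a-z']+", group.lower()):
        if token in NEGATIONS:
            sign = -sign
        elif token in DEGREES:
            weight *= DEGREES[token]
        elif token in SENTIMENT:
            score += sign * weight * SENTIMENT[token]
            sign, weight = 1.0, 1.0  # modifiers apply only to the next sentiment word
    if group.endswith("!"):
        score *= 1.5  # assumed emphasis factor for an exclamation mark
    return score


def sentiment_tendency(text):
    """Sum the group scores and map the total to a coarse tendency label."""
    total = sum(score_group(g) for g in split_sense_groups(text))
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "neutral"
```

For example, under these assumed lexicons, `sentiment_tendency("The food is not good, but the service is very good!")` splits the text into two groups that score -1.0 and 3.0, so the overall tendency is positive. Real Chinese microtext would additionally need word segmentation and a proper lexicon, which this sketch leaves out.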
II. RELATED WORK
Sentiment analysis is one of the most active topics in data
mining and natural language processing. Also called
opinion mining, opinion analysis, sentiment classification,
or subjectivity analysis, it focuses on how to recognize,
categorize, label, and extract the sentiments and viewpoints
hidden in subjective texts [2]. Research on sentiment
analysis falls into three levels: word level, sentence level,
and passage level, among which word-level analysis is the
foundation of the other two. Turney [3]
quantified a word's tendency as a real number,
which is then used to classify the tendency of the whole
passage as “compliment” or “criticism” by means of
machine learning. Hatzivassiloglou [4] attained this goal using
the semantic relations between words. Kamps et al. [5] also
achieved it with the help of the word similarity provided by
WordNet, but with two limitations: only adjectives and only
synonyms are considered. Du Weifu [6] presented an
extensible tendency calculation framework that treats
tendency calculation as an optimization problem.
Meena et al. [7] analyzed sentiment tendency by considering
not only single words but also sentence structure,
grammar, and other semantic information. A hybrid
approach that integrates heuristic rules and Bayesian
classification was adopted by Wang et al., taking adjectives
and adverbs as feature words [8]. Wang Gen et al. [9] applied
conditional random fields (CRF) to sentence sentiment
analysis and presented an approach based on redundancy
labeling, while Yang Chao et al. [10] took the adverbs in a
sentence into consideration to mine the sentiment tendency
of Internet comments. Machine learning was first brought into
passage-level analysis by Pang, who compared three
classification models, naive Bayes (NB), maximum entropy (ME),
and support vector machines (SVM), using n-gram word
features, and concluded that unigram features
have the best effect [11]. However, Cui's [12]
experiment showed that unigrams perform well only
on small-scale training corpora, and it was n-gram