Proceedings of the Eighth SIGHAN Workshop on Chinese Language Processing (SIGHAN-8), pages 158–163,
Beijing, China, July 30-31, 2015.
© 2015 Association for Computational Linguistics and Asian Federation of Natural Language Processing
Topic-Based Chinese Message Polarity Classification System at SIGHAN8-Task2
Chun Liao, Chong Feng, Sen Yang, Heyan Huang
School of Computer Science and Technology, Beijing Institute of Technology
{cliao, fengchong, syang, hhy63}@bit.edu.cn
Abstract
This paper describes the topic-based Chinese message polarity classification system submitted by LCYS_TEAM to SIGHAN8-Task2. The system has two main parts. 1) A graph-based ranking model that integrates local and global information is adopted to represent the classification ability of words with respect to different topics. In constructing the graph model, a new weighting approach and a PMI-based method for selecting random jumping probabilities are proposed. 2) For sentiment features, word embeddings are employed to acquire expanded topical words, and syntactic dependencies are used to obtain topic-related sentiment words. Experimental results demonstrate the effectiveness of our system.
1 Introduction
Sentiment analysis, which aims to identify the emotional orientation, attitude, and opinion implied when people express themselves, is becoming increasingly important for network monitoring as it is applied to microblogs. In traditional sentiment analysis, unsupervised methods were adopted in Ku (2005), Shen (2009), Vasileios (2000), and Turney (2002); the main limitation of such semantic-dictionary-based approaches is that they cannot handle out-of-vocabulary words. Supervised methods rely on machine learning models such as Naive Bayes, Maximum Entropy, and Support Vector Machines, as in Pang (2002), Dasgupta (2009), and Li (2011).
Hashtags, in the form “#topic#”, are widely used as topics in Chinese microblogs. In topic-related work, Wang (2011) and Jakob (2010) studied hashtag-level sentiment classification on Twitter. Traditional sentiment analysis does not take into account the object towards which people express sentiment. Such methods mostly ignore topics and cannot analyze sentiment accurately in many topic-related messages. We summarize these difficult cases in two categories.
1) Microblogs with multiple candidate topics
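For example, “#三星 galaxy s6##华为 P8##mate8# 三星 galaxy s6 真没什么亮点,华为 P8 就可以秒它了,更不用说 mate8[拜拜]” (“#Samsung Galaxy S6##Huawei P8##Mate8# The Samsung Galaxy S6 really has no highlights; the Huawei P8 alone can crush it, not to mention the Mate8 [bye-bye]”). This sentence conveys negative sentiment towards the topic “三星 galaxy s6” but positive sentiment towards the topics “华为 P8” and “mate8”.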
2) Microblogs with topic-specific sentiment words
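For example, “#股票#前天刚入手一支股票,一直在升,股价越来越高” (“#Stocks# I bought a stock just the day before yesterday; it has kept rising, and the share price is getting higher and higher”) and “#三星#三星手机电量明显不够用,耗能高” (“#Samsung# The Samsung phone's battery clearly does not last; its power consumption is high”). The word “高” (“high”) carries a positive sentiment orientation towards the topic “股票” in the first sentence and a negative sentiment orientation towards the topic “三星” in the second.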
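Both kinds of cases depend on first knowing which topics a message carries. The following is a minimal sketch of separating “#topic#” hashtags from the message body; it is an illustration only, not the preprocessing used by the system described in this paper. The function name `split_topics` and the regular expression are assumptions, and the pattern assumes that every “#” in the message belongs to a well-formed hashtag pair.

```python
import re
from typing import List, Tuple

# Assumption: a topic is a non-empty run of non-'#' characters enclosed by
# a pair of '#' marks, optionally padded with whitespace ("#topic#" form).
HASHTAG = re.compile(r"#\s*([^#]+?)\s*#")


def split_topics(message: str) -> Tuple[List[str], str]:
    """Return (candidate topics, message body with the hashtags removed)."""
    topics = HASHTAG.findall(message)          # one captured topic per hashtag
    body = HASHTAG.sub("", message).strip()    # drop hashtags, keep the text
    return topics, body


message = (
    "#三星 galaxy s6##华为 P8##mate8#"
    "三星 galaxy s6 真没什么亮点,华为 P8 就可以秒它了,更不用说 mate8[拜拜]"
)
topics, body = split_topics(message)
print(topics)
print(body)
```

Each extracted topic can then be paired with the same message body, so that a message with multiple candidate topics yields one (message, topic) instance per topic.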
Considering the importance of topical information in microblogs, this paper studies topic-based Chinese message polarity classification. The task is as follows: given a message from a Chinese Weibo platform (such as Sina, Tencent, or NetEase) and a topic, classify whether the message conveys positive, negative, or neutral sentiment towards the given topic. For messages conveying both positive and negative sentiment towards the topic, the stronger of the two sentiments should be chosen.
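The labelling rule in the task definition can be sketched as a small decision function. This is a hedged illustration of the rule only: the strength scores, the `threshold` parameter, and the treatment of exact ties as neutral are assumptions that the task description does not specify, and the function says nothing about how such scores would be computed.

```python
from typing import Literal

Polarity = Literal["positive", "negative", "neutral"]


def choose_polarity(pos_strength: float, neg_strength: float,
                    threshold: float = 0.0) -> Polarity:
    """Pick the label for one (message, topic) pair.

    pos_strength and neg_strength are hypothetical non-negative scores for
    the positive and negative sentiment expressed towards the topic.
    """
    # No sentiment above the (assumed) threshold: the message is neutral.
    if pos_strength <= threshold and neg_strength <= threshold:
        return "neutral"
    # Both polarities may be present: the stronger one wins.
    if pos_strength > neg_strength:
        return "positive"
    if neg_strength > pos_strength:
        return "negative"
    # Exact tie: not covered by the task description; neutral is an assumption.
    return "neutral"


print(choose_polarity(0.7, 0.2))
print(choose_polarity(0.1, 0.6))
print(choose_polarity(0.0, 0.0))
```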