A Joint Model for Chinese Microblog Sentiment Analysis
Yuhui Cao, Zhao Chen, Ruifeng Xu∗, Tao Chen and Lin Gui
Shenzhen Engineering Laboratory of Performance Robots at Digital Stage,
Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, China
caoyuhuiszu@gmail.com xuruifeng@hitsz.edu.cn
Abstract
Topic-based sentiment analysis for Chi-
nese microblog aims to identify the
user attitude on specified topics. In
this paper, we propose a joint model
by incorporating Support Vector Ma-
chines (SVM) and deep neural network
to improve the performance of sentiment
analysis. First, an SVM classifier is
constructed using N-gram, N-POS and
sentiment lexicon features.
Meanwhile, a convolutional neural network
is applied to learn paragraph representation
features as the input of another SVM
classifier. The outputs of these two
classifiers are merged to produce the final
classification results. The evaluations on the
SIGHAN-8 Topic-based Chinese mi-
croblog sentiment analysis task show
that our proposed approach ranks second
in micro-averaged F1 and fourth in
macro-averaged F1 among a total of 13
submitted systems.
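The two ranking metrics mentioned above differ only in where the averaging happens. The following is a minimal sketch using the standard textbook definitions of micro- and macro-averaged F1; the function names and the toy labels are illustrative assumptions, and the official SIGHAN-8 scorer may restrict or weight the classes differently.

```python
def f1(tp, fp, fn):
    """F1 from raw counts; defined as 0.0 when there are no true positives."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def micro_macro_f1(gold, pred, labels):
    """Return (micro F1, macro F1) over the given labels (hypothetical helper)."""
    counts = {}
    for label in labels:
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        counts[label] = (tp, fp, fn)
    # Macro: compute F1 per class, then average the per-class scores.
    macro = sum(f1(*c) for c in counts.values()) / len(labels)
    # Micro: pool the counts over all classes, then compute a single F1.
    micro = f1(*(sum(col) for col in zip(*counts.values())))
    return micro, macro


# Toy labels, invented for illustration only.
gold = ["pos", "pos", "neg", "neu"]
pred = ["pos", "neg", "neg", "neu"]
micro, macro = micro_macro_f1(gold, pred, ["pos", "neg", "neu"])
```

Because the macro average weights every class equally, a system can rank differently on the two metrics when the class distribution is skewed.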
1 Introduction
With the development of the Internet, mi-
croblog has become a popular user-generated
content platform where users share the newest
events or their personal feelings with each
other. Topic-based microblogs are the most
common interactive way for users to share
their opinions towards a specified topic. To
identify the opinions of users, sentiment
analysis techniques are used to classify
texts into different categories according
to their sentiment polarities.
Most existing sentiment classification
techniques are based on machine learning
algorithms, such as Support Vector Machines,
Naïve Bayes and Maximum Entropy. These
approaches take feature vectors as the
classifier input and predict the class
label of each text. Thus, feature
engineering, a method for extracting effective
features from texts, plays an important role.
Some commonly used features in sentiment
classification are unigrams, bigrams and
sentiment words. However, these features do
not work well for cross-domain sentiment
classification because of the lack of domain
knowledge.
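As an illustration of the feature types named above, the following minimal sketch turns a tokenized text into unigram, bigram and sentiment-word count features. It assumes the text has already been word-segmented (which Chinese microblog text would require); the two tiny lexicons, the feature-name prefixes and the function names are hypothetical and are not taken from the system described in this paper.

```python
from collections import Counter

# Hypothetical toy lexicons; a real system would load full sentiment lexicons.
POSITIVE_WORDS = {"good", "great", "love"}
NEGATIVE_WORDS = {"bad", "awful", "hate"}


def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) in a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def extract_features(tokens):
    """Map a tokenized text to a sparse feature dictionary of counts."""
    features = Counter()
    for gram in ngrams(tokens, 1):
        features["uni=" + gram[0]] += 1
    for gram in ngrams(tokens, 2):
        features["bi=" + "_".join(gram)] += 1
    # Sentiment-word features: how many tokens appear in each lexicon.
    features["pos_words"] = sum(t in POSITIVE_WORDS for t in tokens)
    features["neg_words"] = sum(t in NEGATIVE_WORDS for t in tokens)
    return dict(features)


feats = extract_features(["i", "love", "this", "great", "phone"])
```

A dictionary of this kind can be vectorized and passed to a classifier. The limitation noted above shows up directly: a sentiment word that is absent from the lexicons, or an n-gram unseen in the training domain, contributes nothing.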
Danushka Bollegala et al. (2011) used mul-
tiple sources to construct a sentiment sensi-
tive thesaurus to overcome the lack of domain
knowledge. Expanding the lexicon with new
sentiment words is another approach to
improving the performance of sentiment
analysis. Stefano Baccianella et al. (2010)
constructed SentiWordNet by extending WordNet
with sentiment information. It is now widely
used in sentiment classification for English.
As for Chinese sentiment analysis, Minlie
Huang et al. (2014) proposed a new word
detection method that mines frequent
sentiment word patterns. This method can
discover new sentiment words from large-scale
unlabeled texts.
With the rapid development of pre-trained
word embedding and deep neural networks,
a new way to represent texts and features
has been developed. Mikolov et al. (2013) showed
that word embedding effectively represents
words with meaningful syntactic and semantic
information. The recursive neural network
proposed by Socher et al. (2011a; 2011b; 2013)
has been shown to be efficient at constructing
sentence representations based on the word
embedding. The convolutional neural network
(CNN), another deep learning model that
achieved success in the image recognition
field, was applied to natural language
processing with word embed-