An Ensemble Method for Unbalanced Sentiment
Classification
Dongmei Zhang
School of Computer Science & Technology
Shandong Jianzhu University
Jinan, China
Jun Ma
School of Computer Science & Technology
Shandong University
Jinan, China
Jing Yi, Xiaofei Niu, Xiaojing Xu
School of Computer Science & Technology
Shandong Jianzhu University
Jinan, China
Abstract—Current research on binary sentiment classification has
focused on improving classification performance, while the
imbalance of sentiment data sets in practical applications, where
the number of samples in one category is several times that in
the other, has been neglected. Most studies on sentiment
classification have used balanced data, so the resulting methods
perform well on balanced data but cannot maintain the same
performance on unbalanced data sets. This paper proposes a
method for unbalanced sentiment classification that combines
unbalanced classification methods with ensemble learning. Both
the algorithm and the data set are considered in order to improve
classification performance on unbalanced sentiment data sets.
Under the framework of ensemble learning, this hybrid method
integrates three different techniques to process the data set:
under-sampling, bootstrap re-sampling and random feature
selection. Experiments on an unbalanced data set show that this
ensemble method can improve classification performance on
unbalanced sentiment data.
Keywords-Sentiment classification; Unbalanced data
classification; Ensemble learning
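The three data-processing steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `make_training_subset`, the parameter `feature_fraction`, the order of the steps and the choice to under-sample the majority class down to the minority size are all assumptions made for illustration.

```python
import random

def make_training_subset(samples, labels, feature_fraction=0.5, rng=None):
    """Build one training subset for a single base classifier (hypothetical helper).

    samples: list of equal-length feature vectors
    labels:  list of 0/1 class labels, one per sample
    """
    rng = rng or random.Random()

    # Group sample indices by class; the shorter list is the minority class.
    by_class = {0: [], 1: []}
    for i, y in enumerate(labels):
        by_class[y].append(i)
    minority, majority = sorted(by_class.values(), key=len)

    # 1) Under-sampling: keep as many majority samples as there are minority ones
    #    (assumed target size).
    balanced = minority + rng.sample(majority, len(minority))

    # 2) Bootstrap re-sampling: draw with replacement from the balanced pool.
    boot = [rng.choice(balanced) for _ in balanced]

    # 3) Random feature selection: keep a random subset of the feature columns.
    n_features = len(samples[0])
    k = max(1, int(n_features * feature_fraction))
    features = sorted(rng.sample(range(n_features), k))

    X = [[samples[i][j] for j in features] for i in boot]
    y = [labels[i] for i in boot]
    return X, y, features
```

Calling this function several times with different random states yields subsets that differ in both their samples and their features, one subset per base classifier of the ensemble.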
I. INTRODUCTION AND PROBLEM STATEMENT
Nowadays the World Wide Web has become the largest
information source in the world. Furthermore, with the
emergence of Web 2.0, numerous online review sites, web
forums, personal blogs and social networking sites have made
the Web a large source of evaluative texts in various forms,
such as consumer reviews of a product, comments on a
viewpoint and so on [1, 2]. In the past, web users were only
consumers of web content. Now they are also contributors of
web content, posting their opinions and comments on the web.
These evaluative texts can benefit people [2, 3]. For example,
comments from customers can help people make a reasonable
purchase decision.
However, as the quantity of evaluative texts grows, it becomes
increasingly difficult for web users to find valuable information
in such a huge repository, so sentiment classification becomes
increasingly important [4, 5, 6]. Sentiment classification has
been applied in many areas. It is used to annotate the sentiment
content of text, categorize opinions in product reviews, etc.
Other terms used in previous papers include sentiment analysis,
opinion extraction and affect analysis [7, 8, 9]. Sentiment
classification has become a research issue that spans multiple
areas, such as Data Mining (DM), Machine Learning (ML), and
so on [10, 11].
Using sentiment classification technology, a summary of
numerous evaluative texts can be provided, for example by
classifying product comments into negative and positive
categories [12, 13, 14]. Both consumers and manufacturers can
benefit from classifying evaluative text. Thus interest in
sentiment classification is increasing, especially among
commercial websites that have a tremendous number of
product reviews.
Nevertheless, current research on sentiment classification
has focused on improving classification performance, and the
imbalance of sentiment data sets has not been studied much
[15]. Unbalanced sentiment classification means sentiment
classification on an unbalanced data set, that is, a data set in
which one category is several times the size of another.
Previous studies on sentiment classification have used balanced
data, so these methods perform well on balanced data but in
most cases cannot maintain the same performance in practical
applications. Therefore, it is essential to study and develop new
methods that deal with the imbalance of sentiment data sets and
enhance categorization performance in practical applications.
Research on unbalanced sentiment classification has been done
through semi-supervised learning, active learning, etc. [8, 16, 17].
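As a small illustration of the definition of an unbalanced data set, the following Python sketch computes how many times larger the majority category is than the minority one. It also computes the accuracy of a trivial classifier that always predicts the majority category. The function names and the 800/200 split are hypothetical and are not taken from the paper's data set.

```python
from collections import Counter

def imbalance_ratio(labels):
    """Size of the largest category divided by the size of the smallest."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())

def majority_baseline_accuracy(labels):
    """Accuracy obtained by always predicting the most frequent category."""
    counts = Counter(labels)
    return max(counts.values()) / len(labels)

# Hypothetical review collection: 800 positive and 200 negative reviews.
labels = ["pos"] * 800 + ["neg"] * 200
print(imbalance_ratio(labels))             # 4.0
print(majority_baseline_accuracy(labels))  # 0.8
```

In this example the trivial classifier reaches 80% accuracy without recognizing a single negative review, which is one reason overall accuracy alone can be misleading on unbalanced data.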
To handle this unbalanced sentiment classification problem,
this paper presents a study of unbalanced sentiment
classification based on ensemble learning. We propose a
method that combines the advantages of under-sampling,
bootstrap re-sampling and random feature selection to obtain
a data set with diversity in both sample space and feature
This work is partly supported by the National Natural Science Foundation
of China (61170052), the Natural Science Foundation of Shandong Province
(ZR2011FQ007) and the Research Fund of Shandong Jianzhu University
(XNBS1264).