Baselines and Bigrams: Simple, Good Sentiment and Topic Classification
Sida Wang and Christopher D. Manning
Department of Computer Science
Stanford University
Stanford, CA 94305
{sidaw,manning}@stanford.edu
Abstract
Variants of Naive Bayes (NB) and Support Vector Machines (SVM) are often used as baseline methods for text classification, but their performance varies greatly depending on the model variant, features used, and task/dataset. We show that: (i) the inclusion of word bigram features gives consistent gains on sentiment analysis tasks; (ii) for short snippet sentiment tasks, NB actually does better than SVMs (while for longer documents the opposite result holds); (iii) a simple but novel SVM variant using NB log-count ratios as feature values consistently performs well across tasks and datasets. Based on these observations, we identify simple NB and SVM variants which outperform most published results on sentiment analysis datasets, sometimes providing a new state-of-the-art performance level.
1 Introduction
Naive Bayes (NB) and Support Vector Machine (SVM) models are often used as baselines for other methods in text categorization and sentiment analysis research. However, their performance varies significantly depending on which variant, features, and datasets are used. We show that researchers have not paid sufficient attention to these model selection issues. Indeed, we show that the better variants often outperform recently published state-of-the-art methods on many datasets. We attempt to categorize which method, which variants, and which features perform better under which circumstances.
First, we make an important distinction between sentiment classification and topical text classification. We show that the usefulness of bigram features in bag of features sentiment classification has been underappreciated, perhaps because their usefulness is more of a mixed bag for topical text classification tasks. We then distinguish between short snippet sentiment tasks and longer reviews, showing that for the former, NB outperforms SVMs. Contrary to claims in the literature, we show that bag of features models are still strong performers on snippet sentiment classification tasks, with NB models generally outperforming the sophisticated, structure-sensitive models explored in recent work. Furthermore, by combining generative and discriminative classifiers, we present a simple model variant where an SVM is built over NB log-count ratios as feature values, and show that it is a strong and robust performer over all the presented tasks; a brief code sketch follows below. Finally, we confirm the well-known result that multinomial NB (MNB) is normally better and more stable than multivariate Bernoulli NB, and the increasingly known result that binarized MNB is better than standard MNB. The code and datasets to reproduce the results in this paper are publicly available.¹
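To make the construction concrete, here is a minimal sketch of the NB log-count-ratio SVM in Python. NumPy and scikit-learn, the function names, and the parameter values (alpha, C) are illustrative assumptions, not the paper's prescribed implementation; the rows of F are per-case feature count vectors with labels in {-1, +1}.

```python
import numpy as np
from sklearn.svm import LinearSVC

def nb_log_count_ratio(F, y, alpha=1.0):
    """NB log-count ratio r: one log-ratio weight per feature.

    F : (n_cases, n_features) count matrix (np.ndarray)
    y : labels in {-1, +1}
    alpha : smoothing parameter (the value here is an assumption)
    """
    p = alpha + F[y == 1].sum(axis=0)    # smoothed counts over positive cases
    q = alpha + F[y == -1].sum(axis=0)   # smoothed counts over negative cases
    return np.log((p / p.sum()) / (q / q.sum()))

def fit_nbsvm(F, y, C=1.0):
    """Train a linear SVM on counts rescaled by the NB log-count ratio."""
    r = nb_log_count_ratio(F, y)
    svm = LinearSVC(C=C)       # C is a generic default, not a tuned value
    svm.fit(F * r, y)          # element-wise scaling of each count vector by r
    return r, svm
```

Binarizing the counts in F before scaling is a natural variation, in line with the binarized MNB result noted above.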
2 The Methods
We formulate our main model variants as linear classifiers, where the prediction for test case $k$ is

$$y^{(k)} = \operatorname{sign}\left(w^{\top} x^{(k)} + b\right) \qquad (1)$$
Details of the equivalent probabilistic formulations
are presented in (McCallum and Nigam, 1998).
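Read concretely, Eq. (1) is the following prediction rule (a minimal illustrative sketch; the variable names are ours):

```python
import numpy as np

def predict(w, b, x):
    """Eq. (1): y^(k) = sign(w^T x^(k) + b).

    w and b are the weight vector and bias produced by whichever
    model variant is in use; x is the feature vector for test case k.
    """
    return np.sign(w @ x + b)
```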
Let $f^{(i)} \in \mathbb{R}^{|V|}$ be the feature count vector for training case $i$ with label $y^{(i)} \in \{-1, 1\}$. $V$ is the set of features.
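For illustration, the count vectors $f^{(i)}$ could be built as follows; CountVectorizer and the toy documents are assumptions for the sketch, not the paper's tooling. The bigram and binarized variants correspond to the feature choices discussed in the introduction:

```python
from sklearn.feature_extraction.text import CountVectorizer

docs = ["a great movie", "not a great movie"]  # toy corpus (illustrative)

# f^(i) in R^{|V|}: unigram + word-bigram counts
vec = CountVectorizer(ngram_range=(1, 2))
F = vec.fit_transform(docs)

# Binarized variant: each feature is 1 if it occurs in the case, else 0
vec_bin = CountVectorizer(ngram_range=(1, 2), binary=True)
F_bin = vec_bin.fit_transform(docs)
```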
¹ http://www.stanford.edu/~sidaw