Effective Use of Word Order for Text Categorization
with Convolutional Neural Networks
Rie Johnson
RJ Research Consulting
Tarrytown, NY, USA
riejohnson@gmail.com
Tong Zhang
Baidu Inc., Beijing, China
Rutgers University, Piscataway, NJ, USA
tzhang@stat.rutgers.edu
Abstract
A convolutional neural network (CNN) is a neural
network that can make use of the internal
structure of data, such as the 2D structure
of image data. This paper studies CNN on
text categorization to exploit the 1D structure
(namely, word order) of text data for accurate
prediction. Instead of using low-dimensional
word vectors as input as is often done, we
directly apply CNN to high-dimensional text
data, which leads to directly learning embed-
ding of small text regions for use in classifi-
cation. In addition to a straightforward adap-
tation of CNN from image to text, a sim-
ple but new variation which employs bag-of-
word conversion in the convolution layer is
proposed. An extension to combine multiple
convolution layers is also explored for higher
accuracy. The experiments demonstrate the
effectiveness of our approach in comparison
with state-of-the-art methods.
1 Introduction
Text categorization is the task of automatically as-
signing pre-defined categories to documents writ-
ten in natural languages. Several types of text cat-
egorization have been studied, each of which deals
with different types of documents and categories,
such as topic categorization to detect discussed top-
ics (e.g., sports, politics), spam detection (Sahami et
al., 1998), and sentiment classification (Pang et al.,
2002; Pang and Lee, 2008; Maas et al., 2011) to de-
termine the sentiment typically in product or movie
reviews. A standard approach to text categorization
is to represent documents by bag-of-word vectors,
namely, vectors that indicate which words appear in
the documents but do not preserve word order, and
use classification models such as SVM.
To appear in NAACL HLT 2015.
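As a minimal illustration of why bag-of-word vectors discard word order, the sketch below builds binary indicator vectors for two toy documents. The documents, the helper name `bow_vector`, and the whitespace tokenization are assumptions made for this example only; they are not taken from the experiments in this paper.

```python
def bow_vector(tokens, vocabulary):
    """Return a binary vector marking which vocabulary words occur in tokens.

    Word positions are ignored, so any reordering of the same words
    yields the same vector.
    """
    present = set(tokens)
    return [1 if word in present else 0 for word in vocabulary]


# Hypothetical toy reviews: same words, opposite sentiment.
doc_a = "not good , really bad".split()
doc_b = "not bad , really good".split()

# Shared vocabulary in a fixed (sorted) order.
vocabulary = sorted(set(doc_a) | set(doc_b))

vec_a = bow_vector(doc_a, vocabulary)
vec_b = bow_vector(doc_b, vocabulary)

print(vocabulary)
print(vec_a)
print(vec_b)
print("identical bow vectors:", vec_a == vec_b)
```

Both documents map to the same vector, so a classifier that sees only these vectors cannot separate them.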
It has been noted that the loss of word order caused
by bag-of-word vectors (bow vectors) is particularly
problematic for sentiment classification. A simple
remedy is to use word bi-grams in addition to
uni-grams (Blitzer et al., 2007; Glorot et al., 2011; Wang
and Manning, 2012). However, using word n-grams
with n > 1 is not always effective for text
categorization in general; for example, on topic
categorization, simply adding phrases or n-grams is
not effective (see, e.g., the references in Tan et al. (2002)).
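The bi-gram remedy can be sketched as follows. This is an illustrative example under assumed toy data, not the feature pipeline of the cited work; the function names `ngrams` and `unigram_bigram_features` are hypothetical.

```python
def ngrams(tokens, n):
    """Return all contiguous word n-grams of tokens as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def unigram_bigram_features(tokens):
    """Return the set of uni-grams and bi-grams that occur in tokens."""
    return set(ngrams(tokens, 1)) | set(ngrams(tokens, 2))


# Hypothetical toy reviews: same words, opposite sentiment.
doc_a = "not good , really bad".split()
doc_b = "not bad , really good".split()

features_a = unigram_bigram_features(doc_a)
features_b = unigram_bigram_features(doc_b)

# Uni-grams alone cannot tell the two documents apart.
print("same uni-grams:", set(doc_a) == set(doc_b))

# Bi-grams recover some local word order, so the feature sets now differ.
print("only in doc_a:", sorted(features_a - features_b))
print("only in doc_b:", sorted(features_b - features_a))
```

The cost is a much larger feature space, since the number of distinct n-grams grows quickly with n.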
To benefit from word order on text categoriza-
tion, we take a different approach, which employs
convolutional neural networks (CNN) (LeCun et al.,
1986). CNN is a neural network that can make use
of the internal structure of data such as the 2D struc-
ture of image data through convolution layers, where
each computation unit responds to a small region of
input data (e.g., a small square of a large image).
We apply CNN to text categorization to make use of
the 1D structure (word order) of document data so
that each unit in the convolution layer responds to a
small region of a document (a sequence of words).
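To make the idea of a unit responding to a small region of a document concrete, here is a minimal sketch. It assumes that each word is a one-hot vector, that a region is the concatenation of the one-hot vectors of its words, and that the unit applies a rectifier to a weighted sum. The weights are set by hand purely for illustration (in practice they would be learned), and all names are hypothetical.

```python
def one_hot(word, vocabulary):
    """Return the one-hot vector of word over vocabulary."""
    return [1.0 if word == v else 0.0 for v in vocabulary]


def region_vectors(tokens, vocabulary, region_size):
    """Concatenate one-hot word vectors over each window of region_size words."""
    regions = []
    for start in range(len(tokens) - region_size + 1):
        vec = []
        for word in tokens[start:start + region_size]:
            vec.extend(one_hot(word, vocabulary))
        regions.append(vec)
    return regions


def unit_response(region, weights, bias):
    """Rectified weighted sum: max(0, w . r + b)."""
    return max(0.0, sum(w * x for w, x in zip(weights, region)) + bias)


tokens = "i do not love it".split()
vocabulary = sorted(set(tokens))
region_size = 2
regions = region_vectors(tokens, vocabulary, region_size)

# Hand-set (assumed) weights: the unit fires only when the first word of the
# region is "not" and the second word is "love".
vocab_size = len(vocabulary)
weights = [0.0] * (region_size * vocab_size)
weights[0 * vocab_size + vocabulary.index("not")] = 1.0
weights[1 * vocab_size + vocabulary.index("love")] = 1.0
bias = -1.0

# The same weights are applied at every position of the document.
responses = [unit_response(r, weights, bias) for r in regions]
for start, value in enumerate(responses):
    print(" ".join(tokens[start:start + region_size]), "->", value)
```

Unlike a bow vector of the whole document, the unit distinguishes "not love" from the same two words appearing in another order or far apart.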
CNN has been very successful on image
classification; see, e.g., the winning solutions of the
ImageNet Large Scale Visual Recognition Challenge
(Krizhevsky et al., 2012; Szegedy et al., 2014;
Russakovsky et al., 2014).
On text, since the work on token-level applica-
tions (e.g., POS tagging) by Collobert et al. (2011),
CNN has been used in systems for entity search, sen-
tence modeling, word embedding learning, product
feature mining, and so on (Xu and Sarikaya, 2013;
Gao et al., 2014; Shen et al., 2014; Kalchbrenner et
arXiv:1412.1058v2 [cs.CL] 26 Mar 2015