Character-level Convolutional Networks for Text
Classification
∗
Xiang Zhang Junbo Zhao Yann LeCun
Courant Institute of Mathematical Sciences, New York University
719 Broadway, 12th Floor, New York, NY 10003
{xiang, junbo.zhao, yann}@cs.nyu.edu
Abstract
This article offers an empirical exploration on the use of character-level convolu-
tional networks (ConvNets) for text classification. We constructed several large-
scale datasets to show that character-level convolutional networks could achieve
state-of-the-art or competitive results. Comparisons are offered against traditional
models such as bag of words, n-grams and their TFIDF variants, and deep learning
models such as word-based ConvNets and recurrent neural networks.
1 Introduction
Text classification is a classic topic for natural language processing, in which one needs to assign
predefined categories to free-text documents. The range of text classification research goes from
designing the best features to choosing the best possible machine learning classifiers. To date,
almost all techniques of text classification are based on words, in which simple statistics of some
ordered word combinations (such as n-grams) usually perform the best [12].
On the other hand, many researchers have found convolutional networks (ConvNets) [17] [18] are
useful in extracting information from raw signals, ranging from computer vision applications to
speech recognition and others. In particular, time-delay networks used in the early days of deep
learning research are essentially convolutional networks that model sequential data [1] [31].
In this article we explore treating text as a kind of raw signal at character level, and applying tem-
poral (one-dimensional) ConvNets to it. For this article we only used a classification task as a way
to exemplify ConvNets’ ability to understand texts. Historically we know that ConvNets usually
require large-scale datasets to work, therefore we also build several of them. An extensive set of
comparisons is offered with traditional models and other deep learning models.
Applying convolutional networks to text classification or natural language processing at large was
explored in literature. It has been shown that ConvNets can be directly applied to distributed [6] [16]
or discrete [13] embedding of words, without any knowledge on the syntactic or semantic structures
of a language. These approaches have been proven to be competitive to traditional models.
There are also related works that use character-level features for language processing. These in-
clude using character-level n-grams with linear classifiers [15], and incorporating character-level
features to ConvNets [28] [29]. In particular, these ConvNet approaches use words as a basis, in
which character-level features extracted at word [28] or word n-gram [29] level form a distributed
representation. Improvements for part-of-speech tagging and information retrieval were observed.
This article is the first to apply ConvNets only on characters. We show that when trained on large-
scale datasets, deep ConvNets do not require the knowledge of words, in addition to the conclusion
∗
An early version of this work entitled “Text Understanding from Scratch” was posted in Feb 2015 as
arXiv:1502.01710. The present paper has considerably more experimental results and a rewritten introduction.
1