Text Level Graph Neural Network for Text Classification
Lianzhe Huang, Dehong Ma, Sujian Li, Xiaodong Zhang and Houfeng WANG
MOE Key Lab of Computational Linguistics, Peking University, Beijing, 100871, China
{hlz, madehong, lisujian, zxdcs, wanghf}@pku.edu.cn
Abstract
Recently, research has explored graph neural network (GNN) techniques for text classification, since GNNs do well in handling complex structures and preserving global information. However, previous GNN-based methods face two practical problems: a fixed corpus level graph structure that does not support online testing, and high memory consumption. To tackle these problems, we propose a new GNN-based model that builds a graph for each input text, with parameters shared globally, instead of a single graph for the whole corpus. This method removes the dependence of an individual text on the entire corpus, which supports online testing while still preserving global information. Besides, we build graphs with much smaller windows in the text, which not only extracts more local features but also significantly reduces the number of edges and the memory consumption. Experiments show that our model outperforms existing models on several text classification datasets, even while consuming less memory.
1 Introduction
Text classification is a fundamental problem in natural language processing (NLP), with many applications such as spam detection and news filtering (Jindal and Liu, 2007; Aggarwal and Zhai, 2012). The essential step for text classification is text representation learning.
With the development of deep learning, neural networks like Convolutional Neural Networks (CNN) (Kim, 2014) and Recurrent Neural Networks (RNN) (Hochreiter and Schmidhuber, 1997) have been employed for text representation. Recently, a new kind of neural network named Graph Neural Network (GNN) has attracted wide attention (Battaglia et al., 2018). GNN was first proposed in (Scarselli et al., 2009) and has been used in many NLP tasks, including text classification (Defferrard et al., 2016), sequence labeling (Zhang et al., 2018a), neural machine translation (Bastings et al., 2017), and relational reasoning (Battaglia et al., 2016). Defferrard et al. (2016) first employed the Graph Convolutional Neural Network (GCN) in the text classification task and outperformed traditional CNN models. Further, Yao et al. (2019) improved Defferrard et al. (2016)'s work by applying article nodes and weighted edges in the graph, and their model outperformed the state-of-the-art text classification methods.
However, these GNN-based models usually build one graph for the whole corpus, which causes the following problems in practice. First, they require high memory consumption due to numerous edges: because such models build a single graph for the whole corpus and use edges with fixed weights, which considerably limits the expressive power of edges, they have to use a large connection window to obtain a global representation. Second, it is difficult for such models to conduct online testing, because the structure and parameters of their graph depend on the corpus and cannot be modified after training.
To address the above problems, we propose a new GNN-based method for text classification. Instead of building a single corpus level graph, we produce a text level graph for each input text. For a text level graph, we connect word nodes within a reasonably small window in the text rather than fully connecting all the word nodes. The representations of the same nodes and the weights of edges are shared globally and can be updated in the text level graphs through a message passing mechanism, where a node takes in information from neighboring nodes to update its representation.
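To make this concrete, the following minimal Python sketch, based only on our reading of the description above, builds a text level graph with small-window connections and runs one round of message passing. The names (build_text_graph, node_table, edge_table, window_size) and the max-over-neighbors aggregation are illustrative assumptions, not the authors' released implementation.

from collections import defaultdict

def build_text_graph(token_ids, window_size=2):
    """Connect each token to the tokens within window_size
    positions of it, instead of fully connecting all nodes."""
    edges = set()
    n = len(token_ids)
    for i in range(n):
        for j in range(max(0, i - window_size), min(n, i + window_size + 1)):
            if i != j:
                edges.add((token_ids[i], token_ids[j]))
    return edges

# Globally shared parameters (stand-ins for trainable tensors):
# one embedding per word id and one weight per directed word pair,
# shared by every text graph in which they appear.
node_table = defaultdict(lambda: [1.0, 1.0, 1.0, 1.0])
edge_table = defaultdict(lambda: 0.5)

def message_passing(token_ids, edges):
    """One round: each node aggregates its neighbors' representations,
    scaled by the shared edge weights (max-pooling as the reduction)."""
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
    updated = {}
    for node in set(token_ids):
        msgs = [[edge_table[(node, nb)] * x for x in node_table[nb]]
                for nb in neighbors[node]]
        updated[node] = [max(dim) for dim in zip(*msgs)] if msgs else node_table[node]
    return updated

tokens = [3, 7, 7, 2, 9]           # word ids of one input text
reps = message_passing(tokens, build_text_graph(tokens))

In a full model, node_table and edge_table would be trainable parameters shared across all text graphs, which is what lets an individual text graph preserve global information.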
Finally, we summarize the representations of all the