Deeper Insights into Graph Convolutional Networks
for Semi-Supervised Learning
Qimai Li^1, Zhichao Han^{1,2}, Xiao-Ming Wu^{1,∗}
^1 The Hong Kong Polytechnic University
^2 ETH Zurich
csqmli@comp.polyu.edu.hk, zhhan@student.ethz.ch, xiao-ming.wu@polyu.edu.hk
∗ Corresponding author.
Abstract
Many interesting problems in machine learning are being
revisited with new deep learning tools. For graph-based semi-
supervised learning, a recent important development is graph
convolutional networks (GCNs), which nicely integrate local
vertex features and graph topology in the convolutional lay-
ers. Although the GCN model compares favorably with other
state-of-the-art methods, its working mechanisms are not clear, and it
still requires a considerable amount of labeled data for validation and
model selection.
In this paper, we develop deeper insights into the GCN model
and address its fundamental limits. First, we show that the
graph convolution of the GCN model is actually a special
form of Laplacian smoothing, which is the key reason why
GCNs work, but it also brings potential concerns of over-
smoothing with many convolutional layers. Second, to over-
come the limits of the GCN model with shallow architectures,
we propose both co-training and self-training approaches to
train GCNs. Our approaches significantly improve GCNs in
learning with very few labels, and exempt them from requir-
ing additional labels for validation. Extensive experiments on
benchmarks have verified our theory and proposals.
1 Introduction
The breakthroughs in deep learning have led to a paradigm
shift in artificial intelligence and machine learning. On the
one hand, numerous old problems have been revisited with
deep neural networks and huge progress has been made in
many tasks that previously seemed out of reach, such as machine
translation and computer vision. On the other hand, new
techniques such as geometric deep learning (Bronstein et al.
2017) are being developed to generalize deep neural models
to new or non-traditional domains.
It is well known that training a deep neural model typi-
cally requires a large amount of labeled data, a requirement that cannot
be met in many scenarios due to the high cost of labeling
training data. To reduce the amount of data needed for train-
ing, a recent surge of research interest has focused on few-
shot learning (Lake, Salakhutdinov, and Tenenbaum 2015;
Rezende et al. 2016) – to learn a classification model with
very few examples from each class. Closely related to few-
shot learning is semi-supervised learning, where a large
amount of unlabeled data can be utilized in training alongside a
typically small amount of labeled data.
Many studies have shown that leveraging unlabeled
data in training can improve learning accuracy significantly
if used properly (Zhu and Goldberg 2009). The key issue is
to maximize the effective utilization of structural and fea-
ture information of unlabeled data. Due to the powerful fea-
ture extraction capability and recent success of deep neu-
ral networks, there have been some successful attempts to
revisit semi-supervised learning with neural-network-based
models, including ladder network (Rasmus et al. 2015),
semi-supervised embedding (Weston et al. 2008), planetoid
(Yang, Cohen, and Salakhutdinov 2016), and graph convo-
lutional networks (Kipf and Welling 2017).
The recently developed graph convolutional neural networks (GCNNs)
(Defferrard, Bresson, and Vandergheynst 2016) are a successful attempt
to generalize the powerful convolutional neural networks (CNNs), which
operate on Euclidean data, to modeling graph-structured data. In their
pilot work (Kipf and Welling 2017), Kipf and Welling pro-
posed a simplified type of GCNNs, called graph convolu-
tional networks (GCNs), and applied it to semi-supervised
classification. The GCN model naturally integrates the con-
nectivity patterns and feature attributes of graph-structured
data, and outperforms many state-of-the-art methods signif-
icantly on some benchmarks. Nevertheless, it suffers from problems
similar to those faced by other neural-network-based models. The working
mechanisms of the GCN model for semi-supervised learning are not clear,
and the training of GCNs still requires a considerable amount of labeled
data for parameter tuning and model selection, which defeats the purpose
of semi-supervised learning.
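For reference, the layer-wise propagation rule of the GCN model (Kipf and
Welling 2017), restated here for convenience, is

H^{(l+1)} = \sigma\big( \tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2} H^{(l)} W^{(l)} \big),

where \tilde{A} = A + I is the adjacency matrix with self-loops added,
\tilde{D} is its degree matrix, H^{(l)} is the matrix of vertex
representations at layer l (with H^{(0)} = X, the input feature matrix),
W^{(l)} is a trainable weight matrix, and \sigma is a nonlinearity such
as ReLU.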
In this paper, we demystify the GCN model for semi-
supervised learning. In particular, we show that the graph
convolution of the GCN model is simply a special form of
Laplacian smoothing, which mixes the features of a vertex
and its nearby neighbors. The smoothing operation makes
the features of vertices in the same cluster similar, thus
greatly easing the classification task, which is the key rea-
son why GCNs work so well. However, it also brings poten-
tial concerns of over-smoothing. If a GCN is deep with
many convolutional layers, the output features may be over-
smoothed and vertices from different clusters may become
indistinguishable. The mixing happens quickly on small
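The smoothing and over-smoothing behavior discussed above can be
illustrated with a minimal NumPy sketch (an illustration only, not code
from this paper; the toy graph and random features below are made up).
Repeatedly applying the operator S = \tilde{D}^{-1/2} \tilde{A}
\tilde{D}^{-1/2} mixes the features of each vertex with those of its
neighbors; after many applications the rows of the feature matrix
collapse onto a single direction (row i approaches \sqrt{d_i + 1} times a
common vector), so vertices from different clusters become
indistinguishable.

import numpy as np

def smooth(A, X, steps):
    # Apply `steps` rounds of the augmented, symmetrically normalized
    # adjacency operator S = D̃^{-1/2} Ã D̃^{-1/2} to the feature matrix X.
    A_tilde = A + np.eye(A.shape[0])          # add self-loops
    d = A_tilde.sum(axis=1)                   # augmented degrees
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    S = D_inv_sqrt @ A_tilde @ D_inv_sqrt     # Laplacian-smoothing operator
    for _ in range(steps):
        X = S @ X
    return X

# Toy graph: two triangles (clusters) joined by a single edge.
A = np.array([[0, 1, 1, 0, 0, 0],
              [1, 0, 1, 0, 0, 0],
              [1, 1, 0, 1, 0, 0],
              [0, 0, 1, 0, 1, 1],
              [0, 0, 0, 1, 0, 1],
              [0, 0, 0, 1, 1, 0]], dtype=float)
X = np.random.randn(6, 2)                     # random 2-d vertex features

print(np.round(smooth(A, X, 1), 2))   # one layer: neighbor features mixed
print(np.round(smooth(A, X, 50), 2))  # many layers: rows nearly parallel,
                                      # clusters no longer distinguishable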