

# A Comprehensive Survey on Graph Neural Networks

Zonghan Wu, Shirui Pan, *Member, IEEE*, Fengwen Chen, Guodong Long, Chengqi Zhang, *Senior Member, IEEE*, Philip S. Yu, *Fellow, IEEE*

arXiv:1901.00596v2 [cs.LG] 10 Mar 2019

**Abstract**—Deep learning has revolutionized many machine learning tasks in recent years, ranging from image classification and video processing to speech recognition and natural language understanding. The data in these tasks are typically represented in the Euclidean space. However, there is an increasing number of applications where data are generated from non-Euclidean domains and are represented as graphs with complex relationships and interdependency between objects. The complexity of graph data has imposed significant challenges on existing machine learning algorithms. Recently, many studies on extending deep learning approaches for graph data have emerged. In this survey, we provide a comprehensive overview of graph neural networks (GNNs) in data mining and machine learning fields. We propose a new taxonomy to divide the state-of-the-art graph neural networks into different categories. With a focus on graph convolutional networks, we review alternative architectures that have recently been developed; these learning paradigms include graph attention networks, graph autoencoders, graph generative networks, and graph spatial-temporal networks. We further discuss the applications of graph neural networks across various domains and summarize the open source codes and benchmarks of the existing algorithms on different learning tasks. Finally, we propose potential research directions in this fast-growing field.

**Index Terms**—Deep learning, graph neural networks, graph convolutional networks, graph representation learning, graph autoencoder, network embedding


## 1 Introduction

The recent success of neural networks has boosted research on pattern recognition and data mining. Many machine learning tasks such as object detection [1], [2], machine translation [3], [4], and speech recognition [5], which once heavily relied on handcrafted feature engineering to extract informative feature sets, have recently been revolutionized by various end-to-end deep learning paradigms, e.g., convolutional neural networks (CNNs) [6], long short-term memory (LSTM) networks [7], and autoencoders. The success of deep learning in many domains is partially attributed to rapidly developing computational resources (e.g., GPUs) and the availability of large training data, and is partially due to the effectiveness of deep learning in extracting latent representations from Euclidean data (e.g., images, text, and video). Taking image analysis as an example, an image can be represented as a regular grid in the Euclidean space. A convolutional neural network (CNN) is able to exploit the shift-invariance, local connectivity, and compositionality of image data [8]; as a result, CNNs can extract local meaningful features that are shared across the entire dataset for various image analysis tasks.

- Z. Wu, F. Chen, G. Long, and C. Zhang are with the Centre for Artificial Intelligence, FEIT, University of Technology Sydney, NSW 2007, Australia (E-mail: zonghan.wu-3@student.uts.edu.au; fengwen.chen@student.uts.edu.au; guodong.long@uts.edu.au; chengqi.zhang@uts.edu.au).
- S. Pan is with the Faculty of Information Technology, Monash University, Clayton, VIC 3800, Australia (E-mail: shirui.pan@monash.edu).
- P. S. Yu is with the Department of Computer Science, University of Illinois at Chicago, Chicago, IL 60607-7053, USA (E-mail: psyu@uic.edu).
- Corresponding author: Shirui Pan.

Manuscript received Dec xx, 2018; revised Dec xx, 201x.

While deep learning has achieved great success on Euclidean data, there is an increasing number of applications where data are generated from non-Euclidean domains and need to be effectively analyzed. For instance, in e-commerce, a graph-based learning system is able to exploit the interactions between users and products [9], [10], [11] to make highly accurate recommendations. In chemistry, molecules are modeled as graphs, and their bioactivity needs to be identified for drug discovery [12], [13]. In a citation network, papers are linked to each other via citations, and they need to be categorized into different groups [14], [15]. The complexity of graph data has imposed significant challenges on existing machine learning algorithms, because graph data are irregular: each graph has a variable number of unordered nodes, and each node has a different number of neighbors, so important operations (e.g., convolutions) that are easy to compute in the image domain are not directly applicable to the graph domain. Furthermore, a core assumption of existing machine learning algorithms is that instances are independent of each other. However, this is not the case for graph data, where each instance (node) is related to others (its neighbors) via complex linkage information that captures the interdependence among data, such as citations, friendships, and interactions.

Recently, there is increasing interest in extending deep learning approaches for graph data. Driven by the success of deep learning, researchers have borrowed ideas from convolutional networks, recurrent networks, and deep autoencoders to design the architecture of graph neural networks. To handle the complexity of graph data, new generalizations and definitions of important operations have been rapidly developed over the past few years. For instance, Figure 1 illustrates how a kind of graph convolution is inspired by a standard 2D convolution. This survey aims to provide a comprehensive overview of these methods, both for interested researchers who want to enter this rapidly developing field and for experts who would like to compare graph neural network algorithms.

**A Brief History of Graph Neural Networks** The notion of graph neural networks was first outlined in Gori et al. (2005) [16] and further elaborated in Micheli (2009) [17] and Scarselli et al. (2009) [18]. These early studies learn a target node's representation by propagating neighbor information via recurrent neural architectures in an iterative manner until a stable fixed point is reached. This process is computationally expensive, and recently there have been increasing efforts to overcome these challenges [19], [20]. In our survey, we generalize the term *graph neural networks* to represent all deep learning approaches for graph data.

Inspired by the huge success of convolutional networks in the computer vision domain, a large number of methods that redefine the notion of convolution for graph data have emerged recently. These approaches fall under the umbrella of graph convolutional networks (GCNs). The first prominent research on GCNs is presented in Bruna et al. (2013), which develops a variant of graph convolution based on spectral graph theory [21]. Since that time, there have been increasing improvements, extensions, and approximations of spectral-based graph convolutional networks [12], [14], [22], [23], [24]. As spectral methods usually handle the whole graph simultaneously and are difficult to parallelize or scale to large graphs, spatial-based graph convolutional networks have developed rapidly [25], [26], [27], [28]. These methods perform the convolution directly in the graph domain by aggregating the neighbor nodes' information. Together with sampling strategies, the computation can be performed on a batch of nodes instead of the whole graph [25], [28], which has the potential to improve efficiency.

In addition to graph convolutional networks, many alternative graph neural networks have been developed in the past few years. These approaches include graph attention networks, graph autoencoders, graph generative networks, and graph spatial-temporal networks. Details on the categorization of these methods are given in Section 3.

Fig. 1: 2D Convolution vs. Graph Convolution. (a) *2D Convolution.* Analogous to a graph, each pixel in an image is taken as a node whose neighbors are determined by the filter size. The 2D convolution takes a weighted average of the pixel values of the red node along with its neighbors. The neighbors of a node are ordered and have a fixed size. (b) *Graph Convolution.* To get a hidden representation of the red node, one simple solution of the graph convolution operation takes the average value of the node features of the red node along with its neighbors. Different from image data, the neighbors of a node are unordered and variable in size.

**Related surveys on graph neural networks** There are a limited number of existing reviews on the topic of graph neural networks. Using the notion of *geometric deep learning*, Bronstein et al. [8] give an overview of deep learning methods in the non-Euclidean domain, including graphs and manifolds. While being the first review on graph convolution networks, this survey misses several important spatial-based approaches, including [15], [20], [25], [27], [28], [29], which have updated state-of-the-art benchmarks. Furthermore, it does not cover many newly developed architectures that are equally important to graph convolutional networks. These learning paradigms, including graph attention networks, graph autoencoders, graph generative networks, and graph spatial-temporal networks, are comprehensively reviewed in this article. Battaglia et al. [30] position graph networks as the building blocks for learning from relational data, reviewing part of graph neural networks under a unified framework. However, their generalized framework is highly abstract, losing the insights on each method from its original paper. Lee et al. [31] conduct a partial survey on the graph attention model, which is one type of graph neural network. Most recently, Zhang et al. [32] present the most up-to-date survey on deep learning for graphs, but it misses studies on graph generative and spatial-temporal networks. In summary, none of the existing surveys provides a comprehensive overview of graph neural networks; they only cover some of the graph convolution networks and examine a limited number of works, thereby missing the most recent developments in alternative graph neural networks, such as graph generative networks and graph spatial-temporal networks.

**Graph neural networks vs. network embedding** The research on graph neural networks is closely related to graph embedding or network embedding, another topic which attracts increasing attention from both the data mining and machine learning communities [33], [34], [35], [36], [37], [38]. Network embedding aims to represent network vertices in a low-dimensional vector space while preserving both the network topology and the node content information, so that subsequent graph analytics tasks such as classification, clustering, and recommendation can be easily performed using simple off-the-shelf machine learning algorithms (e.g., support vector machines for classification). Many network embedding algorithms are typically unsupervised and can be broadly classified into three groups [33]: matrix factorization [39], [40], random walks [41], and deep learning approaches. The deep learning approaches for network embedding also belong to graph neural networks; they include graph autoencoder-based algorithms (e.g., DNGR [42] and SDNE [43]) and graph convolution neural networks with unsupervised training (e.g., GraphSage [25]). Figure 2 describes the differences between network embedding and graph neural networks in this paper.

Fig. 2: Network Embedding vs. Graph Neural Networks.

**Our Contributions** Our paper makes notable contributions summarized as follows:

- **New taxonomy** In light of the increasing number of studies on deep learning for graph data, we propose a new taxonomy of graph neural networks (GNNs). In this taxonomy, GNNs are categorized into five groups: graph convolution networks, graph attention networks, graph auto-encoders, graph generative networks, and graph spatial-temporal networks. We pinpoint the differences between graph neural networks and network embedding and draw connections between different graph neural network architectures.
- **Comprehensive review** This survey provides the most comprehensive overview of modern deep learning techniques for graph data. For each type of graph neural network, we provide detailed descriptions of representative algorithms and make necessary comparisons and summaries of the corresponding algorithms.
- **Abundant resources** This survey provides abundant resources on graph neural networks, including state-of-the-art algorithms, benchmark datasets, open-source codes, and practical applications. It can be used as a hands-on guide for understanding, using, and developing different deep learning approaches for various real-life applications.
- **Future directions** This survey also highlights the current limitations of the existing algorithms and points out possible directions in this rapidly developing field.

**Organization of Our Survey** The rest of this survey is organized as follows. Section 2 defines a list of graph-related concepts. Section 3 clarifies the categorization of graph neural networks. Section 4 and Section 5 provide an overview of graph neural network models. Section 6 presents a gallery of applications across various domains. Section 7 discusses the current challenges and suggests future directions. Section 8 summarizes the paper.

TABLE 1: Commonly used notations.

| Notation | Description |
| --- | --- |
| $\lvert \cdot \rvert$ | The length of a set. |
| $\odot$ | Element-wise product. |
| $A^T$ | Transpose of vector/matrix $A$. |
| $[A, B]$ | Concatenation of $A$ and $B$. |
| $G$ | A graph. |
| $V$ | The set of nodes in a graph. |
| $v_i$ | A node $v_i \in V$. |
| $N(v)$ | The neighbors of node $v$. |
| $E$ | The set of edges in a graph. |
| $e_{ij}$ | An edge $e_{ij} \in E$. |
| $X \in \mathbb{R}^{N \times D}$ | The feature matrix of a graph. |
| $x \in \mathbb{R}^N$ | The feature vector of a graph in the case of $D = 1$. |
| $X_i \in \mathbb{R}^D$ | The feature vector of node $v_i$. |
| $N$ | The number of nodes, $N = \lvert V \rvert$. |
| $M$ | The number of edges, $M = \lvert E \rvert$. |
| $D$ | The dimension of a node vector. |
| $T$ | The total number of time steps in a time series. |

Fig. 3: Categorization of Graph Neural Networks.

## 2 Definition

In this section, we provide definitions of basic graph concepts. For easy retrieval, we summarize the commonly used notations in Table 1.

**Definition 1 (Graph).** A graph is $G = (V, E, A)$, where $V$ is the set of nodes, $E$ is the set of edges, and $A$ is the adjacency matrix. In a graph, let $v_i \in V$ denote a node and $e_{ij} = (v_i, v_j) \in E$ denote an edge. The adjacency matrix $A$ is an $N \times N$ matrix with $A_{ij} = w_{ij} > 0$ if $e_{ij} \in E$ and $A_{ij} = 0$ if $e_{ij} \notin E$. The degree of a node is the number of edges connected to it.

A graph can be associated with node attributes $X$¹, where $X \in \mathbb{R}^{N \times D}$ is a feature matrix with $X_i \in \mathbb{R}^D$ representing the feature vector of node $v_i$. In the case of $D = 1$, we replace $x \in \mathbb{R}^N$ with $X$ to denote the feature vector of the graph.
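As a concrete instance of Definition 1, the snippet below builds the adjacency matrix and a feature matrix for a small undirected attributed graph; the edge set, unit weights, and 2-dimensional features are arbitrary illustrative values:

```python
import numpy as np

# Undirected graph with V = {v0, v1, v2, v3} and
# E = {(v0, v1), (v1, v2), (v1, v3)}; all weights w_ij = 1.
N = 4
A = np.zeros((N, N))
for i, j in [(0, 1), (1, 2), (1, 3)]:
    A[i, j] = A[j, i] = 1.0    # undirected, so A is symmetric (A_ij = A_ji)

X = np.array([[0.5, 1.0],      # X in R^{N x D} with D = 2;
              [2.0, 0.0],      # row i is the feature vector X_i of node v_i
              [1.5, 3.0],
              [0.0, 1.0]])

degrees = A.sum(axis=1)        # degree = number of edges at each node
print(degrees)                 # [1. 3. 1. 1.]
```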

**Definition 2 (Directed Graph).** A directed graph is a graph with all edges pointing from one node to another. For a directed graph, $A_{ij} \neq A_{ji}$. An undirected graph is a graph with all edges undirected. For an undirected graph, $A_{ij} = A_{ji}$.

**Definition 3 (Spatial-Temporal Graph).** A spatial-temporal graph is an attributed graph where the feature matrix $X$ evolves over time. It is defined as $G = (V, E, A, X)$ with $X \in \mathbb{R}^{T \times N \times D}$, where $T$ is the length of the time steps.

¹ Such a graph is referred to as an attributed graph in the literature.

Fig. 4: A Variant of Graph Convolution Networks with Multiple GCN Layers [14]. A GCN layer encapsulates each node's hidden representation by aggregating feature information from its neighbors. After feature aggregation, a non-linear transformation is applied to the resulting outputs. By stacking multiple layers, the final hidden representation of each node receives messages from a further neighborhood.

## 3 Categorization and Frameworks

In this section, we present our taxonomy of graph neural networks. We consider any differentiable graph model which incorporates neural architectures as a graph neural network. We categorize graph neural networks into graph convolution networks, graph attention networks, graph auto-encoders, graph generative networks, and graph spatial-temporal networks. Of these, graph convolution networks play a central role in capturing structural dependencies. As illustrated in Figure 3, methods under the other categories partially utilize graph convolution networks as building blocks. We summarize the representative methods in each category in Table 2, and we give a brief introduction of each category in the following.

### 3.1 Taxonomy of GNNs

**Graph Convolution Networks (GCNs)** generalize the operation of convolution from traditional data (images or grids) to graph data. The key is to learn a function $f$ that generates a node $v_i$'s representation by aggregating its own features $X_i$ and its neighbors' features $X_j$, where $j \in N(v_i)$. Figure 4 shows the process of GCNs for node representation learning. Graph convolutional networks play a central role in building up many other complex graph neural network models, including auto-encoder-based models, generative models, and spatial-temporal networks. Figure 5 illustrates several graph neural network models built on GCNs.
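To make the aggregation idea concrete, here is a minimal sketch of one spatial graph convolution layer in plain NumPy. The mean-of-neighbors aggregator, the weight matrix `W`, and the ReLU non-linearity are illustrative choices, not the formulation of any specific paper:

```python
import numpy as np

def gcn_layer(A, X, W):
    """One simplified spatial graph convolution layer.

    A: (N, N) adjacency matrix, X: (N, D) node features,
    W: (D, F) trainable weights.
    Each node averages its own features with its neighbors',
    then applies a linear map and a ReLU non-linearity.
    """
    A_hat = A + np.eye(A.shape[0])          # add self-loops so v_i keeps X_i
    deg = A_hat.sum(axis=1, keepdims=True)  # number of aggregated nodes per row
    H = (A_hat @ X) / deg                   # mean over {v_i} union N(v_i)
    return np.maximum(H @ W, 0.0)           # linear transform + ReLU
```

Stacking this layer $k$ times lets each node's representation absorb information from its $k$-hop neighborhood, which is the intuition behind Figure 4.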

Fig. 5: Different Graph Neural Network Models Built with GCNs. (a) *Graph Convolution Networks with Pooling Modules for Graph Classification* [12]: a GCN layer [14] is followed by a pooling layer to coarsen a graph into sub-graphs, so that node representations on coarsened graphs represent higher graph-level representations. To calculate the probability for each graph label, the output layer is a linear layer with the SoftMax function. (b) *Graph Auto-encoder with GCN* [62]: the encoder uses GCN layers to get latent representations for each node. The decoder computes the pair-wise distance between the node latent representations produced by the encoder. After applying a non-linear activation function, the decoder reconstructs the graph adjacency matrix. (c) *Graph Spatial-Temporal Networks with GCN* [74]: a GCN layer is followed by a 1D-CNN layer. The GCN layer operates on $A_t$ and $X_t$ to capture the spatial dependency, while the 1D-CNN layer slides over $X$ along the time axis to capture the temporal dependency. The output layer is a linear transformation, generating a prediction for each node.

**Graph Attention Networks** are similar to GCNs and seek an aggregation function to fuse the neighboring nodes, random walks, and candidate models in graphs to learn a new representation. The key difference is that graph attention networks employ attention mechanisms which assign larger weights to the more important nodes, walks, or models. The attention weights are learned together with the neural network parameters within an end-to-end framework. Figure 6 illustrates the difference between graph convolutional networks and graph attention networks in aggregating neighbor node information.
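As a sketch of how such attention weights can be computed, the snippet below scores each neighbor with a small learnable function and normalizes the scores with a softmax. The single-vector scoring parameter `a` and the LeakyReLU follow the general spirit of attention-based aggregation; the exact parameterization varies by method and is an assumption here:

```python
import numpy as np

def attention_aggregate(A, H, W, a):
    """Attention-weighted neighbor aggregation (illustrative sketch).

    A: (N, N) adjacency, H: (N, D) node features,
    W: (D, F) feature transform, a: (2*F,) attention parameters.
    """
    Z = H @ W                                    # transformed features, (N, F)
    out = np.zeros_like(Z)
    for i in range(A.shape[0]):
        nbrs = np.flatnonzero(A[i]).tolist() + [i]   # neighbors plus self
        # score each neighbor j from the concatenated pair (z_i, z_j)
        scores = np.array([np.concatenate([Z[i], Z[j]]) @ a for j in nbrs])
        scores = np.where(scores > 0, scores, 0.2 * scores)  # LeakyReLU
        alpha = np.exp(scores - scores.max())
        alpha = alpha / alpha.sum()              # softmax over the neighborhood
        out[i] = alpha @ Z[nbrs]                 # weighted sum of neighbors
    return out
```

In contrast to the fixed weights of Figure 6(a), the coefficients `alpha` here depend on the node features, so more informative neighbors can receive larger weights.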

**Graph Auto-encoders** are unsupervised learning frameworks which aim to learn low-dimensional node vectors via an encoder and then reconstruct the graph data via a decoder. Graph autoencoders are a popular approach to learning graph embeddings, both for plain graphs without attribute information [42], [43] and for attributed graphs [64], [65]. For plain graphs, many algorithms directly preprocess the adjacency matrix, either by constructing a new matrix with rich information (i.e., a pointwise mutual information matrix) [42] or by feeding the adjacency matrix to an autoencoder model that captures both first-order and second-order information [43]. For attributed graphs, graph autoencoder models tend to employ GCN [14] as a building block for the encoder and reconstruct the structure information via a link prediction decoder [62], [64].
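A minimal sketch of this encode/decode pattern is shown below, using a mean-aggregation GCN layer as the encoder and an inner-product link prediction decoder; the two-layer depth and sigmoid decoder are common but assumed choices:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def encode(A, X, W):
    """One mean-aggregation GCN layer (same form as the earlier sketch)."""
    A_hat = A + np.eye(A.shape[0])
    H = (A_hat @ X) / A_hat.sum(axis=1, keepdims=True)
    return np.maximum(H @ W, 0.0)

def graph_autoencoder(A, X, W1, W2):
    """Encoder: two stacked GCN layers -> latent node matrix Z.
    Decoder: inner product + sigmoid -> reconstructed adjacency."""
    Z = encode(A, encode(A, X, W1), W2)
    A_rec = sigmoid(Z @ Z.T)   # A_rec[i, j] ~ probability of edge (v_i, v_j)
    return Z, A_rec
```

Training minimizes a reconstruction loss between `A_rec` and `A`, so the latent vectors `Z` preserve the link structure, which is exactly the role of the link prediction decoder described above.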


TABLE 2: Representative Publications of Graph Neural Networks

| Category | Publications |
| --- | --- |
| Graph Convolution Networks: Spectral-based | [12], [14], [21], [22], [23], [24], [44], [45], [46] |
| Graph Convolution Networks: Spatial-based | [13], [16], [17], [18], [19], [20], [25], [26], [27], [28], [47], [48], [49], [50], [51], [52], [53], [54], [55], [56], [57] |
| Graph Convolution Networks: Pooling modules | [12], [22], [58], [59] |
| Graph Attention Networks | [15], [29], [60], [61] |
| Graph Auto-encoder | [42], [43], [62], [63], [64], [65], [66] |
| Graph Generative Networks | [67], [68], [69], [70], [71] |
| Graph Spatial-Temporal Networks | [72], [73], [74], [75], [76] |

Fig. 6: Differences between graph convolutional networks and graph attention networks. (a) *Graph Convolution Networks* [14] explicitly assign a non-parametric weight $a_{ij} = \frac{1}{\sqrt{\deg(v_i)\deg(v_j)}}$ to the neighbor $v_j$ of $v_i$ during the aggregation process. (b) *Graph Attention Networks* [15] implicitly capture the weight $a_{ij}$ via an end-to-end neural network architecture, so that more important nodes receive larger weights.
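The fixed GCN weight in Figure 6(a) corresponds to symmetric degree normalization of the adjacency matrix, which can be computed directly, as in this small NumPy sketch:

```python
import numpy as np

def symmetric_normalize(A):
    """Edge weights a_ij = 1 / sqrt(deg(v_i) * deg(v_j)).

    Equivalent to D^{-1/2} A D^{-1/2}, where D is the diagonal degree matrix.
    """
    deg = A.sum(axis=1)
    d_inv_sqrt = np.zeros_like(deg)
    nz = deg > 0                        # guard against isolated nodes
    d_inv_sqrt[nz] = deg[nz] ** -0.5
    return A * np.outer(d_inv_sqrt, d_inv_sqrt)

# Example: a 3-node path graph v0 - v1 - v2
A = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])
print(symmetric_normalize(A))           # edge (v0, v1) weight = 1/sqrt(1*2) ~ 0.707
```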

**Graph Generative Networks** aim to generate plausible structures from data. Generating graphs given a graph empirical distribution is fundamentally challenging, mainly because graphs are complex data structures. To address this problem, researchers have explored factoring the generation process into forming nodes and edges alternately [67], [68], or employing generative adversarial training [69], [70]. One promising application domain of graph generative networks is chemical compound synthesis. In a chemical graph, atoms are treated as nodes and chemical bonds are treated as edges. The task is to discover new synthesizable molecules which possess certain chemical and physical properties.
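To illustrate the "nodes and edges alternately" factorization in the simplest possible terms, the toy sketch below grows a graph one node at a time and then samples edges to previously added nodes. The fixed Bernoulli edge probability `p_edge` is a stand-in for the learned, state-dependent distributions used in actual generative models:

```python
import numpy as np

def generate_graph(num_nodes, p_edge, rng=None):
    """Toy sequential generation: add one node, then sample its edges.

    In a learned generative model, the edge probability would come from
    a neural network conditioned on the partial graph; here it is the
    fixed constant p_edge.
    """
    rng = rng or np.random.default_rng(0)
    A = np.zeros((num_nodes, num_nodes))
    for i in range(1, num_nodes):        # step 1: add node v_i
        for j in range(i):               # step 2: decide edges to earlier nodes
            if rng.random() < p_edge:
                A[i, j] = A[j, i] = 1.0  # undirected edge (v_i, v_j)
    return A

print(generate_graph(5, p_edge=0.4))
```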

**Graph Spatial-Temporal Networks** aim to learn unseen patterns from spatial-temporal graphs, which are increasingly important in many applications such as traffic forecasting and human activity prediction. For instance, the underlying road traffic network is a natural graph where each key location is a node whose traffic data is continuously monitored. By developing effective graph spatial-temporal network models, we can accurately predict the traffic status over the whole traffic system [73], [74]. The key idea of graph spatial-temporal networks is to consider spatial dependency and temporal dependency at the same time. Many current approaches apply GCNs to capture the spatial dependency, together with some RNN [73] or CNN [74] to model the temporal dependency.
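The sketch below mirrors the GCN-plus-1D-CNN pattern of Figure 5(c): a graph convolution is applied to the node features at each time step to capture spatial dependency, and a 1D convolution then slides along the time axis of each node's sequence to capture temporal dependency. The mean aggregator and the shared temporal filter are illustrative assumptions:

```python
import numpy as np

def spatial_temporal_block(A, X, W, kernel):
    """X: (T, N, D) feature tensor, A: (N, N) adjacency,
    W: (D, F) spatial weights, kernel: (K,) temporal filter.

    Returns a (T - K + 1, N, F) tensor of spatial-temporal features.
    """
    A_hat = A + np.eye(A.shape[0])
    A_hat = A_hat / A_hat.sum(axis=1, keepdims=True)   # row-normalized aggregation
    # Spatial step: one GCN layer per time step t, operating on A and X_t.
    H = np.stack([np.maximum(A_hat @ X[t] @ W, 0.0) for t in range(X.shape[0])])
    # Temporal step: 1D convolution over the time axis, per node and channel.
    T, N, F = H.shape
    K = len(kernel)
    out = np.zeros((T - K + 1, N, F))
    for t in range(T - K + 1):
        out[t] = np.tensordot(kernel, H[t:t + K], axes=(0, 0))
    return out
```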

### 3.2 Frameworks

Graph neural networks, graph convolution networks (GCNs) in particular, try to replicate the success of CNNs on graph data by defining graph convolutions via graph spectral theory or spatial locality. With the graph structure and node content information as inputs, the outputs of a GCN can focus on different graph analytics tasks with one of the following mechanisms (a combined sketch of the three output heads follows this list):

- **Node-level** outputs relate to node regression and classification tasks. As a graph convolution module directly gives nodes' latent representations, a multi-perceptron layer or a softmax layer is used as the final layer of a GCN. We review graph convolution modules in Section 4.1 and Section 4.2.
- **Edge-level** outputs relate to edge classification and link prediction tasks. To predict the label or connection strength of an edge, an additional function takes two nodes' latent representations from the graph convolution module as inputs.
- **Graph-level** outputs relate to the graph classification task. To obtain a compact representation at the graph level, a pooling module is used to coarsen a graph into sub-graphs or to sum/average over the node representations. We review the graph pooling module in Section 4.3.
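Here is a compact sketch of the three output mechanisms, assuming `Z` is the $(N, F)$ latent node matrix produced by a stack of graph convolution layers; the specific heads (softmax classifier, dot-product link score, mean pooling) are common but assumed choices:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def node_level_output(Z, W_cls):
    """Per-node class probabilities: (N, F) -> (N, num_classes)."""
    return softmax(Z @ W_cls)

def edge_level_output(Z, i, j):
    """Connection strength for edge (v_i, v_j) from two node embeddings."""
    return 1.0 / (1.0 + np.exp(-(Z[i] @ Z[j])))

def graph_level_output(Z, W_cls):
    """Whole-graph label probabilities via mean pooling over nodes."""
    g = Z.mean(axis=0)          # (F,) graph-level representation
    return softmax(g @ W_cls)
```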

In Table 3, we list the details of the inputs and outputs of the main GCN methods. In particular, we summarize the output mechanisms between GCN layers and in the final layer of each method. The output mechanisms may involve several pooling operations, which are discussed in Section 4.3.

**End-to-end Training Frameworks** Graph convolutional networks can be trained in a (semi-)supervised or purely unsupervised way within an end-to-end learning framework, depending on the learning task and the label information available at hand.

- **Semi-supervised learning for node-level classification.** Given a single network with some nodes labeled and the others unlabeled, graph convolutional networks can learn a robust model that effectively identifies the class labels for the unlabeled nodes [14]. To this end, an end-to-end framework can be built by stacking a couple of graph convolutional layers followed by a softmax layer for multi-class classification (see the sketch after this list).
- **Supervised learning for graph-level classification.** Given a graph dataset, graph-level classification aims to predict the class label(s) for an entire graph [58],
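As an illustration of the semi-supervised node-level setup in the first bullet above, the sketch below stacks two mean-aggregation GCN layers, applies a softmax output, and computes cross-entropy only on the labeled nodes; the layer sizes and the gradient-free forward pass are simplifying assumptions:

```python
import numpy as np

def forward(A, X, W1, W2):
    """Two stacked mean-aggregation GCN layers + softmax output."""
    A_hat = A + np.eye(A.shape[0])
    A_hat = A_hat / A_hat.sum(axis=1, keepdims=True)
    H = np.maximum(A_hat @ X @ W1, 0.0)       # layer 1: aggregate + ReLU
    logits = A_hat @ H @ W2                   # layer 2: aggregate + linear
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)   # per-node class probabilities

def semi_supervised_loss(probs, labels, labeled_mask):
    """Cross-entropy on the labeled nodes only; unlabeled nodes still
    influence the model through the aggregation in forward()."""
    idx = np.flatnonzero(labeled_mask)
    return -np.mean(np.log(probs[idx, labels[idx]] + 1e-12))
```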
