Fig. 2. Different GNN models built with graph convolutional layers. The term Gconv denotes a graph convolutional layer. The term MLP denotes a multilayer perceptron. The term CNN denotes a standard convolutional layer. (a) ConvGNN with multiple graph convolutional layers. A graph convolutional layer encapsulates each node's hidden representation by aggregating feature information from its neighbors. After feature aggregation, a nonlinear transformation is applied to the resulting outputs. By stacking multiple layers, the final hidden representation of each node receives messages from a further neighborhood. (b) ConvGNN with pooling and readout layers for graph classification [21]. A graph convolutional layer is followed by a pooling layer to coarsen a graph into subgraphs so that node representations on coarsened graphs represent higher graph-level representations. A readout layer summarizes the final graph representation by taking the sum/mean of the hidden representations of the subgraphs. (c) GAE for network embedding [61]. The encoder uses graph convolutional layers to obtain a network embedding for each node. The decoder computes the pairwise distance given the network embeddings. After applying a nonlinear activation function, the decoder reconstructs the graph adjacency matrix. The network is trained by minimizing the discrepancy between the real adjacency matrix and the reconstructed adjacency matrix. (d) STGNN for spatial–temporal graph forecasting [74]. A graph convolutional layer is followed by a 1-D CNN layer. The graph convolutional layer operates on $A$ and $X^{(t)}$ to capture the spatial dependence, while the 1-D CNN layer slides over $X$ along the time axis to capture the temporal dependence. The output layer is a linear transformation, generating a prediction for each node, such as its future value at the next time step.
operation in each model. As the methods in [19] and [20] require eigenvalue decomposition, their time complexity is $O(n^3)$. The time complexity of [46] is also $O(n^3)$ due to the pairwise shortest-path computation between nodes. The other methods incur equivalent time complexity, which is $O(m)$ if the graph adjacency matrix is sparse and $O(n^2)$ otherwise. This is because, in these methods, the computation of each node $v_i$'s representation involves its $d_i$ neighbors, and the sum of $d_i$ over all nodes exactly equals the number of edges. The time complexity of several methods is missing from Table III. These methods either lack a time complexity analysis in their articles or report only the time complexity of their overall models or algorithms.
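As a rough illustration of the degree-sum argument above, the short sketch below (an illustrative example with assumed graph sizes, not code from any surveyed method) performs one round of sparse neighborhood aggregation: the product visits every stored edge exactly once, so the cost grows with $m$ rather than $n^2$.

```python
import numpy as np
import scipy.sparse as sp

# Toy sparse adjacency matrix: n nodes, a few thousand directed edges.
n = 1000
A = sp.random(n, n, density=0.005, format="csr", random_state=0)
A.data[:] = 1.0                       # make it a binary adjacency matrix
m = A.nnz                             # number of stored (directed) edges

X = np.random.default_rng(0).normal(size=(n, 16))   # node feature matrix

# One round of neighborhood aggregation, H[i] = sum of X[u] over i's neighbors.
# A sparse matrix product touches each stored edge once, so the cost is O(m);
# with a dense adjacency matrix the same product would cost O(n^2).
H = A @ X

d = np.asarray(A.sum(axis=1)).ravel()  # d_i = number of neighbors of node v_i
assert d.sum() == m                    # sum of d_i over all nodes equals m
```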
IV. RECURRENT GRAPH NEURAL NETWORKS
RecGNNs are mostly pioneering works of GNNs. They apply the same set of parameters recurrently over the nodes of a graph to extract high-level node representations. Constrained by computational power, earlier research mainly focused on directed acyclic graphs [13], [80].
GNN*² proposed by Scarselli et al. extends prior recurrent models to handle general types of graphs, e.g., acyclic, cyclic, directed, and undirected graphs [15]. Based on an information diffusion mechanism, GNN* updates nodes' states by exchanging neighborhood information recurrently until a stable equilibrium is reached. A node's hidden state is recurrently updated by
$$\mathbf{h}_v^{(t)} = \sum_{u \in N(v)} f\big(\mathbf{x}_v, \mathbf{x}^e_{(v,u)}, \mathbf{x}_u, \mathbf{h}_u^{(t-1)}\big) \tag{1}$$
where $f(\cdot)$ is a parametric function and $\mathbf{h}_v^{(0)}$ is initialized randomly. The sum operation enables GNN* to be applicable to all nodes, even if the number of neighbors differs and no neighborhood ordering is known. To ensure convergence, the recurrent function $f(\cdot)$ must be a contraction mapping, which shrinks the distance between two points after projecting them into a latent space. In the case of $f(\cdot)$ being a neural network, a penalty term has to be imposed on the Jacobian matrix of parameters. When a convergence criterion is satisfied, the last-step node hidden states are forwarded to a readout layer.
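The following minimal sketch iterates the update in (1) until the node states stop changing. It is illustrative only: the toy graph, the dimensions, and the simple affine-plus-tanh choice of $f$ are assumptions, not the authors' implementation, and the contraction requirement is enforced here by simply rescaling the recurrent weight matrix instead of the Jacobian penalty described above.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d_x, d_h = 6, 4, 8                        # nodes, feature size, state size
edges = [(0, 1), (1, 2), (2, 0), (2, 3), (3, 4), (4, 5), (5, 3)]  # toy directed graph
X = rng.normal(size=(n, d_x))                # node features x_v

# A simple transition f(x_v, x_u, h_u); edge features are omitted in this toy.
W_x = rng.normal(size=(2 * d_x, d_h)) * 0.1
W_h = rng.normal(size=(d_h, d_h))
W_h *= 0.25 / np.linalg.norm(W_h, 2)         # small spectral norm keeps the summed update contractive

def f(x_v, x_u, h_u):
    return np.tanh(np.concatenate([x_v, x_u]) @ W_x + h_u @ W_h)

# Recurrent state propagation of Eq. (1): h_v^(t) = sum over u in N(v) of f(...).
H = rng.normal(size=(n, d_h))                # h_v^(0), randomly initialized
for _ in range(100):
    H_new = np.zeros_like(H)
    for (u, v) in edges:                     # u is a neighbor sending a message to v
        H_new[v] += f(X[v], X[u], H[u])
    if np.max(np.abs(H_new - H)) < 1e-6:     # convergence criterion
        break
    H = H_new

print("converged node states:", H.shape)     # forwarded to a readout layer
```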
GNN* alternates the stage of node state propagation and the stage of parameter gradient computation to minimize a training objective. This strategy enables GNN* to handle cyclic graphs.

In follow-up work, the graph echo state network (GraphESN) [16] extends echo state networks to improve the training efficiency of GNN*. GraphESN consists of an encoder and an output layer. The encoder is randomly initialized and requires no training. It implements a contractive state transition function to recurrently update node states until the global graph state reaches convergence. Afterward, the output layer is trained by taking the fixed node states as inputs.
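A rough sketch of the GraphESN idea, under the same toy assumptions as above (hypothetical variable names and dimensions; a real reservoir would follow [16] more closely): the encoder's weights are drawn once, rescaled so the state transition is contractive, and left untrained, while only a linear output layer is fitted on the converged states.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d_x, d_h = 6, 4, 8
A = (rng.random((n, n)) < 0.3).astype(float)   # toy adjacency matrix
X = rng.normal(size=(n, d_x))
y = rng.normal(size=n)                         # toy node-level regression targets

# Fixed, untrained encoder (the reservoir): rescale the recurrent weights so
# that ||A|| * ||W_rec|| < 1, making the state transition contractive.
W_in = rng.normal(size=(d_x, d_h))
W_rec = rng.normal(size=(d_h, d_h))
W_rec *= 0.5 / (np.linalg.norm(W_rec, 2) * np.linalg.norm(A, 2))

H = np.zeros((n, d_h))
for _ in range(200):
    H_new = np.tanh(X @ W_in + A @ H @ W_rec)
    if np.max(np.abs(H_new - H)) < 1e-8:       # global graph state has converged
        break
    H = H_new

# Only the output layer is trained, here by ordinary least squares on the
# fixed node states.
W_out, *_ = np.linalg.lstsq(H, y, rcond=None)
print("readout weights:", W_out.shape)
```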
Gated GNN (GGNN) [17] employs a gated recurrent unit (GRU) [81] as the recurrent function, reducing the recurrence to a fixed number of steps. The advantage is that it no longer needs to constrain the parameters to ensure convergence. A node's hidden state is updated by its previous hidden state and its neighboring hidden states, defined as
$$\mathbf{h}_v^{(t)} = \mathrm{GRU}\Big(\mathbf{h}_v^{(t-1)}, \sum_{u \in N(v)} \mathbf{W}\mathbf{h}_u^{(t-1)}\Big) \tag{2}$$
where $\mathbf{h}_v^{(0)} = \mathbf{x}_v$.
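A minimal sketch of the propagation in (2), using PyTorch's GRUCell for the gating. This is illustrative only: the toy graph, the hidden size, the fixed number of steps, and the single shared matrix W are assumptions rather than the reference implementation.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
n, d = 6, 8                                  # number of nodes, hidden size
A = (torch.rand(n, n) < 0.3).float()         # toy adjacency matrix
X = torch.randn(n, d)                        # node inputs x_v (assumed padded to size d)

W = nn.Linear(d, d, bias=False)              # shared message transform W
gru = nn.GRUCell(input_size=d, hidden_size=d)

H = X.clone()                                # h_v^(0) = x_v
T = 5                                        # fixed number of recurrence steps
for _ in range(T):
    M = A @ W(H)                             # m_v = sum over u in N(v) of W h_u^(t-1)
    H = gru(M, H)                            # h_v^(t) = GRU(h_v^(t-1), m_v)

print(H.shape)                               # final node states, e.g., for a readout
```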
Different from GNN* and GraphESN, GGNN uses the backpropagation through time (BPTT) algorithm to learn the model parameters. This can be problematic
²As GNN is used to represent broad graph neural networks in this article, we name this particular method GNN* to avoid ambiguity.