loss.
• The weights W are updated according to the gradient computed in the last step.
Limitations Though experimental results showed that the GNN is a powerful architecture for modeling structural data, the original GNN still has several limitations. Firstly, it is inefficient to update the hidden states of nodes iteratively until they reach the fixed point. If we relax the fixed-point assumption, we can design a multi-layer GNN to obtain a stable representation of a node and its neighborhood. Secondly, the GNN uses the same parameters in every iteration, while most popular neural networks use different parameters in different layers, which serve as a hierarchical feature extraction method. Moreover, the update of node hidden states is a sequential process which can benefit from RNN kernels such as the GRU and LSTM. Thirdly, there are also informative features on the edges which cannot be effectively modeled in the original GNN. For example, the edges in a knowledge graph have relation types, and the message propagation through different edges should differ according to their types. Besides, how to learn the hidden states of edges is also an important problem. Lastly, it is unsuitable to use fixed points if we focus on the representation of nodes instead of graphs, because the distribution of representations at the fixed point becomes much smoother in value and thus less informative for distinguishing individual nodes.
2.2 Variants of Graph Neural Networks
In this subsection, we present several variants of graph neural networks. Sec 2.2.1 focuses on variants operating on different graph types; these variants extend the representation capability of the original model. Sec 2.2.2 lists several modifications (convolution, gate mechanism, attention mechanism and skip connection) on the propagation step; these models can learn representations of higher quality. Sec 2.2.3 describes variants that use advanced training methods, which improve the training efficiency. An overview of the different variants of graph neural networks can be found in Fig. 2.
2.2.1 Graph Types
In the original GNN [18], the input graph consists of nodes with label information and undirected edges, which is the simplest graph format. However, there are many variants of graphs in the real world. In this subsection, we introduce some methods designed to model different kinds of graphs.
Directed Graphs The first variant of graph is the directed graph. An undirected edge, which can be treated as two directed edges, only indicates that there is a relation between two nodes, whereas directed edges can carry more information. For example, in a knowledge graph where an edge starts from the head entity and ends at the tail entity, the head entity is the parent class of the tail entity, which suggests we should treat information propagation from parent classes and from child classes differently.
ADGPM [29] uses two kinds of weight matrices, $W_p$ and $W_c$, to incorporate more precise structural information. The propagation rule is as follows:
$$H^t = \sigma(D_p^{-1} A_p \, \sigma(D_c^{-1} A_c H^{t-1} W_c) W_p) \qquad (7)$$
where $D_p^{-1} A_p$ and $D_c^{-1} A_c$ are the normalized adjacency matrices for parents and children, respectively.
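To make the directed propagation concrete, the following is a minimal NumPy sketch of one step of Eq. (7); the function and variable names are illustrative, and ReLU stands in for the unspecified nonlinearity $\sigma$.

```python
import numpy as np

def adgpm_step(H_prev, A_p, A_c, W_p, W_c):
    """One propagation step in the spirit of Eq. (7).

    H_prev: (N, d) node states H^{t-1}
    A_p, A_c: (N, N) adjacency matrices restricted to parent / child edges
    W_c: (d, d_c) and W_p: (d_c, d_out) weight matrices (shapes illustrative)
    """
    sigma = lambda x: np.maximum(x, 0.0)  # assumed nonlinearity (ReLU)
    # Row-normalizations D_p^{-1} and D_c^{-1}; the epsilon guards isolated nodes.
    D_p_inv = np.diag(1.0 / np.maximum(A_p.sum(axis=1), 1e-12))
    D_c_inv = np.diag(1.0 / np.maximum(A_c.sum(axis=1), 1e-12))
    child_msg = sigma(D_c_inv @ A_c @ H_prev @ W_c)   # aggregate along child edges
    return sigma(D_p_inv @ A_p @ child_msg @ W_p)     # then along parent edges
```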
Heterogeneous Graphs The second variant of graph is the heterogeneous graph, in which there are several kinds of nodes. The simplest way to process a heterogeneous graph is to convert the type of each node into a one-hot feature vector that is concatenated with the original features. Furthermore, GraphInception [30] introduces the concept of metapath into the propagation on the heterogeneous graph. With metapaths, we can group the neighbors according to their node types and distances. GraphInception treats each neighbor group as a sub-graph of a homogeneous graph, performs the propagation on it, and concatenates the propagation results from the different homogeneous sub-graphs to obtain a collective node representation.
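As a minimal illustration of the simple strategy above, the sketch below (with hypothetical names) appends a one-hot encoding of each node's type to its original feature vector.

```python
import numpy as np

def add_type_features(X, node_types, num_types):
    """Concatenate a one-hot node-type indicator to the original features.

    X: (N, d) original node features; node_types: (N,) integer type ids.
    Returns an (N, d + num_types) feature matrix.
    """
    one_hot = np.eye(num_types)[node_types]     # (N, num_types) type indicators
    return np.concatenate([X, one_hot], axis=1)
```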
Graphs with Edge Information In the final variant of graph, each edge also carries information such as its weight or type. There are two ways to handle this kind of graph. Firstly, we can convert the graph into a bipartite graph in which the original edges also become nodes; each original edge is split into two new edges, so that there are two new edges between the edge node and its begin/end nodes. The encoder of G2S [31] uses the following aggregation function for neighbors:
$$h_v^t = \rho\Big(\frac{1}{|\mathcal{N}_v|}\sum_{u \in \mathcal{N}_v} W_r (r_v^t h_u^{t-1}) + b_r\Big) \qquad (8)$$
where $W_r$ and $b_r$ are the propagation parameters for different types of edges (relations).
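A rough sketch of this relation-aware neighbor aggregation is given below; it assumes a mean over neighbor messages, treats $r_v^t$ as an element-wise gate, and folds the per-relation bias into each message, all of which are one plausible reading of Eq. (8) rather than the exact G2S implementation.

```python
import numpy as np

def aggregate_with_relations(h_prev, neighbors, rel_ids, W, b, r_v, rho=np.tanh):
    """Average relation-specific messages from the neighbors of node v.

    h_prev: dict node -> (d,) hidden state at step t-1
    neighbors: neighbor ids of v; rel_ids: relation id of each incident edge
    W: dict relation -> (d_out, d) weight; b: dict relation -> (d_out,) bias
    r_v: (d,) gate vector for node v at step t (interpreted element-wise here)
    """
    msgs = [W[r] @ (r_v * h_prev[u]) + b[r] for u, r in zip(neighbors, rel_ids)]
    return rho(np.mean(msgs, axis=0))
```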
Secondly, we can adopt different weight matrices for the propagation on different kinds of edges. When the number of relations is very large, r-GCN [32] introduces two kinds of regularization to reduce the number of parameters needed to model the many relations: basis- and block-diagonal-decomposition. With the basis decomposition, each $W_r$ is defined as follows:
$$W_r = \sum_{b=1}^{B} a_{rb} V_b \qquad (9)$$
i.e. as a linear combination of basis transformations $V_b \in \mathbb{R}^{d_{in} \times d_{out}}$ with coefficients $a_{rb}$ such that only the coefficients depend on $r$. In the block-diagonal decomposition, r-GCN defines each $W_r$ through the direct sum over a set of low-dimensional matrices, which needs more parameters than the first one.
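The basis decomposition of Eq. (9) can be written in a few lines; the sketch below uses illustrative names and constructs one weight matrix per relation from the shared bases.

```python
import numpy as np

def basis_decomposition(V, a):
    """Build per-relation weights W_r = sum_b a[r, b] * V[b], as in Eq. (9).

    V: (B, d_in, d_out) shared basis transformations V_b.
    a: (R, B) coefficients a_rb; only these depend on the relation r.
    Returns W: (R, d_in, d_out), one weight matrix per relation.
    """
    return np.einsum('rb,bio->rio', a, V)
```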
2.2.2 Propagation Types
The propagation step and the output step are of vital importance in the model for obtaining the hidden states of nodes (or edges). As we list below, there are several major modifications of the propagation step compared with the original graph neural network model, while researchers usually follow a simple feed-forward neural network setting in the output step. A comparison of different variants of GNN can be found in Table 2. The variants utilize different aggregators to