represents the weight of the directed edge from node $u$ to node $v$ (i.e., $(v, u) \in E$), and $A_{v,u} = 0$ indicates that there is no such edge (i.e., $(v, u) \notin E$). $X \in \mathbb{R}^{|V| \times f_n}$ is a matrix consisting of all nodes' $f_n$-dimensional feature vectors, and $E \in \mathbb{R}^{|V| \times |V| \times f_e}$ is a sparse tensor consisting of all edges' $f_e$-dimensional feature vectors. Specifically, $x_v$ denotes the feature vector of node $v$, and $e_{v,u}$ denotes the feature vector of edge $(v, u)$ if $(v, u) \in E$; otherwise $e_{v,u} = 0$. In our setting, an undirected graph is treated as a special directed graph, in which each undirected edge $(v, u)$ is decomposed into two directed edges $(v, u)$ and $(u, v)$ with the same edge feature. Moreover, we use $N^+_v$ to denote the set of nodes directly pointing at $v$, i.e., $N^+_v = \{u : A_{v,u} > 0\}$, $N^-_v$ to denote the set of nodes directly pointed to by $v$, i.e., $N^-_v = \{u : A_{u,v} > 0\}$, and $N_v = N^+_v \cup N^-_v$. In other words, $N^+_v$ denotes the set of in-edge neighbors of $v$, while $N^-_v$ denotes the set of out-edge neighbors of $v$. We call the edges pointing at a node its in-edges, and the edges pointed from that node its out-edges.
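To make the notation concrete, the sketch below stores a small attributed graph in exactly this form. The class name, the dictionary-based storage, and the dense feature matrix are illustrative assumptions rather than AGL's actual storage format.

```python
import numpy as np

class AttributedGraph:
    """Toy container matching the notation above (illustrative, not AGL's storage)."""

    def __init__(self, num_nodes, f_n, f_e):
        self.A = {}                          # (v, u) -> weight of the directed edge u -> v
        self.X = np.zeros((num_nodes, f_n))  # row v is x_v, the f_n-dimensional node feature
        self.E = {}                          # (v, u) -> e_{v,u}, the f_e-dimensional edge feature
        self.f_e = f_e

    def add_edge(self, v, u, weight=1.0, feat=None):
        """Register the directed edge (v, u), i.e., an edge pointing from u to v."""
        self.A[(v, u)] = weight
        self.E[(v, u)] = np.zeros(self.f_e) if feat is None else np.asarray(feat, dtype=float)

    def in_neighbors(self, v):
        """N^+_v: the nodes directly pointing at v."""
        return {u for (w, u) in self.A if w == v}

    def out_neighbors(self, v):
        """N^-_v: the nodes directly pointed to by v."""
        return {w for (w, u) in self.A if u == v}
```

An undirected edge is handled by calling `add_edge(v, u, ...)` and `add_edge(u, v, ...)` with the same feature, matching the decomposition described above.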
2.2 Graph Neural Networks
Most GML models aim to encode graph structure (e.g., a node, an edge, a subgraph, or the entire graph) into a low-dimensional embedding, which serves as the input to downstream machine learning tasks, either in an end-to-end or a decoupled manner. The proposed AGL mainly focuses on GNNs, a widely used category of GML models. Each GNN layer generates intermediate embeddings by aggregating the information of the target node's in-edge neighbors. After stacking several GNN layers, we obtain the final embedding, which integrates the entire receptive field of the target node. Specifically, we denote the computation paradigm of the $k$-th GNN layer as follows:
$$h^{(k+1)}_v = \phi^{(k)}\Big(\{h^{(k)}_i\}_{i \in \{v\} \cup N^+_v},\ \{e_{v,u}\}_{A_{v,u} > 0};\ W^{(k)}_\phi\Big), \qquad (1)$$
where $h^{(k)}_v$ denotes node $v$'s intermediate embedding at the $k$-th layer and $h^{(0)}_v = x_v$. The function $\phi^{(k)}$, parameterized by $W^{(k)}_\phi$, takes as inputs the embeddings of $v$ and its in-edge neighbors $N^+_v$, as well as the edge features associated with $v$'s in-edges, and outputs the embedding for the next GNN layer.
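As a minimal sketch of Equation 1, the layer below instantiates $\phi^{(k)}$ as a mean aggregator over $v$ and its in-edge neighbors, with each neighbor message scaled by a simple function of the edge feature, followed by a linear transform and a nonlinearity. The concrete aggregator and the shape of $W^{(k)}_\phi$ are assumptions of this example and depend on the actual GNN model (e.g., GCN or GraphSAGE).

```python
import numpy as np

def gnn_layer(h, graph, W):
    """One layer of Equation 1: maps the k-th layer embeddings h to h^{(k+1)}.
    h: array of shape (|V|, d); graph: an AttributedGraph; W: array of shape (d, d')."""
    out = np.zeros((h.shape[0], W.shape[1]))
    for v in range(h.shape[0]):
        in_nbrs = graph.in_neighbors(v)                    # N^+_v
        # Aggregate v itself and its in-edge neighbors; the edge feature e_{v,u}
        # enters through a simple scalar weight (an illustrative choice of phi).
        agg = h[v].copy()
        for u in in_nbrs:
            agg += h[u] * (1.0 + graph.E[(v, u)].sum())
        agg /= (1 + len(in_nbrs))
        out[v] = np.tanh(agg @ W)                          # W plays the role of W^{(k)}_phi
    return out

# Stacking K layers with h^{(0)} = X, as in the text:
# h = graph.X
# for W in layer_weights:   # [W^{(0)}, ..., W^{(K-1)}]
#     h = gnn_layer(h, graph, W)
```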
The above computation of GNNs can be formulated in the message passing paradigm. That is, we collect keys (i.e., node ids) together with their values (i.e., embeddings). We first merge the values from each node's in-edge neighbors so as to obtain the node's new value. After that, we propagate the new values to destination nodes via out-edges. After $K$ rounds of such merging and propagation, the computation of a $K$-layer GNN is complete. In the following sections we show how this paradigm generalizes to both the training and the inference of GNNs.
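The toy sketch below expresses one such round in key-value form: for every node, the values of its in-edge neighbors are collected and merged with its own value, and the merged values then flow along out-edges into the next round. This is only a schematic of the paradigm, not AGL's MapReduce implementation; scalar values stand in for embeddings.

```python
from collections import defaultdict

def message_passing_round(values, edges):
    """One merge-and-propagate round (schematic only, scalar values for brevity).
    values: {node_id: current value}; edges: list of (v, u), i.e., a directed edge u -> v."""
    # Merge: collect, for every node v, the values of its in-edge neighbors
    # and combine them (here: a plain sum) with v's own value.
    received = defaultdict(float)
    for v, u in edges:          # u's value travels to v along u's out-edge / v's in-edge
        received[v] += values[u]
    merged = {v: values[v] + received[v] for v in values}
    # Propagate: the merged values become the keyed values the next round starts from;
    # in a MapReduce job this corresponds to shuffling merged[u] to each destination v.
    return merged

# K rounds of merging and propagation complete the computation of a K-layer GNN.
```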
2.3 K-hop Neighborhood
Definition 1 (k-hop neighborhood). The k-hop neighborhood w.r.t. a target node $v$, denoted $G^k_v$, is defined as the induced attributed subgraph of $G$ whose node set is $V^k_v = \{v\} \cup \{u : d(v, u) \le k\}$, where $d(v, u)$ denotes the length of the shortest path from $u$ to $v$. Its edge set consists of the edges in $E$ that have both endpoints in its node set, i.e., $E^k_v = \{(u, u') : (u, u') \in E \wedge u \in V^k_v \wedge u' \in V^k_v\}$. Moreover, it contains the feature vectors of the nodes and edges in the k-hop neighborhood, $X^k_v$ and $E^k_v$. Without loss of generality, we define the 0-hop neighborhood w.r.t. $v$ as the node $v$ itself.
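A minimal sketch of materializing $G^k_v$ from Definition 1: since $d(v, u)$ measures paths from $u$ to $v$, the node set is collected by a breadth-first walk backwards along in-edges, and the edge set is then induced on those nodes. It reuses the hypothetical `AttributedGraph` helper sketched in Section 2.1.

```python
def k_hop_neighborhood(graph, v, k):
    """Return the node set V^k_v and induced edge set E^k_v of the k-hop neighborhood."""
    nodes, frontier = {v}, {v}
    for _ in range(k):
        # Nodes one more hop away: in-edge neighbors of the current frontier.
        frontier = {u for w in frontier for u in graph.in_neighbors(w)} - nodes
        nodes |= frontier
    # Induced edges: every edge of the graph with both endpoints inside the node set.
    edges = {(w, u) for (w, u) in graph.A if w in nodes and u in nodes}
    return nodes, edges   # features X^k_v, E^k_v can be gathered by these node/edge ids
```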
The following theorem shows the connection between the
computation of GNNs and the k-hop neighborhood.
Theorem 1. Let $G^k_v$ be the k-hop neighborhood w.r.t. the target node $v$. Then $G^k_v$ contains the sufficient and necessary information for a k-layer GNN model, which follows the paradigm of Equation 1, to generate the embedding of node $v$.
First, the $0$-th layer embedding is directly assigned from the raw feature, i.e., $h^{(0)}_v = x_v$, which is exactly the 0-hop neighborhood. Then, from Equation 1, the output embedding of $v$ in each subsequent layer is generated only from the previous-layer embeddings of $v$ and its 1-hop in-edge neighbors, together with the features of $v$'s in-edges. Therefore, Theorem 1 follows by mathematical induction on the number of layers.
Moreover, we can extend the theorem to a batch of nodes: the union of the k-hop neighborhoods w.r.t. the nodes in a batch provides the sufficient and necessary information for a k-layer GNN model to generate the embeddings of all nodes in the batch. This simple theorem implies that in a k-layer GNN model the target node's embedding at the $k$-th layer depends only on its k-hop neighborhood, rather than the entire graph.
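Under this batched form of the theorem, the subgraph needed for a whole mini-batch is simply the union of the per-node k-hop neighborhoods. A usage sketch with the hypothetical helper above (node ids and k are arbitrary):

```python
batch = [3, 17, 42]                      # target node ids in one mini-batch
batch_nodes, batch_edges = set(), set()
for v in batch:
    nodes, edges = k_hop_neighborhood(graph, v, k=2)   # k matches the number of GNN layers
    batch_nodes |= nodes
    batch_edges |= edges
# batch_nodes / batch_edges now carry everything a 2-layer GNN needs to embed the batch.
```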
3. SYSTEM
In this section, we first give an overview of our AGL system. Then, we elaborate on its three core modules, i.e., GraphFlat, GraphTrainer, and GraphInfer. Finally, we give a demo example of how to implement a simple GCN model with the proposed AGL system.
3.1 System Overview
Our major motivation for building AGL is that industrial communities desire an integrated, fully functional system for training and inference over graph data that is scalable and, at the same time, offers fault tolerance based on mature industrial infrastructures such as MapReduce and parameter servers. That is, instead of requiring a single monster machine or customized graph stores with huge memory and high-bandwidth networks, which could be expensive infrastructure upgrades for Internet companies, we seek a solution built on mature and classic infrastructures that is easy to deploy while enjoying properties such as fault tolerance. Second, the solution based on mature infrastructures needs to scale to industrial-scale graph data. Third, besides optimizing training, we aim to boost inference tasks over graphs, because in practice labeled data are very limited (say, ten million nodes) compared with the unlabeled data, typically billions of nodes, to be inferred.
The design principle of AGL is based on the message passing scheme underlying the computations of GNNs. That is, we first merge all the information from each node's in-edge neighbors, and then propagate the merged information to the destination nodes via out-edges. We repeatedly apply this principle to the training and inference processes, and develop GraphFlat and GraphInfer. Basically, GraphFlat generates independent K-hop neighborhoods