represents the weight of the directed edge from node $u$ to node $v$ (i.e., $(v, u) \in E$), and $A_{v,u} = 0$ indicates that there is no such edge (i.e., $(v, u) \notin E$). $X \in \mathbb{R}^{|V| \times f_n}$ is a matrix consisting of all nodes' $f_n$-dimensional feature vectors, and $E \in \mathbb{R}^{|V| \times |V| \times f_e}$ is a sparse tensor consisting of all edges' $f_e$-dimensional feature vectors. Specifically, $x_v$ denotes the feature vector of node $v$, and $e_{v,u}$ denotes the feature vector of edge $(v, u)$ if $(v, u) \in E$; otherwise $e_{v,u} = 0$. In our setting, an undirected graph is treated as a special directed graph, in which each undirected edge $(v, u)$ is decomposed into two directed edges $(v, u)$ and $(u, v)$ with the same edge feature. Moreover, we use $N^+_v$ to denote the set of nodes directly pointing at $v$, i.e., $N^+_v = \{u : A_{v,u} > 0\}$, $N^-_v$ to denote the set of nodes directly pointed to by $v$, i.e., $N^-_v = \{u : A_{u,v} > 0\}$, and $N_v = N^+_v \cup N^-_v$. In other words, $N^+_v$ denotes the set of in-edge neighbors of $v$, while $N^-_v$ denotes the set of out-edge neighbors of $v$. We call the edges pointing at a node its in-edges, and the edges pointed from that node its out-edges.
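To make the notation concrete, the sketch below stores a small attributed graph in exactly this form. The class name, the dictionary-based storage, and the dense feature matrix are illustrative assumptions rather than AGL's actual storage format.

```python
import numpy as np

class AttributedGraph:
    """Toy container matching the notation above (illustrative, not AGL's storage)."""

    def __init__(self, num_nodes, f_n, f_e):
        self.A = {}                          # (v, u) -> weight of the directed edge u -> v
        self.X = np.zeros((num_nodes, f_n))  # row v is x_v, the f_n-dimensional node feature
        self.E = {}                          # (v, u) -> e_{v,u}, the f_e-dimensional edge feature
        self.f_e = f_e

    def add_edge(self, v, u, weight=1.0, feat=None):
        """Register the directed edge (v, u), i.e., an edge pointing from u to v."""
        self.A[(v, u)] = weight
        self.E[(v, u)] = np.zeros(self.f_e) if feat is None else np.asarray(feat, dtype=float)

    def in_neighbors(self, v):
        """N^+_v: the nodes directly pointing at v."""
        return {u for (w, u) in self.A if w == v}

    def out_neighbors(self, v):
        """N^-_v: the nodes directly pointed to by v."""
        return {w for (w, u) in self.A if u == v}
```

An undirected edge is handled by calling `add_edge(v, u, ...)` and `add_edge(u, v, ...)` with the same feature, matching the decomposition described above.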
2.2 Graph Neural Networks
Most GML models aim to encode graph structure (e.g., a node, an edge, a subgraph, or the entire graph) into a low-dimensional embedding, which serves as the input to downstream machine learning tasks, either in an end-to-end or a decoupled manner. The proposed AGL mainly focuses on GNNs, a widely used category of GML models. Each GNN layer generates intermediate embeddings by aggregating the information of the target node's in-edge neighbors. After stacking several GNN layers, we obtain the final embedding, which integrates the entire receptive field of the target node. Specifically, we denote the computation paradigm of the $k$-th GNN layer as follows:
$$h^{(k+1)}_v = \phi^{(k)}\Big(\{h^{(k)}_i\}_{i \in \{v\} \cup N^+_v},\ \{e_{v,u}\}_{A_{v,u} > 0};\ W^{(k)}_\phi\Big), \qquad (1)$$
where $h^{(k)}_v$ denotes node $v$'s intermediate embedding at the $k$-th layer and $h^{(0)}_v = x_v$. The function $\phi^{(k)}$, parameterized by $W^{(k)}_\phi$, takes as inputs the embeddings of $v$ and its in-edge neighbors $N^+_v$, as well as the edge features associated with $v$'s in-edges, and outputs the embedding for the next GNN layer.
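As a minimal sketch of Equation 1, the layer below instantiates $\phi^{(k)}$ as a mean aggregator over $v$ and its in-edge neighbors, with each neighbor message scaled by a simple function of the edge feature, followed by a linear transform and a nonlinearity. The concrete aggregator and the shape of $W^{(k)}_\phi$ are assumptions of this example and depend on the actual GNN model (e.g., GCN or GraphSAGE).

```python
import numpy as np

def gnn_layer(h, graph, W):
    """One layer of Equation 1: maps the k-th layer embeddings h to h^{(k+1)}.
    h: array of shape (|V|, d); graph: an AttributedGraph; W: array of shape (d, d')."""
    out = np.zeros((h.shape[0], W.shape[1]))
    for v in range(h.shape[0]):
        in_nbrs = graph.in_neighbors(v)                    # N^+_v
        # Aggregate v itself and its in-edge neighbors; the edge feature e_{v,u}
        # enters through a simple scalar weight (an illustrative choice of phi).
        agg = h[v].copy()
        for u in in_nbrs:
            agg += h[u] * (1.0 + graph.E[(v, u)].sum())
        agg /= (1 + len(in_nbrs))
        out[v] = np.tanh(agg @ W)                          # W plays the role of W^{(k)}_phi
    return out

# Stacking K layers with h^{(0)} = X, as in the text:
# h = graph.X
# for W in layer_weights:   # [W^{(0)}, ..., W^{(K-1)}]
#     h = gnn_layer(h, graph, W)
```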
The above computation of GNNs can be formulated in the message passing paradigm. That is, we collect keys (i.e., node ids) together with their values (i.e., embeddings). We first merge the values from each node's in-edge neighbors so as to obtain the node's new value. After that, we propagate the new values to destination nodes via out-edges. After $K$ rounds of such merging and propagation, the computation of a $K$-layer GNN is complete. In the following sections we show how this paradigm generalizes to both the training and the inference of GNNs.
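The toy sketch below expresses one such round in key-value form: for every node, the values of its in-edge neighbors are collected and merged with its own value, and the merged values then flow along out-edges into the next round. This is only a schematic of the paradigm, not AGL's MapReduce implementation; scalar values stand in for embeddings.

```python
from collections import defaultdict

def message_passing_round(values, edges):
    """One merge-and-propagate round (schematic only, scalar values for brevity).
    values: {node_id: current value}; edges: list of (v, u), i.e., a directed edge u -> v."""
    # Merge: collect, for every node v, the values of its in-edge neighbors
    # and combine them (here: a plain sum) with v's own value.
    received = defaultdict(float)
    for v, u in edges:          # u's value travels to v along u's out-edge / v's in-edge
        received[v] += values[u]
    merged = {v: values[v] + received[v] for v in values}
    # Propagate: the merged values become the keyed values the next round starts from;
    # in a MapReduce job this corresponds to shuffling merged[u] to each destination v.
    return merged

# K rounds of merging and propagation complete the computation of a K-layer GNN.
```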
2.3 K-hop Neighborhood
Definition 1 (k-hop neighborhood). The k-hop neighborhood w.r.t. a target node $v$, denoted $G^k_v$, is defined as the induced attributed subgraph of $G$ whose node set is $V^k_v = \{v\} \cup \{u : d(v, u) \le k\}$, where $d(v, u)$ denotes the length of the shortest path from $u$ to $v$. Its edge set consists of the edges in $E$ that have both endpoints in its node set, i.e., $E^k_v = \{(u, u') : (u, u') \in E \wedge u \in V^k_v \wedge u' \in V^k_v\}$. Moreover, it contains the feature vectors of the nodes and edges in the k-hop neighborhood, $X^k_v$ and $E^k_v$. Without loss of generality, we define the 0-hop neighborhood w.r.t. $v$ as the node $v$ itself.
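A minimal sketch of materializing $G^k_v$ from Definition 1: since $d(v, u)$ measures paths from $u$ to $v$, the node set is collected by a breadth-first walk backwards along in-edges, and the edge set is then induced on those nodes. It reuses the hypothetical `AttributedGraph` helper sketched in Section 2.1.

```python
def k_hop_neighborhood(graph, v, k):
    """Return the node set V^k_v and induced edge set E^k_v of the k-hop neighborhood."""
    nodes, frontier = {v}, {v}
    for _ in range(k):
        # Nodes one more hop away: in-edge neighbors of the current frontier.
        frontier = {u for w in frontier for u in graph.in_neighbors(w)} - nodes
        nodes |= frontier
    # Induced edges: every edge of the graph with both endpoints inside the node set.
    edges = {(w, u) for (w, u) in graph.A if w in nodes and u in nodes}
    return nodes, edges   # features X^k_v, E^k_v can be gathered by these node/edge ids
```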
The following theorem shows the connection between the
computation of GNNs and the k-hop neighborhood.
Theorem 1. Let $G^k_v$ be the k-hop neighborhood w.r.t. the target node $v$. Then $G^k_v$ contains the sufficient and necessary information for a k-layer GNN model, which follows the paradigm of Equation 1, to generate the embedding of node $v$.
First, the $0$-th layer embedding is directly assigned from the raw feature, i.e., $h^{(0)}_v = x_v$, which is exactly the 0-hop neighborhood. Then, from Equation 1, the output embedding of $v$ in each subsequent layer is generated only from the previous-layer embeddings of $v$ and its 1-hop in-edge neighbors, together with the features of $v$'s in-edges. Therefore, Theorem 1 follows by mathematical induction on the number of layers.
Moreover, we can extend the theorem to a batch of nodes: the union of the k-hop neighborhoods w.r.t. the nodes in a batch provides the sufficient and necessary information for a k-layer GNN model to generate the embeddings of all nodes in the batch. This simple theorem implies that in a k-layer GNN model the target node's embedding at the $k$-th layer depends only on its k-hop neighborhood, rather than the entire graph.
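Under this batched form of the theorem, the subgraph needed for a whole mini-batch is simply the union of the per-node k-hop neighborhoods. A usage sketch with the hypothetical helper above (node ids and k are arbitrary):

```python
batch = [3, 17, 42]                      # target node ids in one mini-batch
batch_nodes, batch_edges = set(), set()
for v in batch:
    nodes, edges = k_hop_neighborhood(graph, v, k=2)   # k matches the number of GNN layers
    batch_nodes |= nodes
    batch_edges |= edges
# batch_nodes / batch_edges now carry everything a 2-layer GNN needs to embed the batch.
```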
3. SYSTEM
In this section, we first give an overview of our AGL system. Then, we elaborate on its three core modules, i.e., GraphFlat, GraphTrainer, and GraphInfer. Finally, we give a demo example of how to implement a simple GCN model with the proposed AGL system.
3.1 System Overview
Our major motivation for building AGL is that industrial communities desire an integrated, fully functional system for training and inference over graph data that is scalable and, at the same time, offers fault tolerance based on mature industrial infrastructures such as MapReduce and parameter servers. That is, instead of requiring a single monster machine or customized graph stores with huge memory and high-bandwidth networks, which could be expensive infrastructure upgrades for Internet companies, we seek a solution built on mature and classic infrastructures that is easy to deploy while enjoying properties such as fault tolerance. Second, the solution based on mature infrastructures needs to scale to industrial-scale graph data. Third, besides optimizing training, we aim to boost inference tasks over graphs, because in practice labeled data are very limited (say, ten million nodes) compared with the unlabeled data, typically billions of nodes, to be inferred.
The design principle of AGL is based on the message passing scheme underlying the computations of GNNs. That is, we first merge all the information from each node's in-edge neighbors, and then propagate the merged information to the destination nodes via out-edges. We repeatedly apply this principle to the training and inference processes, and develop GraphFlat and GraphInfer. Basically, GraphFlat generates independent K-hop neighborhoods