the MLP algorithm. Section 5 presents a disk-based implementa-
tion of MLP. Section 6 presents experimental results, and Section 7
reviews related work. We conclude in Section 8.
2. BACKGROUND
In this section, we first introduce the Trinity infrastructure, which
is used as a general-purpose computation platform for web-scale
graphs. Then, we introduce two techniques related to our approach
for graph partitioning: graph coarsening and label propagation.
2.1 The Trinity Graph System
We use Trinity [17] as the infrastructure for handling web-scale
graphs. Trinity is essentially a memory cloud created out of the
RAM of multiple machines, and it offers a unified memory space
for user programs. Most graph applications need efficient random
access to graph data, and Trinity's efficient in-memory graph
exploration and bulk message passing mechanisms answer this need,
enabling it to handle large graphs.
Trinity supports very efficient memory-based graph exploration.
In one experiment, we deployed a synthetic, power-law graph in a
15-machine cluster managed by Trinity. The graph has Facebook-like
size and distribution (800 million nodes, 100 billion edges, with
each node having on average 130 edges). We found that exploring the
entire 3-hop neighborhood of any node in the graph takes less than
100 milliseconds on average. In other words, Trinity is able to
explore 130 + 130² + 130³ ≈ 2.2 million edges in one tenth of a
second.
Making the graph topology memory resident makes fast random
graph access possible. On the other hand, some computations allow
us to predict the access pattern on the graph. In this case,
we can store the entire graph on the disk and schedule parts of the
graph to be memory resident when they are needed for computa-
tion. This enables Trinity to handle extremely large graphs using
a small number of machines, and enables small organizations that
cannot afford a large memory cloud to perform large-scale compu-
tations on graphs. In this paper, we propose a graph partitioning
algorithm that allows us to predict the access pattern. Thus, we can
partition billion-node graphs even if the memory cloud is not big
enough to hold the entire graph. Our experiments show that we
can partition billion-node graphs with eight machines, each of
which has 48 GB of memory.
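To make this concrete, here is a minimal sketch of the idea, not
Trinity's actual API: the partition layout, the load_partition
function, and the schedule are hypothetical, standing in for
whatever the predicted access pattern provides.

    def run_with_schedule(schedule, load_partition, compute):
        # schedule: the ordered list of graph partitions the
        # computation will touch, known in advance because the
        # access pattern is predictable. Only one partition is
        # memory resident at a time.
        resident_id, resident = None, None
        for pid in schedule:
            if pid != resident_id:
                resident = load_partition(pid)  # swap the needed part in from disk
                resident_id = pid
            compute(resident)                   # run the computation on this part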
Trinity also provides an efficient bulk message passing mecha-
nism. Using this mechanism, we can build an offline computation
platform for web-scale graph analytics on Trinity. For instance, we
can implement the Pregel-like [15] Bulk Synchronous Parallel (BSP)
computation model. In this model, the programmer writes a
vertex-based algorithm, and the system takes care of its parallel
execution on all vertices. Trinity's bulk message passing mechanism
enables BSP to achieve high performance. In one experiment, using
just 8 machines, one BSP iteration on a synthetic, power-law graph
of 1 billion nodes and 13 billion edges takes less than 60 seconds.
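As an illustration of the vertex-based model, the following is a
minimal Pregel-style sketch in Python, not Trinity's actual
interface: the synchronous superstep driver and the message inbox
are our own simplifications. It computes connected components by
letting every vertex adopt the smallest vertex id it has seen.

    def connected_components(adj):
        # adj: dict mapping vertex id -> list of neighbor ids (symmetric).
        value = {v: v for v in adj}            # initial label = own id
        active = set(adj)                      # vertices that changed last superstep
        while active:                          # one loop iteration = one superstep
            inbox = {v: [] for v in adj}
            for v in active:                   # send phase: broadcast to neighbors
                for u in adj[v]:
                    inbox[u].append(value[v])
            active = set()
            for v, msgs in inbox.items():      # receive phase: adopt smaller ids
                if msgs and min(msgs) < value[v]:
                    value[v] = min(msgs)
                    active.add(v)              # changed vertices stay active
        return value                           # same value = same component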
The efficient graph exploration and bulk message passing mechanisms
of Trinity lay the foundation for developing our graph partitioning
algorithm. Still, there are many challenges to devising
graph partitioning algorithms for vertex-based computation. In this
paper, we introduce a novel label propagation based algorithm for
graph partitioning.
2.2 Graph Coarsening
Graph partitioning algorithms such as KL [5] and FM [6] are ef-
fective for small graphs. For a large graph, a widely adopted ap-
proach is to “coarsen” the graph until its size is small enough for
KL or FM. The idea is known as multi-level graph partitioning, and
a representative approach is METIS [8].
METIS works in three steps: (1) coarsening the graph; (2)
partitioning the coarsened graph; (3) uncoarsening. In the first
step, METIS coarsens a graph by finding a maximal match, that is, a
maximal set of edges in which no two edges share a common vertex.
After it finds a maximal match, it collapses the two ends of each
edge into one node, and as a result, the graph is "coarsened." The
coarsening step repeats until the graph is small enough. In the
second step, METIS applies KL or FM directly on the small graph. In
the third step, the partitions of the small graph are projected
back to the finer graphs.
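To make the coarsening step concrete, below is a small sketch of
one level of random-matching coarsening in Python. This is our own
illustration, not METIS code; the adjacency representation and the
tuple naming of coarse nodes are assumptions for readability.

    import random

    def coarsen_once(adj):
        # adj: dict vertex -> set of neighbors (symmetric).
        # Returns the coarser graph and the fine-to-coarse mapping.
        match = {}                                # fine vertex -> coarse node
        order = list(adj)
        random.shuffle(order)                     # visit vertices in random order
        for u in order:
            if u in match:
                continue
            free = [v for v in adj[u] if v not in match]
            if free:                              # match u with a free neighbor
                v = random.choice(free)
                match[u] = match[v] = ("c", u, v)
            else:                                 # unmatched vertices survive alone
                match[u] = ("c", u)
        coarse = {}
        for u, nbrs in adj.items():               # project edges onto coarse nodes
            cu = match[u]
            cedges = coarse.setdefault(cu, set())
            for v in nbrs:
                cv = match[v]
                if cu != cv:                      # drop self-loops from collapsing
                    cedges.add(cv)
        return coarse, match

Repeating coarsen_once until the graph is small enough yields the
multi-level hierarchy that steps (2) and (3) operate on.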
Before we discuss potential problems of coarsening for real-life
graphs, we first look at an example:
EXAMPLE 1 (MAXIMAL MATCH). For the graph shown in Fig-
ure 1(a), the following edge set is a maximal match:
{(c, f), (e, g), (h, i), (k, l), (j, b), (a, d)}
Figure 1(b) is the result of coarsening (obtained after collapsing
the two ends of each edge in the maximal match).
The correctness of METIS is based on the following assumption:
A (near) optimal partitioning on a coarser graph implies a good
partitioning in the finer graph. However, in general, the assumption
only holds true when the degree of nodes in the graph is bounded
by a constant [9]. For example, 2D or 3D meshes are graphs where
node degrees are bounded. However, for today's real-life graphs,
the assumption does not hold any more. It is well established that
the degree distributions of real-life networks are right-skewed and
that there are many hub vertices with very large degrees. In other
words, the degree is not bounded by a small constant, but is related
to the size of the graph. As a result, a maximal match may fail to serve as
a good coarsening scheme in graph partitioning. For example, the
coarsened graph in Figure 1(b) no longer contains the clear struc-
ture of the original graph. Thus, partitions on the coarsened graph
cannot be optimal for the original graph.
Furthermore, the process of coarsening by maximal match is inef-
ficient for billion-node graphs. Two maximal match strategies are
used in various versions of METIS: random matching (RM) and heavy
edge matching (HEM). In RM, the vertices are visited in a random
order. If a vertex u has not been matched yet, one of its unmatched
neighbors is randomly selected and matched with u. HEM is similar
to RM, except that it selects the unmatched neighbor v whose edge
(u, v) has the largest weight. As we can see, in both approaches,
vertices are matched in a random order. For disk-resident graphs,
such random access leads to poor performance.
In a multi-level framework, the graphs generated at each level and
the mappings between them are stored in memory. These intermediate
results can be very large. For example, for LiveJournal
(http://snap.stanford.edu/data/soc-LiveJournal1.html), a real social
network that contains more than four million vertices, METIS (using
either RM or HEM) consumes more than 10 GB of memory. This heavy
memory usage makes the approach infeasible for billion-node graphs.
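For completeness, the only change HEM makes to the coarsening
sketch above is the neighbor-selection rule. Again, this is an
illustration, where weight is assumed to map each edge to its
accumulated weight.

    def pick_neighbor_hem(u, adj, weight, match):
        # HEM: among u's unmatched neighbors, pick the one connected
        # by the heaviest edge (RM would pick one at random).
        free = [v for v in adj[u] if v not in match]
        if not free:
            return None
        return max(free, key=lambda v: weight[(u, v)])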
2.3 Label Propagation
We propose a method for large scale graph partitioning based on
the idea of label propagation (LP), which was originally proposed
for community detection in social networks. A naive LP procedure
runs as follows. We first assign a unique label id to each vertex.
Then, we update the vertex labels iteratively: in each iteration, a
vertex takes the most frequent label in its neighborhood as its own label.
The process terminates when labels no longer change. Vertices that
have the same label belong to the same partition.
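The following is a minimal sketch of this naive procedure (our own
illustration; the iteration cap and the sequential, deterministic
visit order are simplifications, since practical variants randomize
the visit order and break ties randomly).

    from collections import Counter

    def label_propagation(adj, max_iters=100):
        # adj: dict vertex -> list of neighbors.
        label = {v: v for v in adj}                 # unique initial label per vertex
        for _ in range(max_iters):
            changed = False
            for v in adj:
                if not adj[v]:
                    continue
                counts = Counter(label[u] for u in adj[v])
                best = counts.most_common(1)[0][0]  # most frequent neighbor label
                if best != label[v]:
                    label[v] = best
                    changed = True
            if not changed:                         # labels no longer change
                break
        return label                                # same label = same partition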
There are two reasons why we adopt label propagation for
partitioning. First, the label propagation mechanism is
lightweight. It does not generate large intermediate results, and
it does not require sorting or indexing the data as many existing
graph partitioning algorithms do. This makes label propagation
feasible for web-scale graphs deployed on Trinity. With Trinity's
efficient graph exploration and