HEPart: A Hypergraph Partitioning Algorithm for Big Data Applications

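HEPart is a balanced hypergraph partitioning algorithm for big data applications. It aims to reduce query cost across multiple hosts so that data processing scales out efficiently. The method builds on hypergraph theory and focuses on partitioning hyperedges, which lowers the query cost incurred by group operations while keeping the workload balanced. Using an effective hyperedge-move strategy, HEPart partitions a hypergraph directly into K sub-hypergraphs, minimizing the vertex cutsize subject to a balance constraint on hyperedge weights. It was evaluated on a collection of complex-network datasets modeled as undirected hypergraphs. Compared with traditional vertex partitioning algorithms, HEPart performs better, especially on scale-free networks, where it significantly reduces cost while maintaining good load balance.

In big data processing, the complexity and scale of the data often exceed what a single host can handle, which makes hypergraph partitioning (HP) an important technique. A hypergraph is a more expressive graph model in which an edge may connect any number of vertices, so it is well suited to representing multi-way relationships and interactions in complex networks. Traditional HP algorithms focus on vertex partitioning: they aim to reduce the number of cut hyperedges while balancing the vertex weight of each part. HEPart instead argues that, because query workloads in big data applications are usually driven by group operations, it matters more to reduce the query cost on hyperedges (which represent group operations) and to balance the workload.

The core of HEPart is hyperedge partitioning. It introduces two cutsize metrics to evaluate the quality of a hyperedge partition. The algorithm performs direct K-way hypergraph partitioning and searches for the best partition through a hyperedge-move strategy. To speed up the update process, HEPart also introduces the concept of vertex criticality, which helps compute the gain of a hyperedge move quickly and thus improves efficiency. Experimental results show that HEPart effectively reduces query cost in a variety of scenarios, with a more pronounced advantage on scale-free networks. Compared with existing hypergraph partitioners and vertex partitioning algorithms, HEPart maintains load balance better, which is critical to the performance of big data applications. HEPart thus offers an optimized solution for big data processing that reduces communication overhead and improves overall system performance.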


W. Yang et al. / Future Generation Computer Systems 83 (2018) 250–268 253

Table 1
Main notations used in the paper.

| Symbol | Description | Symbol | Description |
|---|---|---|---|
| $H$ | Hypergraph | $(n_j, v_i)$ | A pin |
| $V$ | Vertex set | $\Pi$ | A partition |
| $N$ | Hyperedge set | $\omega(v_i)$ | Weight of vertex $v_i$ |
| $K$ | Total number of partitions | $\omega(n_j)$ | Weight of hyperedge $n_j$ |
| $v_i$ | A vertex in the vertex set | $N_k$ | A subset of the hyperedge set |
| $n_j$ | A hyperedge in the hyperedge set | $W(N_k)$ | Weight of a hyperedge subset |
| $\mathrm{Vertices}(n_j)$ | Vertex set connected by net $n_j$ | $c(v_i)$ | Cost value of vertex $v_i$ |
| $\mathrm{Nets}(v_i)$ | Hyperedge set connecting vertex $v_i$ | $\lambda(v_i)$ | Number of parts where $v_i$ is located |
| $\sigma_p(v_i)$ | Number of nets connecting vertex $v_i$ in part $p$ | $C(H, \Pi)$ | Cutsize of hypergraph $H$ under partition $\Pi$ |
| $g_{p \to q}(n_j)$ | Move gain if $n_j$ was to be moved from part $N_p$ to part $N_q$ | $\mathrm{State}(n_j)$ | The part where $n_j$ is located |

connects a subset of vertices. The set of nets that connect vertex $v_i$ is represented by $\mathrm{Nets}(v_i)$. The set of vertices connected by net $n_j$ is denoted as $\mathrm{Vertices}(n_j)$. A $(n_j, v_i)$ tuple denotes a pin of $n_j$, where $v_i \in \mathrm{Vertices}(n_j)$. The nets $n_i$ and $n_j$ are said to be neighbors if they have at least one common pin, i.e., $\mathrm{Vertices}(n_i) \cap \mathrm{Vertices}(n_j) \neq \emptyset$. The vertices $v_i$ and $v_j$ are neighbors of each other if they are connected by at least one common net, i.e., $\mathrm{Nets}(v_i) \cap \mathrm{Nets}(v_j) \neq \emptyset$. The degree of a net $n_j$ is equal to the number of vertices it connects, $|\mathrm{Vertices}(n_j)|$. The total number of pins $|P| = \sum_{n_j \in N} |\mathrm{Vertices}(n_j)|$ denotes the size of a given hypergraph $H$.
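The definitions above map naturally onto two dictionaries, one per direction of the net/vertex relation. The following is a minimal sketch under that assumption; the toy hypergraph and all function names are hypothetical and are not taken from the paper.

```python
from collections import defaultdict

# Hypothetical toy hypergraph: Vertices(n) for each net n.
VERTICES_OF = {
    "n1": {"v1", "v2", "v3"},
    "n2": {"v3", "v4"},
    "n3": {"v4", "v5", "v6"},
}


def build_nets_of(vertices_of):
    """Invert Vertices(n) to obtain Nets(v) for every vertex v."""
    nets_of = defaultdict(set)
    for net, verts in vertices_of.items():
        for v in verts:
            nets_of[v].add(net)
    return dict(nets_of)


def nets_are_neighbors(vertices_of, a, b):
    """Two nets are neighbors if they share at least one vertex."""
    return bool(vertices_of[a] & vertices_of[b])


def vertices_are_neighbors(nets_of, u, v):
    """Two vertices are neighbors if at least one net connects both."""
    return bool(nets_of[u] & nets_of[v])


def num_pins(vertices_of):
    """|P|: the sum of the net degrees |Vertices(n)| over all nets."""
    return sum(len(verts) for verts in vertices_of.values())


NETS_OF = build_nets_of(VERTICES_OF)
```

In this toy instance, `n1` and `n2` are neighbors because both connect `v3`, while `n1` and `n3` share no vertex.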

A vertex weight value $\omega(v_i)$ is associated with each vertex $v_i$, and a hyperedge weight value $\omega(n_j)$ is the total weight of the vertices connected by net $n_j$, i.e., $\omega(n_j) = \sum_{v_i \in \mathrm{Vertices}(n_j)} \omega(v_i)$.

$\Pi = \{N_1, \ldots, N_K\}$ is a K-way hyperedge partition of $H = (V, N)$ if each part $N_k$ is a nonempty subset of $N$, the parts are pairwise disjoint, and the union of the K parts is equal to $N$. The set of vertices located in partition $N_k$ is represented by $\mathrm{Vertices}(N_k)$, and the set of nets located in partition $N_k$ is denoted as $\mathrm{Nets}(N_k)$. The weight $W(N_k)$ of a part $N_k$ is the total weight of the hyperedges in that partition, i.e., $W(N_k) = \sum_{n_j \in N_k} \omega(n_j)$. A partition $\Pi$ is regarded as balanced if each part $N_k \in \Pi$ satisfies the balance constraint:

$$W(N_k) \le (1 + \epsilon) W_{avg} \quad \text{for } k = 1, \ldots, K, \tag{1}$$

where $W_{avg} = W(N)/K$ and $\epsilon$ is the predetermined maximum imbalance ratio.
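The weight definitions and the balance constraint (1) can be checked mechanically. The sketch below is illustrative only: the toy hypergraph, the unit vertex weights, the 0-based part indices, and the function names are all assumptions, not part of the paper.

```python
def net_weight(vertices_of, vertex_weight, net):
    """omega(n): total weight of the vertices connected by the net."""
    return sum(vertex_weight[v] for v in vertices_of[net])


def part_weights(vertices_of, vertex_weight, part_of, num_parts):
    """W(N_k) for every part k, given a net -> part assignment."""
    weights = [0] * num_parts
    for net, k in part_of.items():
        weights[k] += net_weight(vertices_of, vertex_weight, net)
    return weights


def is_balanced(weights, eps):
    """Balance constraint (1): W(N_k) <= (1 + eps) * W_avg for all k."""
    w_avg = sum(weights) / len(weights)
    return all(w <= (1 + eps) * w_avg for w in weights)


# Hypothetical toy instance with unit vertex weights and K = 2 parts.
VERTICES_OF = {
    "n1": {"v1", "v2", "v3"},
    "n2": {"v3", "v4"},
    "n3": {"v4", "v5", "v6"},
    "n4": {"v1", "v6"},
}
UNIT_WEIGHT = {f"v{i}": 1 for i in range(1, 7)}

EVEN = {"n1": 0, "n2": 0, "n3": 1, "n4": 1}    # part weights [5, 5]
SKEWED = {"n1": 0, "n2": 0, "n3": 0, "n4": 1}  # part weights [8, 2]
```

With an imbalance ratio of 0.1, the average part weight is 5, so the limit is 5.5: the `EVEN` assignment satisfies the constraint and the `SKEWED` one does not.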

In a partition $\Pi$, a vertex is said to be located in a part if it is connected by at least one hyperedge in that part. The locality set $\Lambda(v_i)$ of vertex $v_i$ is defined as the set of parts where $v_i$ is located. The cardinality of the locality set $\Lambda(v_i)$ of vertex $v_i$ is denoted by $\lambda(v_i) = |\Lambda(v_i)|$, or $\lambda_i$ for short. This is equal to the number of required copies of vertex $v_i$. A vertex is cut or replicated if it is located in more than one part ($\lambda(v_i) > 1$), and uncut or single if it is located in only one part ($\lambda(v_i) = 1$). The set of vertices in cut state within a partition $\Pi$ is denoted as $V_{\mathrm{cut}}$. Since the replicated vertices are the channels through which the partitions communicate, the communication cost is mainly related to these vertices. A cost value $c(v_i)$ is associated with every vertex $v_i$, and the cost function for a vertex set is denoted by $c(V) = \sum_{v_i \in V} c(v_i)$. We define two cutsize metrics to measure the cost of a partition $\Pi$ of hypergraph $H$. The cutsize can take the form of (2) or (3).

$$C(H, \Pi) = \sum_{v_i \in V_{\mathrm{cut}}} c(v_i) \tag{2}$$

$$C(H, \Pi) = \sum_{v_i \in V_{\mathrm{cut}}} (\lambda(v_i) - 1)\, c(v_i). \tag{3}$$

We refer to the cost definitions in (2) and (3) as the cut-vertex metric and the replica metric, respectively. Cutsize under the cut-vertex metric counts the total number of vertices that are replicated and thus in cut state. For example, assuming that every vertex weight value is assigned a unit value, the cutsize under the cut-vertex metric of Fig. 1 is 3, since three vertices are cut. Cutsize under the replica metric, in contrast, counts the total number of slave copies, a.k.a. replicas. This represents the difference between the total number of copies of the vertices that have been replicated and the number of vertices in cut state. For example, assuming that every vertex weight value is assigned a unit value, the cutsize under the replica metric of Fig. 1 is 5, since the three cut vertices have 3, 1 and 1 replicas, respectively.
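Both cutsize metrics follow directly from the locality sets. The sketch below computes them for a small hypothetical instance (it is not the hypergraph of Fig. 1); the function names, the unit costs, and the 0-based part indices are assumptions made for illustration.

```python
def locality_sets(vertices_of, part_of):
    """Locality set of every vertex: the parts holding a net that connects it."""
    loc = {}
    for net, k in part_of.items():
        for v in vertices_of[net]:
            loc.setdefault(v, set()).add(k)
    return loc


def cut_vertex_cutsize(loc, cost):
    """Metric (2): total cost of the vertices located in more than one part."""
    return sum(cost[v] for v, parts in loc.items() if len(parts) > 1)


def replica_cutsize(loc, cost):
    """Metric (3): each vertex pays its cost once per extra copy (lambda - 1)."""
    return sum((len(parts) - 1) * cost[v] for v, parts in loc.items())


# Hypothetical instance: three nets, each placed in its own part.
VERTICES_OF = {
    "n1": {"v1", "v2"},
    "n2": {"v1", "v3"},
    "n3": {"v1", "v3", "v4"},
}
PART_OF = {"n1": 0, "n2": 1, "n3": 2}
UNIT_COST = {v: 1 for v in ("v1", "v2", "v3", "v4")}

LOC = locality_sets(VERTICES_OF, PART_OF)
```

Here `v1` is located in three parts and `v3` in two, so the cut-vertex metric is 2 and the replica metric is (3 - 1) + (2 - 1) = 3. Uncut vertices contribute zero to the replica metric, which is why the sum can safely run over all vertices.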

Now, we formulate the optimization problem as follows: Find the optimal partitioning $\Pi^{\star}$ such that:

$$\Pi^{\star} = \arg\min_{\Pi} C(H, \Pi) \quad \text{s.t.} \quad W(N_k) \le (1 + \epsilon) W_{avg} \quad \text{for } k = 1, \ldots, K. \tag{4}$$

In other words, given a hypergraph $H = (V, N)$, balanced K-way hyperedge partitioning can be defined as finding a K-way partition $\Pi = \{N_1, \ldots, N_K\}$ that aims at minimizing the cutsize (2) or (3) as the primary objective while maintaining the balance constraint (1) as the secondary objective.
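To make the problem statement concrete, the sketch below solves it by exhaustive search on a tiny hypothetical instance. This is not HEPart, which is a heuristic; brute force is only feasible for a handful of nets. The sketch assumes unit vertex weights and costs, uses the replica metric, and treats the balance constraint as a hard filter. All names are illustrative.

```python
from itertools import product


def replica_cutsize(vertices_of, part_of):
    """Replica metric with unit costs: extra copies summed over all vertices."""
    parts_of_vertex = {}
    for net, k in part_of.items():
        for v in vertices_of[net]:
            parts_of_vertex.setdefault(v, set()).add(k)
    return sum(len(parts) - 1 for parts in parts_of_vertex.values())


def brute_force_partition(vertices_of, num_parts, eps):
    """Try every net -> part assignment; keep the cheapest balanced one."""
    nets = sorted(vertices_of)
    # With unit vertex weights, a net's weight equals its degree.
    weight = {n: len(vertices_of[n]) for n in nets}
    limit = (1 + eps) * sum(weight.values()) / num_parts
    best = None
    for labels in product(range(num_parts), repeat=len(nets)):
        if len(set(labels)) < num_parts:
            continue  # every part must be nonempty
        part_w = [0] * num_parts
        for n, k in zip(nets, labels):
            part_w[k] += weight[n]
        if any(w > limit for w in part_w):
            continue  # violates the balance constraint
        part_of = dict(zip(nets, labels))
        cost = replica_cutsize(vertices_of, part_of)
        if best is None or cost < best[0]:
            best = (cost, part_of)
    return best  # None if no balanced partition exists


# Hypothetical instance: two pairs of overlapping nets with no vertex in common.
VERTICES_OF = {
    "n1": {"v1", "v2", "v3"},
    "n2": {"v2", "v3", "v4"},
    "n3": {"v5", "v6", "v7"},
    "n4": {"v6", "v7", "v8"},
}
BEST_COST, BEST_PART = brute_force_partition(VERTICES_OF, num_parts=2, eps=0.1)
```

On this instance the search groups `n1` with `n2` and `n3` with `n4`, which leaves no vertex replicated. The search space grows as K to the power of the number of nets, which is why a move-based heuristic is needed in practice.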

4. Proposed HEPart

In this section, we present our proposed heuristic centralized hyperedge partitioning algorithm, namely HEPart.

4.1. Definition

In a K-way hyperedge partitioning $\Pi = \{N_1, \ldots, N_K\}$, a net belongs to only one part, whereas a vertex can belong to more than one part if it is cut and then replicated. An instance of a replicated vertex is termed a replica. No more than one instance of a vertex can belong to the same part. A vertex connects a set of nets, and the number of nets connecting vertex $v_i$ in part $p$ is denoted as $\sigma_p(v_i)$. According to the definitions, $|\mathrm{Nets}(v_i)| = \sigma_1(v_i) + \sigma_2(v_i) + \cdots + \sigma_K(v_i)$.

We use the cut-state of a vertex to describe whether that vertex is cut or not. A vertex $v_i$ in a K-way hyperedge partitioning is said to be cut or exterior or replicated if there are two or more parts with at least one hyperedge connecting this vertex, i.e., $\sigma_p(v_i) > 0$ and $\sigma_q(v_i) > 0$ ($1 \le p \le K$, $1 \le q \le K$, $p \ne q$). A vertex $v_i$ is said to be uncut or interior to part $N_p$ if only part $N_p$ owns at least one hyperedge connecting this vertex, i.e., $\sigma_p(v_i) > 0$ and $\sigma_q(v_i) = 0$ ($1 \le p \le K$, $1 \le q \le K$, $p \ne q$).
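The per-part net counts and the resulting cut-state can be sketched with a counter per vertex. The example below is a hypothetical illustration: the instance, the 0-based part indices, and the function names are assumptions rather than the paper's implementation.

```python
from collections import Counter


def sigma(nets_of_vertex, part_of):
    """Per-part counts: how many nets connecting the vertex lie in each part."""
    return Counter(part_of[net] for net in nets_of_vertex)


def cut_state(sig):
    """Return 'cut' if two or more parts hold a connecting net, else 'uncut'."""
    occupied = [p for p, count in sig.items() if count > 0]
    return "cut" if len(occupied) > 1 else "uncut"


# Hypothetical instance: Nets(v) for two vertices and a net -> part assignment.
NETS_OF = {
    "v1": {"n1", "n2", "n3"},
    "v2": {"n1"},
}
PART_OF = {"n1": 0, "n2": 0, "n3": 1}

SIGMA_V1 = sigma(NETS_OF["v1"], PART_OF)  # two nets in part 0, one in part 1
SIGMA_V2 = sigma(NETS_OF["v2"], PART_OF)  # one net in part 0
```

The counts of each vertex sum to the number of nets connecting it, matching the identity stated above: `v1` is cut because parts 0 and 1 both hold one of its nets, while `v2` is interior to part 0.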

HEPart attempts to improve the cutsize of a given K-way partition iteratively by applying move operations to the nets. Every net has a move gain associated with moving it from one part to another. The move gain is defined as follows.

The move gain $g_{p \to q}(n_j)$ of a net $n_j$ refers to the reduction in the cutsize if $n_j$ was to be moved from part $N_p$ to part $N_q$ ($1 \le p \le K$, $1 \le q \le K$, $p \ne q$). Regarding the cut-vertex cutsize metric, the
