Cache-Oblivious Techniques for Optimizing Random Hypergraph Peeling

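This paper presents a cache-oblivious peeling algorithm for random hypergraphs. Its authors are Djamal Belazzougui, Paolo Boldi, Giuseppe Ottaviano, Rossano Venturini, and Sebastiano Vigna, affiliated with the University of Helsinki, the University of Milan, ISTI-CNR (Italian National Research Council), and the computer science department of the University of Pisa. The paper addresses how to compute a peeling order in a randomly generated hypergraph, a key step in many constructions, including perfect hashing schemes, random r-SAT solvers, error-correcting codes, and approximate set encodings.

Computing a peeling order becomes expensive when the hypergraph is larger than the available internal memory: the traditional linear-time algorithm is then impractical because of its poor I/O performance. The paper proposes a new approach that reduces the computation of a peeling order to a sequence of sequential scans and sorts, and analyzes its I/O complexity in the cache-oblivious model. In that model an algorithm is designed without knowing the cache size or the block size of any level of the memory hierarchy, yet it is expected to perform well on all of them. The goal is to minimize the number of block transfers between slow and fast memory, which become the bottleneck once the data no longer fit in main memory.

The proposed algorithm peels a random hypergraph with n edges using O(sort(n)) I/Os and O(n log n) time. Sorting therefore dominates the cost: the number of I/Os is asymptotically that of sorting n items, rather than growing with the number of random accesses. This accounts for external-memory access efficiency as well as running time, and makes it feasible to process hypergraphs that do not fit in memory.

Peeling repeatedly removes a vertex of degree 1 together with its only edge. Because a random hypergraph has no locality, the order in which the standard algorithm visits vertices is close to random, which is exactly what makes it I/O-inefficient. By relying only on scans and sorts, the cache-oblivious algorithm handles such large structures effectively even under memory constraints. The paper thus offers a new perspective on the I/O efficiency of large-scale hypergraph processing, with both theoretical and practical relevance for large data sets.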

2 Notation and tools

Model and assumptions

We analyze our algorithms in the cache-oblivious model [ ]. In this model, the machine has a two-level memory hierarchy, consisting of a fast level of unknown size $M$ words and a slow level of unbounded size where our data reside. We assume that the fast level plays the role of a cache for the slow level with an optimal replacement strategy, where the transfers (a.k.a. I/Os) between the two levels are done in blocks of an unknown size of $B \le M$ words; the I/O cost of an algorithm is the total number of such block transfers. Scanning and sorting are two fundamental building blocks in the design of cache-oblivious algorithms [ ]: under the tall-cache assumption [ ], given an array of $N$ contiguous items, the I/Os required for scanning and sorting are

$$\mathrm{scan}(N) = O\left(1 + \frac{N}{B}\right) \quad\text{and}\quad \mathrm{sort}(N) = O\left(\frac{N}{B}\log_{M/B}\frac{N}{B}\right).$$
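
To get a feel for the two bounds, the following minimal sketch evaluates their leading terms for concrete parameters. The function names `scan_ios` and `sort_ios` are hypothetical, the constants hidden by the O-notation are simply dropped, and the sample values of N, M and B are illustrative assumptions, not figures from the paper.

```python
import math


def scan_ios(n_items: int, block: int) -> int:
    """Leading term of scan(N): 1 + ceil(N / B) block transfers."""
    return 1 + math.ceil(n_items / block)


def sort_ios(n_items: int, mem: int, block: int) -> float:
    """Leading term of sort(N): (N / B) * log_{M/B}(N / B).

    Constants hidden by the O-notation are dropped (an assumption of this
    sketch), so the result is an order of magnitude, not an exact count.
    """
    if not 1 <= block < mem:
        raise ValueError("need 1 <= B < M so that the log base M/B exceeds 1")
    blocks = n_items / block
    if blocks <= 1:
        return 1.0  # the whole input fits in one block
    # Number of merge passes, at least one.
    passes = max(1.0, math.log2(blocks) / math.log2(mem / block))
    return blocks * passes


if __name__ == "__main__":
    # Illustrative values only: N = 2^30 items, M = 2^20 words, B = 2^10 words.
    n, m, b = 2**30, 2**20, 2**10
    print("scan:", scan_ios(n, b))
    print("sort:", sort_ios(n, m, b))
```

With these sample values sorting costs roughly twice as many block transfers as a single scan, which is why an algorithm built from a bounded number of scans and sorts stays close to sequential I/O behavior.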



Hypergraphs

An $r$-hypergraph on a vertex set $V$ is a subset $E$ of $\binom{V}{r}$, the set of subsets of $V$ of cardinality $r$. An element of $E$ is called an edge. We call an ordered $r$-tuple from $V$ an oriented edge; if $e$ is an edge, an oriented edge whose vertices are those in $e$ is called an orientation of $e$. From now on we will focus on 3-hypergraphs; generalization to arbitrary $r$ is straightforward. We define valid orientations as those oriented edges $(v_0, v_1, v_2)$ where $v_1 < v_2$ (for arbitrary $r$, $v_1 < \cdots < v_{r-1}$). Then for each edge there are 6 orientations, but only 3 valid orientations ($r!$ orientations of which $r$ are valid).

We say that a valid oriented edge $(v_0, v_1, v_2)$ is the $i$-th orientation if $v_0$ is the $i$-th smallest among the three; in particular, the 0-th orientation is the canonical orientation. Edges correspond bijectively with their canonical orientations. Furthermore, valid orientations can be mapped bijectively to pairs $(e, v)$ where $e$ is an edge and $v$ a vertex contained in $e$, simply by the correspondence $(v_0, v_1, v_2) \mapsto (\{v_0, v_1, v_2\}, v_0)$.

In the following all the orientations are assumed to be valid, so we will use the term orientation to mean valid orientation.
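
The definitions above can be illustrated with a short sketch for the case r = 3. The helper names `valid_orientations`, `to_pair` and `count_orientations` are hypothetical and chosen only for this example; vertices are assumed to be distinct integers.

```python
from itertools import permutations


def valid_orientations(edge):
    """Return the three valid orientations (v0, v1, v2) with v1 < v2.

    The list index i is the orientation number: v0 is the i-th smallest
    vertex, so index 0 holds the canonical orientation.
    """
    vertices = sorted(edge)
    result = []
    for v0 in vertices:
        rest = [v for v in vertices if v != v0]  # already sorted, so v1 < v2
        result.append((v0, rest[0], rest[1]))
    return result


def to_pair(orientation):
    """Map a valid orientation to the pair (edge, vertex) = ({v0, v1, v2}, v0)."""
    return (frozenset(orientation), orientation[0])


def count_orientations(edge):
    """Count all orientations and the valid ones by brute force."""
    all_orientations = list(permutations(edge))
    valid = [o for o in all_orientations if o[1] < o[2]]
    return len(all_orientations), len(valid)


if __name__ == "__main__":
    e = {5, 2, 9}
    print(valid_orientations(e))   # [(2, 5, 9), (5, 2, 9), (9, 2, 5)]
    print(count_orientations(e))   # (6, 3)
    print(to_pair((5, 2, 9)))
```

Each of the three valid orientations singles out one vertex of the edge as its first component, which is the content of the bijection with pairs (e, v).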

3 The Majewski–Wormald–Havas–Czech technique

Majewski et al. [ ] proposed a technique (MWHC) to compute an order-preserving minimal perfect hash function, that is, a function mapping a set of keys $S$ in some specified way into $[|S|]$. The technique actually makes it possible to store succinctly any function $f : S \to [\sigma]$, for arbitrary $\sigma$. In this section we briefly describe their construction.

First, we choose three random hash functions $h_0, h_1, h_2 : S \to [\gamma n]$ and generate a 3-hypergraph with $\gamma n$ vertices, where $\gamma$ is a constant above the critical threshold [ ], by mapping each key $x$ to the edge $\{h_0(x), h_1(x), h_2(x)\}$. The goal is to find an array $u$ of $\gamma n$ integers in $[\sigma]$ such that for each key $x$ one has $f(x) = u_{h_0(x)} + u_{h_1(x)} + u_{h_2(x)} \bmod \sigma$. This yields a linear system with $n$ equations and $\gamma n$ variables $u_i$; if the associated hypergraph is peelable, it is easy to solve the system. Since $\gamma$ is larger than the critical threshold, the algorithm succeeds with probability $1 - o(1)$ as $n \to \infty$ [25].

By storing such values $u_i$, each requiring $\lceil \log \sigma \rceil$ bits, plus the three hash functions, we will be able to recover $f(x)$. Overall, the space required will be $\lceil \log \sigma \rceil \gamma n$ bits, which can be reduced to $\lceil \log \sigma \rceil n + \gamma n + o(n)$ using a ranking structure [ ]. This technique can be easily extended to construct MPHFs: we define the function $f : S \to [3]$ as $x \mapsto i$ where $h_i(x)$ is a degree-1 vertex when the edge corresponding to $x$ is peeled; it is then easy to see that $h_{f(x)}(x) : S \to [\gamma n]$ is a PHF. The function can be again made minimal by adding a ranking structure on the vector $u$ [6].
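
To make the two space bounds concrete, here is a small worked example. It reads the reduced bound as $\lceil \log \sigma \rceil n + \gamma n + o(n)$ and assumes base-2 logarithms; the values $\sigma = 256$ and $\gamma = 1.23$ are illustrative assumptions, not figures taken from the text.

```latex
\begin{align*}
\lceil \log \sigma \rceil \,\gamma n &= 8 \cdot 1.23\, n = 9.84\, n \text{ bits},\\
\lceil \log \sigma \rceil \, n + \gamma n + o(n) &= 8n + 1.23\, n + o(n) = 9.23\, n + o(n) \text{ bits}.
\end{align*}
```

Under these assumptions the ranking structure saves roughly 0.6 bits per key, and the saving grows with $\lceil \log \sigma \rceil$, since only $n$ of the $\gamma n$ entries need to be stored explicitly.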

As noted in the introduction, the peeling procedure needed to solve the linear system can be performed in linear time using a greedy algorithm (referred to as standard linear-time peeling). However, this procedure requires random access to several integers per key, needed for bookkeeping; moreover, since the graph is random, the visit order is close to random. As a consequence, if the key set is so large that it is necessary to spill to the disk part of the working data structures, the I/O volume slows down the algorithm to unacceptable rates.
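
The construction and the standard linear-time peeling can be sketched in memory as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `h`, `peel`, `assign`, `build` and `lookup` are hypothetical, salted BLAKE2b stands in for the random hash functions, `gamma = 1.23` is an assumed value, and a seed is simply discarded when it produces an edge with repeated vertices or a hypergraph that cannot be peeled.

```python
import hashlib
import math
from collections import deque


def h(key: str, seed: int, i: int, m: int) -> int:
    """Stand-in for the i-th hash function S -> [m], salted with (seed, i)."""
    salt = seed.to_bytes(8, "big") + bytes([i])
    digest = hashlib.blake2b(key.encode(), digest_size=8, salt=salt).digest()
    return int.from_bytes(digest, "big") % m


def peel(edges, m):
    """Greedy peeling: repeatedly remove a degree-1 vertex and its only edge.

    Returns [(edge_index, degree_1_vertex), ...] in peeling order, or None
    if some edges cannot be peeled.
    """
    incident = [[] for _ in range(m)]
    for e, vertices in enumerate(edges):
        for v in vertices:
            incident[v].append(e)
    degree = [len(lst) for lst in incident]
    removed = [False] * len(edges)
    queue = deque(v for v in range(m) if degree[v] == 1)
    order = []
    while queue:
        v = queue.popleft()
        if degree[v] != 1:  # its last edge was already peeled elsewhere
            continue
        e = next(e for e in incident[v] if not removed[e])
        removed[e] = True
        order.append((e, v))
        for w in edges[e]:  # random accesses: the source of poor I/O behavior
            degree[w] -= 1
            if degree[w] == 1:
                queue.append(w)
    return order if len(order) == len(edges) else None


def assign(edges, order, values, sigma, m):
    """Solve the linear system by visiting edges in reverse peeling order."""
    u = [0] * m
    for e, hinge in reversed(order):
        others = sum(u[w] for w in edges[e] if w != hinge)
        u[hinge] = (values[e] - others) % sigma
    return u


def build(table, sigma, gamma=1.23, max_seeds=1000):
    """Store the function key -> table[key] (values in [sigma]) as (seed, m, u)."""
    keys = list(table)
    m = max(3, math.ceil(gamma * len(keys)))
    for seed in range(max_seeds):
        edges = [tuple(h(k, seed, i, m) for i in range(3)) for k in keys]
        if any(len(set(e)) < 3 for e in edges):
            continue  # assumption of this sketch: retry on degenerate edges
        order = peel(edges, m)
        if order is None:
            continue  # not peelable with this seed
        return seed, m, assign(edges, order, [table[k] for k in keys], sigma, m)
    raise RuntimeError("no suitable seed found")


def lookup(key, seed, m, u, sigma):
    """Evaluate f(key) as the sum of the three array entries modulo sigma."""
    return sum(u[h(key, seed, i, m)] for i in range(3)) % sigma


if __name__ == "__main__":
    table = {f"key{i}": i % 7 for i in range(100)}
    seed, m, u = build(table, sigma=7)
    print(all(lookup(k, seed, m, u, 7) == v for k, v in table.items()))
```

The vertex recorded next to each edge in the peeling order is the degree-1 vertex that the MPHF extension uses. The per-vertex arrays `incident`, `degree` and `u` are touched in an order dictated by the random hashes, which is the access pattern that becomes prohibitive once these arrays no longer fit in memory.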

Like most MWHC implementations, in our experiments we use a Jenkins hash function with a 64-bit seed in place of a fully random hash function.

Although the technique works for r-hypergraphs, r = 3 provides the lowest space usage [25].
