FERRARI: Flexible and Efficient Reachability Range Assignment for Graph Indexing


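FERRARI is an efficient reachability range assignment method for graph indexing, proposed by Stephan Seufert, Avishek Anand, Srikanta Bedathur, and Gerhard Weikum of the Max Planck Institute for Informatics in Germany and IIIT Delhi in India. The core of the paper is the design of a scalable, high-performance index structure for the reachability problem on graphs. Traditional interval labeling schemes compactly represent the set of all vertices reachable from a given node as a collection of node identifier ranges. FERRARI builds on this idea: it imposes an explicit bound on the index size and flexibly assigns approximate reachability ranges to the nodes of the graph, with the goal of minimizing the number of index lookups needed to answer a query. Because this bound is a tunable parameter, the index can produce a better assignment of range labels as the space budget grows, which gives direct control over the trade-off between index size and query processing performance. The authors combine the FERRARI index with a fast recursive querying method, and their experiments show that reachability queries achieve near-linear response times in practice. Query efficiency therefore improves as more space becomes available, and users can tune the parameter to match the performance requirements of different scenarios. FERRARI offers a flexible and efficient solution for reachability query processing on large-scale graph data and can help improve the overall performance of graph database systems.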

resulting identifier locality: For every (complete) subtree of $T$, the ordered identifiers of the included nodes form a contiguous sequence of integers. The vertex set of any such subtree can thus be compactly expressed as an integer interval. Let $T_v = (V_{T_v}, E_{T_v})$ denote the subtree of $T$ rooted at node $v$. We have

$$\{\pi(w) \mid w \in V_{T_v}\} = \Big[\min_{w \in V_{T_v}} \pi(w),\ \max_{w \in V_{T_v}} \pi(w)\Big] = \Big[\min_{w \in V_{T_v}} \pi(w),\ \pi(v)\Big]. \tag{6}$$

The above interval is called the tree interval of $v$ and will be denoted by $I_T(v)$ in the remainder of the text.

The complete reachability information of the spanning tree $T$ is encoded in the collection of tree intervals. For a pair of nodes $u, v \in V$, there exists a path from $u$ to $v$ in $T$ iff the post-order number of the target is contained in the tree interval of the source, that is,

$$u \sim_T v \iff \pi(v) \in I_T(u). \tag{7}$$

This reachability index for trees allows for $O(1)$ query processing at a space consumption of $O(n)$.
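The tree index can be illustrated with a short Python sketch. It is not taken from the paper: the function names `tree_intervals` and `reaches_in_tree` and the five-node example tree are assumptions made for illustration. The sketch numbers the nodes in post-order and stores, for each node, the interval from the smallest id in its subtree to its own id.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]


def tree_intervals(children: Dict[str, List[str]], root: str):
    """Return post-order numbers pi and tree intervals for a rooted tree."""
    pi: Dict[str, int] = {}
    interval: Dict[str, Interval] = {}
    counter = 0

    def visit(v: str) -> None:
        nonlocal counter
        low = counter + 1  # smallest id that will be used inside v's subtree
        for child in children.get(v, []):
            visit(child)
        counter += 1       # v is numbered after all of its descendants
        pi[v] = counter
        interval[v] = (low, counter)

    visit(root)
    return pi, interval


def reaches_in_tree(pi, interval, u: str, v: str) -> bool:
    """Constant-time test: is pi(v) inside the tree interval of u?"""
    low, high = interval[u]
    return low <= pi[v] <= high


# Hypothetical example tree: a -> {b, c}, b -> {d, e}
children = {"a": ["b", "c"], "b": ["d", "e"]}
pi, interval = tree_intervals(children, "a")
```

Each node stores exactly one pair of integers, which matches the linear space bound, and a query is a single range check. The recursive traversal is only suitable for small inputs; a large tree would need an iterative traversal.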

Extension to DAGs. While the above technique can be used to easily answer reachability queries on trees, the case of general DAGs is much more challenging. The reason is that, in general, the reachable set $R(v)$ of a vertex $v$ in the DAG is only partly represented by the interval $I_T(v)$, as the tree interval only accounts for reachability relationships that are preserved in $T$. Vertices that can only be reached from a node $v$ by traversing one or more non-tree edges have to be handled separately: instead of merely storing the tree intervals $I_T(v)$, every node $v$ is now assigned a set of intervals, denoted by $I(v)$. The purpose of this so-called reachable interval set is to capture the complete reachability information of a node.

The sets $I(v)$, $v \in V$, are initialized to contain only the tree interval $I_T(v)$. Then, the vertices are visited in reverse topological order. For the current vertex $v$ and every outgoing edge $(v, w) \in E$, the reachable interval set $I(w)$ is merged into the set $I(v)$. The merge operation on the intervals resolves all cases of interval subsumption and extension exhaustively, eventually ensuring interval disjointness. Because the vertices are visited in reverse topological order, it is ensured that for every non-tree edge $(s, t) \in E \setminus E_T$, the reachability intervals in $I(t)$ will be propagated and merged into the reachable interval sets of $s$ and all its predecessors. As a result, all reachability relationships are covered by the resulting intervals.
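The merge-based construction can be sketched as follows. This is a minimal illustration rather than the paper's implementation: the helper names, the hard-coded tree intervals, and the example DAG (a small tree plus one non-tree edge from c to d) are all assumptions. The sketch also assumes that "extension" includes fusing adjacent integer intervals such as [1, 2] and [3, 4].

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Union closed integer intervals into a sorted, disjoint list."""
    merged: List[Interval] = []
    for low, high in sorted(intervals):
        if merged and low <= merged[-1][1] + 1:
            # Overlapping, subsumed, or directly adjacent: extend the last one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], high))
        else:
            merged.append((low, high))
    return merged


def reachable_interval_sets(
    succ: Dict[str, List[str]],
    tree_interval: Dict[str, Interval],
    reverse_topological: List[str],
) -> Dict[str, List[Interval]]:
    """Build I(v) by merging the sets of all successors, sinks first."""
    sets = {v: [iv] for v, iv in tree_interval.items()}
    for v in reverse_topological:
        combined = list(sets[v])
        for w in succ.get(v, []):
            combined.extend(sets[w])  # I(w) is final because w came earlier
        sets[v] = merge_intervals(combined)
    return sets


# Hypothetical DAG: tree edges a->b, a->c, b->d, b->e plus non-tree edge c->d.
succ = {"a": ["b", "c"], "b": ["d", "e"], "c": ["d"]}
tree_interval = {"d": (1, 1), "e": (2, 2), "b": (1, 3), "c": (4, 4), "a": (1, 5)}
interval_sets = reachable_interval_sets(succ, tree_interval, ["d", "e", "b", "c", "a"])
```

In this example the non-tree edge gives node c a second interval for the id of d, which its tree interval alone does not cover.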

Query Processing. Using the reachable interval sets $I(v)$, queries on DAGs can be answered by checking whether the post-order number of the target is contained in one of the intervals associated with the source:

$$u \sim v \iff \exists\, [\alpha, \beta] \in I(u) : \alpha \le \pi(v) \le \beta. \tag{8}$$

By ordering the intervals contained in a set, reachability queries can now be answered efficiently in $O(\log n)$ time on DAGs. The resulting index (collection of reachable interval sets) can be regarded as a materialization of the transitive closure of the graph, rendering this approach potentially infeasible for large graphs, both in terms of space consumption and computational complexity.
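The logarithmic lookup can be sketched with a binary search over a sorted, disjoint interval list. The function name `reachable` and the example data are assumptions for illustration; the interval sets and post-order numbers are hard-coded for a small hypothetical DAG.

```python
from bisect import bisect_right
from typing import Dict, List, Tuple

Interval = Tuple[int, int]


def reachable(
    interval_sets: Dict[str, List[Interval]],
    pi: Dict[str, int],
    u: str,
    v: str,
) -> bool:
    """Check Equation (8) with a binary search over the sorted intervals of u."""
    intervals = interval_sets[u]
    target = pi[v]
    # Position of the last interval whose lower bound is <= target.
    k = bisect_right(intervals, (target, float("inf"))) - 1
    return k >= 0 and intervals[k][1] >= target


# Hypothetical index for a DAG with edges a->b, a->c, b->d, b->e, c->d.
pi = {"d": 1, "e": 2, "b": 3, "c": 4, "a": 5}
interval_sets = {
    "a": [(1, 5)],
    "b": [(1, 3)],
    "c": [(1, 1), (4, 4)],
    "d": [(1, 1)],
    "e": [(2, 2)],
}
```

Because the intervals are disjoint and sorted by lower bound, only the single interval found by the search can contain the target id, so no further scan is needed.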

III. APPROXIMATE INTERVALS

For massive problem instances, indexing approaches that materialize the transitive closure (or compute a compressed variant without an a priori size restriction) suffer from limited applicability. For this reason, recent work on reachability query processing over massive graphs includes a shift towards guided online search procedures. In this setting, every node is assigned a concise label which, in contrast to the interval sets described in Section II-A, is restricted by a predefined size constraint. These labels in general do not allow answering the query after inspection of just the source node, yet can be used to prune portions of the graph in an online search.

As a basic example, consider a reachability index that labels every node $v \in V$ with its topological order number $\tau(v)$. While this simple variant of node labeling is obviously not sufficient to answer a reachability query by means of a single lookup, a graph search procedure can greatly benefit from the node labels: For a given query $(s, t)$, the online search rooted at $s$ can terminate the expansion of a branch of the graph whenever for the currently considered node $v$ it holds that

$$\tau(v) \ge \tau(t). \tag{9}$$

This follows from the properties of a topological ordering.
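A pruned online search along these lines might look like the sketch below. The names `topological_numbers` and `reachable_pruned` and the example graph are assumptions for illustration; the topological numbering uses Kahn's algorithm, which is one of several valid choices and not necessarily what the authors use.

```python
from collections import deque
from typing import Dict, List


def topological_numbers(succ: Dict[str, List[str]]) -> Dict[str, int]:
    """Assign tau(v) with Kahn's algorithm; every edge goes from lower to higher tau."""
    vertices = set(succ) | {w for ws in succ.values() for w in ws}
    indegree = {v: 0 for v in vertices}
    for ws in succ.values():
        for w in ws:
            indegree[w] += 1
    queue = deque(sorted(v for v in vertices if indegree[v] == 0))
    tau: Dict[str, int] = {}
    while queue:
        v = queue.popleft()
        tau[v] = len(tau) + 1
        for w in succ.get(v, []):
            indegree[w] -= 1
            if indegree[w] == 0:
                queue.append(w)
    return tau


def reachable_pruned(succ: Dict[str, List[str]], tau: Dict[str, int], s: str, t: str) -> bool:
    """Depth-first search from s that skips branches ruled out by Equation (9)."""
    stack, seen = [s], {s}
    while stack:
        v = stack.pop()
        if v == t:
            return True
        if tau[v] >= tau[t]:
            continue  # every descendant of v has a larger tau, so none can be t
        for w in succ.get(v, []):
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return False


# Hypothetical DAG with edges a->b, a->c, b->d, b->e, c->d.
succ = {"a": ["b", "c"], "b": ["d", "e"], "c": ["d"]}
tau = topological_numbers(succ)
```

The equality test against the target comes before the pruning test, so the case where the current node is the target itself is not discarded by the non-strict comparison.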

The recently proposed GRAIL reachability index [22], [23] further extends this idea by labeling the vertices with approximate intervals: Suppose that for every node $v$ we replace the set $I(v)$ by a single interval

$$I'(v) := \Big[\min_{w \in R(v)} \pi(w),\ \max_{w \in R(v)} \pi(w)\Big], \tag{10}$$

spanning from the lowest to the highest reachable id. This interval is approximate in the sense that all reachable ids are covered whereas false positive entries are possible:

Definition 1 (False Positive). Let $v \in V$ denote a node with the approximate interval $I'(v) = [\alpha, \beta]$. A vertex $w \in V$ is called false positive with respect to $I'(v)$ if

$$\alpha \le \pi(w) \le \beta \quad \text{and} \quad v \not\sim w. \tag{11}$$
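To make the definition concrete, the sketch below computes one approximate interval per node and lists the false positives it admits. It is an illustration only, not the GRAIL or FERRARI implementation: the function names and the example DAG are hypothetical, and the sketch assumes that a node counts as reachable from itself, so its own id always lies inside its interval.

```python
from typing import Dict, List, Set, Tuple

Interval = Tuple[int, int]


def approximate_intervals(
    succ: Dict[str, List[str]],
    pi: Dict[str, int],
    reverse_topological: List[str],
) -> Dict[str, Interval]:
    """One interval per node, from the lowest to the highest reachable id."""
    approx: Dict[str, Interval] = {}
    for v in reverse_topological:  # successors are processed before v
        low = high = pi[v]
        for w in succ.get(v, []):
            low = min(low, approx[w][0])
            high = max(high, approx[w][1])
        approx[v] = (low, high)
    return approx


def reachable_set(succ: Dict[str, List[str]], v: str) -> Set[str]:
    """Exact reachable set of v (including v), found by a plain graph search."""
    seen, stack = {v}, [v]
    while stack:
        x = stack.pop()
        for w in succ.get(x, []):
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen


def false_positives(succ, pi, approx, v: str) -> List[str]:
    """Vertices whose id lies inside the interval of v but which v cannot reach."""
    low, high = approx[v]
    truly_reachable = reachable_set(succ, v)
    return sorted(w for w in pi if low <= pi[w] <= high and w not in truly_reachable)


# Hypothetical DAG with edges a->b, a->c, b->d, b->e, c->d and post-order ids.
succ = {"a": ["b", "c"], "b": ["d", "e"], "c": ["d"]}
pi = {"d": 1, "e": 2, "b": 3, "c": 4, "a": 5}
approx = approximate_intervals(succ, pi, ["d", "e", "b", "c", "a"])
```

In this example node c reaches only d (id 1) besides itself (id 4), so its single interval spans ids 1 to 4 and also covers the ids of e and b, which c cannot reach.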

Obviously, the single interval $I'(v)$ is not sufficient to establish a definite answer to a reachability query of the form $(G, v, w)$. However, all queries involving a target id $\pi(w)$ that lies outside the interval, i.e.,

$$\pi(w) < \alpha \quad \text{or} \quad \pi(w) > \beta, \tag{12}$$

can be answered instantly with a negative answer, similar to the basic approach based on Equation (9). In the opposite case, that is, $\alpha \le \pi(w) \le \beta$, no definite answer to the reachability query can be given and the online search procedure continues with an expansion of the child vertices, terminating as soon
