Label-Order-Constrained Reachability Queries over Large Graphs: The BitPath Algorithm

296 KB PDF, updated 2024-07-14


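The paper "BitPath - Label Order Constrained Reachability Queries over Large Graphs" (arXiv 1203.2886, 13 March 2012, computer science) studies a constrained form of reachability query over large graphs: the label-order-constrained reachability query. The problem arises in edge-labeled graphs such as RDF (Resource Description Framework) graphs, where every edge carries a label. Given a source node x, a destination node y, and a sequence of edge labels (a, b, c, d), the question is whether some path from x to y has edge labels that satisfy the regular expression "*a.*b.*c.*d.*" (written ".*a.*b.*c.*d.*" in standard regex syntax). The parts of the expression mean:
- The "*" before "a" allows any other edge labels to appear on the path before the edge labeled "a".
- "a.*" requires at least one edge labeled "a".
- The ".*" between "a" and "b" allows zero or more edge labels after "a" and before "b".

The proposed query processing algorithm relies on simple divide-and-conquer and greedy pruning procedures to limit the search space. The core contribution, however, is a graph indexing technique based on compressed bit-vectors, which makes it possible to index large graphs that would otherwise be infeasible to index. The authors, Medha Atre, Vineet Chaoji, and Mohammed J. Zaki (University of Pennsylvania, Yahoo! Labs, and Rensselaer Polytechnic Institute, respectively), evaluated their approach experimentally and report that it is effective and efficient on large graph data. According to their results, the bit-vector index improves query performance and reduces storage requirements, particularly on large datasets that traditional methods struggle to handle.

The paper may also cover the following points:
- Graph data structures: how data is represented and manipulated in graphs with labeled edges.
- Regular expressions in graph queries: how regular expressions define complex path constraints.
- Indexing: how compressed bit-vectors are used to build graph indexes that support fast reachability queries.
- Query optimization: how divide-and-conquer and pruning work together to reduce computation and memory requirements.
- Performance analysis: possibly comparisons with existing methods and results on data of different sizes.

The work is relevant to processing reachability queries over large graphs and offers new ideas and tools for building efficient data management and query systems, particularly for knowledge graphs, social network analysis, and Web information retrieval.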
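A small reference checker helps pin down what "satisfies .*a.*b.*c.*d.*" means. This is a brute-force sketch, not the BitPath algorithm, and the function name `locr_reachable` and the example graph are hypothetical. It explores pairs (node, number of query labels matched so far) breadth-first and matches each query label at the earliest edge that carries it, which is enough to decide whether the labels occur in order along some path.

```python
from collections import deque


def locr_reachable(edges, source, target, labels):
    """Hypothetical oracle: True if some path source -> target contains
    `labels` as an ordered subsequence of its edge labels, i.e. the path
    matches .*l1.*l2 ... .*ln.* in standard regex syntax."""
    adj = {}
    for tail, head, label in edges:
        adj.setdefault(tail, []).append((head, label))
    n = len(labels)
    start = (source, 0)  # (current node, how many query labels matched)
    seen = {start}
    queue = deque([start])
    while queue:
        node, k = queue.popleft()
        if node == target and k == n:
            return True
        for head, label in adj.get(node, []):
            # Matching a query label as early as possible never hurts.
            nk = k + 1 if k < n and label == labels[k] else k
            state = (head, nk)
            if state not in seen:
                seen.add(state)
                queue.append(state)
    return False


# Hypothetical graph: x -a-> u -e-> v -b-> w -c-> z -d-> y, plus x -d-> y.
EXAMPLE = [("x", "u", "a"), ("u", "v", "e"), ("v", "w", "b"),
           ("w", "z", "c"), ("z", "y", "d"), ("x", "y", "d")]
print(locr_reachable(EXAMPLE, "x", "y", ["a", "b", "c", "d"]))  # True
print(locr_reachable(EXAMPLE, "x", "y", ["d", "a"]))            # False
```

Because the state space is only |V| × (n + 1), the search terminates even on cyclic graphs, but it needs no index and is meant only as a correctness baseline.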


of adding self-edges with the same edge labels is to keep track of the edge labels that appear in a given SCC (strongly connected component). These edges help in determining paths that go through an SCC without having to traverse the entire SCC subgraph at query time.
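The collapsing step can be sketched as follows. This assumes Kosaraju's algorithm for finding SCCs (the excerpt does not say which SCC algorithm BitPath uses) and stores the condensed edges in a set, so duplicate (tail, head, label) edges between the same pair of components are merged. Every edge whose endpoints fall in the same SCC becomes a self-edge on that SCC's representative node and keeps its label. The function name is hypothetical, and the recursive DFS is intended for small graphs only.

```python
from collections import defaultdict


def condense_with_self_edges(edges):
    """Hypothetical helper: collapse SCCs of a labeled graph into a DAG.

    Returns (comp, dag_edges): comp maps each node to its SCC representative;
    an edge inside an SCC becomes a labeled self-edge (rep, rep, label)."""
    graph, rgraph, nodes = defaultdict(list), defaultdict(list), set()
    for u, v, _ in edges:
        graph[u].append(v)
        rgraph[v].append(u)
        nodes.update((u, v))

    # Kosaraju pass 1: record nodes in order of DFS finishing time.
    order, visited = [], set()

    def dfs1(u):
        visited.add(u)
        for v in graph[u]:
            if v not in visited:
                dfs1(v)
        order.append(u)

    for u in sorted(nodes):
        if u not in visited:
            dfs1(u)

    # Pass 2: DFS on the reversed graph in decreasing finish time; each tree is one SCC.
    comp = {}

    def dfs2(u, root):
        comp[u] = root
        for v in rgraph[u]:
            if v not in comp:
                dfs2(v, root)

    for u in reversed(order):
        if u not in comp:
            dfs2(u, u)

    # Intra-SCC edges map to (rep, rep, label); the rest link distinct SCCs.
    dag_edges = {(comp[u], comp[v], label) for u, v, label in edges}
    return comp, dag_edges


comp, dag = condense_with_self_edges([("a", "b", "p"), ("b", "a", "q"), ("b", "c", "r")])
print(comp)  # {'a': 'a', 'b': 'a', 'c': 'c'}
```

Here a and b form one SCC, so its labels p and q survive as self-edges on representative a, and a query can see that a path through this SCC can pick up p and q without walking the SCC's internal edges.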

[Figure: example graph with nodes :the_thirteenth_floor, :the_matrix, :the_matrix_reloaded, "1999", and :movie, joined by edges labeled :similar_to, :releasedIn, and rdf:type, shown together with its N-SUCC-E, N-PRED-E, and EL-ID bit-vector indexes]

Fig. 2. BitPath Indexes using Compressed Bit-vectors

A label-order-constrained reachability (LOCR) query requires knowledge of the relative order of edge labels occurring on a given path. As pointed out in Sections 1 and 2, it is computationally infeasible to index paths of the order of 10 or more due to time and space constraints. We solve the problem of LOCR queries by creating four types of indexes on a graph and using a query answering algorithm based on a combination of greedy pruning and divide-and-conquer. The four types of indexes are as follows:

1. EID (edge-to-ID): Each edge e ∈ E is mapped to a unique integer ID. For instance, in the graph shown in Fig. 2, edge (:the_matrix, :movie) with label rdf:type is mapped to ID 5.

2. N-SUCC-E (node's successor edges): For each node, we index the IDs of all its successor edges, i.e., the edges that get visited if we traverse the subgraph under the given node. The self-edges added to the graph as a result of collapsing an SCC can be handled trivially by examining the head and tail of the given edge. In Fig. 2, node :the_thirteenth_floor has edge IDs 1, 2, 3, 4, 5, 6 in its successor list.

3. N-PRED-E (node's predecessor edges): Similarly, for each node we index the predecessor edges, i.e., the edges that get visited if we make a backward traversal of the entire subgraph above the given node. In Fig. 2, node :movie has edge IDs 2, 5, 6 in its predecessor list.

4. EL-ID (edge label to edge ID): For each unique edge label l ∈ L, we index the IDs of all edges in E that have edge label l. In Fig. 2, edge label rdf:type has IDs 5 and 6 in its list.
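The excerpt does not spell out the query algorithm, so the following is only a hypothetical sketch of how greedy pruning and divide-and-conquer could use N-SUCC-E, N-PRED-E, and EL-ID as bit-vectors (bit i stands for edge ID i + 1). An edge (u, v) can lie on some x → y path only if it is in N-SUCC-E(x) and in N-PRED-E(y), so ANDing those two vectors with a label's EL-ID vector gives that label's candidate edges. The query then splits on one candidate edge and recurses on both halves. The toy DAG, the function names, and the "split on the label with the fewest candidates" rule are our assumptions, not the authors' method; the masks are computed lazily here instead of being read from a prebuilt index, and the graph must be acyclic (for example, after SCC collapsing).

```python
from functools import lru_cache

# Hypothetical toy DAG: (tail, head, label). Edge ID i + 1 maps to bit i.
EDGES = [("x", "p", "a"), ("p", "q", "c"), ("q", "r", "b"),
         ("p", "r", "b"), ("r", "y", "c")]


@lru_cache(maxsize=None)
def succ_mask(node):
    """N-SUCC-E: edges visited by a forward traversal from node."""
    mask = 0
    for i, (tail, head, _) in enumerate(EDGES):
        if tail == node:
            mask |= (1 << i) | succ_mask(head)
    return mask


@lru_cache(maxsize=None)
def pred_mask(node):
    """N-PRED-E: edges visited by a backward traversal from node."""
    mask = 0
    for i, (tail, head, _) in enumerate(EDGES):
        if head == node:
            mask |= (1 << i) | pred_mask(tail)
    return mask


@lru_cache(maxsize=None)
def label_mask(label):
    """EL-ID: edges carrying the given label."""
    return sum(1 << i for i, (_, _, lab) in enumerate(EDGES) if lab == label)


def reaches(u, v):
    # u reaches v iff u == v or some edge lies both below u and above v.
    return u == v or (succ_mask(u) & pred_mask(v)) != 0


def locr(x, y, labels):
    labels = tuple(labels)
    if not labels:
        return reaches(x, y)
    # Pruning: a witness edge must lie on some x -> y path and carry its label.
    on_path = succ_mask(x) & pred_mask(y)
    cands = [on_path & label_mask(lab) for lab in labels]
    if 0 in cands:
        return False
    # Assumed greedy rule: split on the label with the fewest candidate edges.
    k = min(range(len(labels)), key=lambda i: bin(cands[i]).count("1"))
    for i, (tail, head, _) in enumerate(EDGES):
        if cands[k] >> i & 1:
            # Divide: labels before k must occur on x -> tail, those after k on head -> y.
            if locr(x, tail, labels[:k]) and locr(head, y, labels[k + 1:]):
                return True
    return False


print(locr("x", "y", ["a", "b", "c"]))  # True
print(locr("x", "y", ["c", "a"]))       # False
```

Any split position gives a correct answer; choosing the rarest label only aims to keep the number of recursive branches small.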

In practice, we use bit-vectors of length |E| (the total number of edges in the graph) to build the N-SUCC-E, N-PRED-E, and EL-ID indexes. Each bit position in a bit-vector corresponds to the unique ID assigned to an edge by the EID index. Fig. 2 shows the indexes for the example graph. We apply run-length encoding to the N-SUCC-E and N-PRED-E indexes of each node, depending on the compression ratio. Typically, a graph has far fewer unique edge labels than nodes, so each label covers many edges scattered across the edge-ID space. Hence an edge label's EL-ID index has many more interleaved 0s and 1s than an N-SUCC-E or N-PRED-E index, which negates the benefit of run-length encoding, and we therefore do not apply run-length encoding to the EL-ID index. Note that at query time we do not decompress any compressed index: all the algorithms are implemented to perform bitwise operations directly on both the gap-compressed and the non-compressed indexes.
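The idea of operating on compressed indexes without decompressing them can be sketched with a simple run-length format. Here a bit-vector is a list of [bit, run_length] pairs; this format and the function names are our assumptions, since the excerpt does not give BitPath's exact gap-compression layout. The AND walks both run lists in step and emits runs of the shorter remaining length, so its cost depends on the number of runs rather than on |E|. Both inputs are assumed to have the same length.

```python
def rle_encode(bits):
    """Encode a '0'/'1' string as a list of [bit, run_length] runs."""
    runs = []
    for ch in bits:
        b = int(ch)
        if runs and runs[-1][0] == b:
            runs[-1][1] += 1
        else:
            runs.append([b, 1])
    return runs


def rle_decode(runs):
    """Expand runs back into a '0'/'1' string (used here only for display)."""
    return "".join(str(b) * n for b, n in runs)


def rle_and(xs, ys):
    """Bitwise AND of two equal-length run-length vectors, computed run by run."""
    out, i, j = [], 0, 0
    xleft = xs[0][1] if xs else 0  # bits left in the current run of xs
    yleft = ys[0][1] if ys else 0
    while i < len(xs) and j < len(ys):
        step = min(xleft, yleft)
        b = xs[i][0] & ys[j][0]
        if out and out[-1][0] == b:
            out[-1][1] += step  # merge with the previous output run
        else:
            out.append([b, step])
        xleft -= step
        yleft -= step
        if xleft == 0:
            i += 1
            xleft = xs[i][1] if i < len(xs) else 0
        if yleft == 0:
            j += 1
            yleft = ys[j][1] if j < len(ys) else 0
    return out


# N-SUCC-E of :the_thirteenth_floor (IDs 1 to 6) is a single run of 1s.
succ = rle_encode("111111")
# EL-ID of rdf:type (IDs 5 and 6); the paper leaves EL-ID uncompressed,
# it is run-length encoded here only to keep the example short.
print(rle_decode(rle_and(succ, rle_encode("000011"))))  # 000011
```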

3.1 BitPath Index Creation Procedure

We create the EID, N-SUCC-E, and EL-ID indexes in one depth-first search (DFS) pass over the DAG obtained by collapsing the SCCs, as outlined previously. The N-PRED-E index is created by making one backward DFS pass over the graph, using the EID mapping generated in the first pass.
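The two passes can be sketched as follows, with bit-vectors stored as Python integers (bit i stands for edge ID i + 1). The edge set in `FIG2_ADJ` is our reconstruction of the Fig. 2 graph, chosen so that it reproduces the ID lists quoted in the text (edge (:the_matrix, :movie) gets ID 5, N-SUCC-E of :the_thirteenth_floor holds IDs 1 to 6, N-PRED-E of :movie holds 2, 5, 6, and EL-ID of rdf:type holds 5, 6); the actual figure may differ. The function names are hypothetical, and the recursion assumes a small, already acyclic graph.

```python
from collections import defaultdict


def build_bitpath_indexes(adj):
    """Hypothetical sketch: one forward DFS builds EID, N-SUCC-E and EL-ID;
    one backward DFS (reusing the EID mapping) builds N-PRED-E.

    adj maps node -> list of (head, label) in the order the DFS follows."""
    eid, succ, elid = {}, {}, defaultdict(int)

    def forward(u):
        succ[u] = 0
        for v, label in adj.get(u, []):
            edge_id = len(eid) + 1  # IDs follow DFS traversal order
            eid[(u, v, label)] = edge_id
            if v not in succ:  # in a DAG, a visited v is already finished
                forward(v)
            bit = 1 << (edge_id - 1)
            succ[u] |= bit | succ[v]
            elid[label] |= bit

    for root in adj:
        if root not in succ:
            forward(root)

    # Backward pass over reversed edges, reusing the IDs from the forward pass.
    radj = defaultdict(list)
    for (u, v, _), edge_id in eid.items():
        radj[v].append((u, 1 << (edge_id - 1)))
    pred = {}

    def backward(v):
        pred[v] = 0
        for u, bit in radj[v]:
            if u not in pred:
                backward(u)
            pred[v] |= bit | pred[u]

    for node in succ:
        if node not in pred:
            backward(node)
    return eid, succ, pred, dict(elid)


def ids(mask):
    """List the edge IDs whose bits are set (bit i stands for ID i + 1)."""
    return [i + 1 for i in range(mask.bit_length()) if mask >> i & 1]


# Assumed reconstruction of the Fig. 2 example graph.
FIG2_ADJ = {
    ":the_thirteenth_floor": [('"1999"', ":releasedIn"), (":the_matrix", ":similar_to"),
                              (":movie", "rdf:type")],
    ":the_matrix": [(":the_matrix_reloaded", ":similar_to"), ('"1999"', ":releasedIn"),
                    (":movie", "rdf:type")],
}

eid, succ, pred, elid = build_bitpath_indexes(FIG2_ADJ)
print(eid[(":the_matrix", ":movie", "rdf:type")])  # 5
print(ids(succ[":the_thirteenth_floor"]))          # [1, 2, 3, 4, 5, 6]
print(ids(pred[":movie"]))                         # [2, 5, 6]
print(ids(elid["rdf:type"]))                       # [5, 6]
```

Assigning IDs in DFS order tends to give each node's descendants a contiguous block of IDs, which is what makes the N-SUCC-E vectors compress well under run-length encoding.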



Efficient processing of label-constraint reachability queries in large graphs

tamilselvi selvaraj (2023). matlab code for constrained nsga ii - dr.s.baska

system verilog 1800-2012

Backhaul-Constrained HetNets (translation)

What parameters does matplotlib.Figure take?

How to customize columns in JavaFX to freeze the position of some columns

composite differential evolution for constrained evolutionary optimization.

UserWarning: This figure was using constrained_layout, but that is incompatible with subplots_adjust and/or tight_layout; disabling constrained_layout.

Quasi-Newton iteration method: MATLAB program

Contraining-GRU algorithm explained in detail (Python)

system verilog

What papers are there on adding constraints to particle swarm optimization?

empire.cpp:189:42: error: 'getColor' is a private member of 'Playable' empire.h:42:14: note: constrained by implicitly private inheritance here county.h:15:18: note: member is declared here
