Regular-Expression Queries and Reachability Analysis on Graphs with Typed Edges


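This document studies how regular expressions can be added to queries over graph data, in particular over graphs whose edges carry different types. It is the paper "Adding regular expressions to graph reachability and pattern queries" by Wenfei Fan et al., published in Frontiers of Computer Science in 2012 (DOI: 10.1007/s11704-012-1312-y).

First, the authors extend graph reachability queries with regular expressions over edge types. Such a query does more than ask whether one node can reach another. It specifies a pattern over the types of the edges along a path and checks whether the data graph contains a connecting path whose edge types match that pattern. In a social network, for example, a user may look for people linked by a specific chain of relationships, such as "a friend of a friend" or "a second-degree contact through colleagues". A regular expression over edge types states such conditions directly.

Second, the paper defines graph pattern matching in terms of a revised notion of graph simulation. Traditional graph simulation maps each pattern edge to a single edge of the data graph, and subgraph isomorphism additionally requires a one-to-one mapping of nodes. In the new framework, a pattern edge may be mapped to a nonempty path whose edge types must match the regular expression on that edge. A pattern can therefore accommodate different edge types and path lengths, so pattern matching is not limited to finding exact subgraphs and can capture looser structural correspondences.

The authors motivate these queries with emerging applications such as social networks, where they can uncover more meaningful connections and patterns. Such queries could, for instance, help analyze the structure of a social network to locate influential users or communities, or support predictions about user behavior and preferences. Overall, the paper provides query tools for graphs with multiple relationship types, with potential uses in complex network analysis, recommender systems and information security. By bringing regular expressions into graph queries, it increases their expressive power and widens the range of practical problems that graph algorithms can address.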

Wenfei FAN et al. Adding regular expressions to graph reachability and pattern queries 317

We say that a node $v$ in $G$ matches the node $u_1$ in $Q_r$, denoted as $v \sim u_1$, if for each atomic formula "$A$ op $a$" in $f_{u_1}$, there exists an attribute $A$ in $f_A(v)$ such that $v.A$ op $a$; similarly for $v \sim u_2$. Intuitively, the predicates $f_{u_1}$ and $f_{u_2}$ specify search conditions for query nodes.

We say that a pair $(v_1, v_2)$ of nodes in $G$ matches the regular expression $f_e$, denoted as $(v_1, v_2) \approx f_e$, if there exists a nonempty path $\rho = v_1 \xrightarrow{e_1} v'_1 \xrightarrow{e_2} v'_2 \cdots v'_{n-1} \xrightarrow{e_n} v_2$ in $G$ such that the string $f_C(e_1) \cdots f_C(e_n)$ is in $L(f_e)$.

The result $Q_r(G)$ of $Q_r$ on $G$ is the set of node pairs $(v_1, v_2)$ such that $v_1 \sim u_1$, $v_2 \sim u_2$, and $(v_1, v_2) \approx f_e$.

Intuitively, $(v_1, v_2)$ is in $Q_r(G)$ if $v_1$ and $v_2$ satisfy the conditions specified by $u_1$ and $u_2$, respectively, and moreover, there exists a nonempty path from $v_1$ to $v_2$ in $G$ such that the edge colors on the path match the pattern specified by the regular expression $f_e$. We say $v_1$ (respectively $v_2$) is a match of $u_1$ (respectively $u_2$).
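The RQ semantics can be made concrete with a short Python sketch. It is a minimal illustration under our own assumptions, not the authors' evaluation algorithm. A predicate is a list of hypothetical `(attribute, op, value)` triples over an assumed operator set. A regular expression of the shape used in the examples here, such as $fa^{\le 2} fn$, is encoded as a list of `(color, lo, hi)` blocks, with `hi=None` standing for an unbounded block $c^+$. We assume that $c^{\le k}$ means between 1 and $k$ copies of $c$, so every block consumes at least one edge and every accepted path is nonempty. For each candidate $v_1$, the search explores pairs (graph node, automaton state) breadth-first.

```python
import operator
from collections import deque

# Assumed operator set for atomic formulas "A op a" (illustrative, not from the paper).
OPS = {"=": operator.eq, "!=": operator.ne, "<": operator.lt,
       "<=": operator.le, ">": operator.gt, ">=": operator.ge}


def node_matches(attrs, predicate):
    """v ~ u: every atomic formula (A, op, a) holds on an attribute A of v."""
    return all(name in attrs and OPS[op](attrs[name], value)
               for name, op, value in predicate)


def eval_rq(nodes, edges, pred1, pred2, blocks):
    """Return Q_r(G) = {(v1, v2) | v1 ~ u1, v2 ~ u2, (v1, v2) ≈ f_e}.

    nodes:  {node: {attribute: value}}     (plays the role of f_A)
    edges:  iterable of (src, dst, color)  (the color plays the role of f_C)
    blocks: hypothetical encoding of f_e as [(color, lo, hi), ...], hi=None for c^+
    """
    # Assumption: every block needs at least one edge, so matches are nonempty paths.
    assert blocks and all(lo >= 1 for _, lo, _ in blocks)
    succ = {}
    for src, dst, color in edges:
        succ.setdefault(src, []).append((dst, color))

    def step(i, n, color):
        # Automaton state (i, n): inside block i after reading n copies of its color.
        c, lo, hi = blocks[i]
        if color == c and (hi is None or n < hi):
            # Stay in block i; for unbounded blocks cap n at lo to keep states finite.
            yield i, (min(n + 1, lo) if hi is None else n + 1)
        if n >= lo and i + 1 < len(blocks) and blocks[i + 1][0] == color:
            yield i + 1, 1  # block i is satisfied, so start block i + 1

    last, result = len(blocks) - 1, set()
    for v1, attrs in nodes.items():
        if not node_matches(attrs, pred1):
            continue
        seen = {(v1, 0, 0)}
        queue = deque(seen)
        while queue:
            v, i, n = queue.popleft()
            for w, color in succ.get(v, ()):
                for i2, n2 in step(i, n, color):
                    if (w, i2, n2) not in seen:
                        seen.add((w, i2, n2))
                        queue.append((w, i2, n2))
        # Accept where the last block is satisfied (n >= 1 there, so the path is nonempty).
        for w, i, n in seen:
            if i == last and n >= blocks[last][1] and node_matches(nodes.get(w, {}), pred2):
                result.add((v1, w))
    return result


# Hypothetical toy graph, not the graph G of Fig. 1.
nodes = {"C1": {"job": "biologist", "sp": "cloning"},
         "C2": {"job": "biologist", "sp": "cloning"},
         "B1": {"job": "doctor"}}
edges = [("C1", "C2", "fa"), ("C2", "B1", "fn")]
pred_c = [("job", "=", "biologist"), ("sp", "=", "cloning")]
pred_b = [("job", "=", "doctor")]
print(eval_rq(nodes, edges, pred_c, pred_b, [("fa", 1, 2), ("fn", 1, 1)]))  # {('C1', 'B1')}
```

Encoding the expression as a small automaton and searching its product with $G$ avoids enumerating paths explicitly. This matters because a block such as $c^+$ admits paths of unbounded length. The state space is bounded by $|V|$ times the number of automaton states. Under the stated assumption, `C2` is not paired with `B1`, because its only path to `B1` has no `fa` edge.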

Example 2.2 The query $Q_r$ shown in Fig. 1 is an RQ in which $f_e = fa^{\le 2} fn$, the node $C$ has the predicate sp = “cloning” and job = “biologist”, and the node $B$ has the predicate job = “doctor”.

When $Q_r$ is posed on the data graph $G$ shown in Fig. 1 and described in Example 2.1, the answer $Q_r(G)$ is shown in Fig. 2. Indeed, $B_i \sim B$ ($i \in [1, 2]$) and $C_j \sim C$ ($j \in [1, 3]$). In addition, $(C_?, B_?) \approx f_e$ since there exists a path $C_? \xrightarrow{fa} C_? \xrightarrow{fn} B_?$ in $G$, and the string $fa\,fn$ matches the regular expression $fa^{\le 2} fn$. Similarly, $(C_?, B_?) \approx f_e$, $(C_?, B_?) \approx f_e$, and $(C_?, B_?) \approx f_e$. Hence the query result is $Q_r(G) = \{(C_?, B_?), (C_?, B_?), (C_?, B_?)\}$.

Remark. (1) Observe that a single edge in query $Q_r$ is mapped to a nonempty path in the data graph $G$; moreover, the edge colors on the path have to match the regular expression $f_e$. (2) RQs are more expressive than the traditional reachability queries studied in, e.g., [2,6,30], since they capture edge relationships with regular expressions.

Graph pattern queries. Using RQs as building blocks, we next define graph pattern queries.

Fig. 2 Results of the queries $Q_r$ and $Q_p$ on $G$

A graph pattern query (PQ) is a directed graph $Q_p = (V_p, E_p, f_V, f_E)$, where (1) $V_p$ is a finite set of nodes; (2) $E_p \subseteq V_p \times V_p$ is a finite set of edges, in which $(u, u')$ denotes an edge from node $u$ to $u'$; and (3) the functions $f_V$ and $f_E$ are defined on $V_p$ and $E_p$, respectively, such that for each edge $e = (u, u') \in E_p$, $Q_e = (u, u', f_V(u), f_V(u'), f_E(e))$ is an RQ.

In the rest of this paper, we simply use $f_e$ to represent the regular expression $f_E(e)$ assigned by the function $f_E$ to an edge $e$, unless specified otherwise.
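A PQ is a directed graph whose nodes carry search conditions and whose edges carry regular expressions, and it can be mirrored by a small data structure. The Python sketch below is illustrative only. The class name `PatternQuery`, the attribute names `f_V` and `f_E`, and the use of opaque strings such as `"fa"` for regular expressions are our own choices, not the paper's. The class enforces $E_p \subseteq V_p \times V_p$ and extracts the embedded RQ $Q_e$ for a given edge.

```python
from dataclasses import dataclass


@dataclass
class PatternQuery:
    """Hypothetical container for Q_p = (V_p, E_p, f_V, f_E).

    f_V maps each pattern node to its search condition (a predicate);
    f_E maps each pattern edge (u, u') to its regular expression (kept opaque here).
    """
    f_V: dict
    f_E: dict

    def __post_init__(self):
        # E_p ⊆ V_p × V_p: every edge must connect two declared pattern nodes.
        for u, u2 in self.f_E:
            if u not in self.f_V or u2 not in self.f_V:
                raise ValueError(f"edge {(u, u2)} uses an undeclared node")

    @property
    def nodes(self):
        return set(self.f_V)  # V_p

    @property
    def edges(self):
        return set(self.f_E)  # E_p

    def rq(self, e):
        """The RQ Q_e = (u, u', f_V(u), f_V(u'), f_E(e)) embedded at edge e."""
        u, u2 = e
        return (u, u2, self.f_V[u], self.f_V[u2], self.f_E[e])


# Hypothetical pattern with two nodes and two edges.
q = PatternQuery(
    f_V={"B": [("job", "=", "doctor")], "C": [("job", "=", "biologist")]},
    f_E={("B", "C"): "fa", ("C", "B"): "fn"},
)
print(q.rq(("C", "B")))
# ('C', 'B', [('job', '=', 'biologist')], [('job', '=', 'doctor')], 'fn')
```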

Semantics. When the graph pattern query $Q_p$ is evaluated on a data graph $G = (V, E, f_A, f_C)$, the query result $Q_p(G)$ is the maximum set $\{(e, S_e) \mid e \in E_p\}$ that satisfies the following conditions:

(1) for all edges $e = (u_1, u_2)$ in $Q_p$, $S_e \subseteq Q_e(G)$, where $Q_e = (u_1, u_2, f_V(u_1), f_V(u_2), f_e)$ is an RQ;

(2) for each edge $e = (u_1, u_2)$ in $Q_p$, if a pair $(v_1, v_2)$ of nodes in $G$ is in $S_e$, then (a) for each edge $e_1 = (u_1, u'_2)$ in $Q_p$, there exists a node $v'_2$ in $G$ such that $(v_1, v'_2) \in S_{e_1}$; and (b) for each edge $e_2 = (u_2, u_3)$ in $Q_p$, there exists a node $v_3$ in $G$ such that $(v_2, v_3) \in S_{e_2}$; and

(3) there exists no edge $e$ in $Q_p$ such that $S_e$ is empty. In other words, $Q_p(G) = \emptyset$ if $S_e$ is empty for some $e$ in $Q_p$.

We say $v_1$ (respectively $v_2$) is a match of $u_1$ (respectively $u_2$). Here the size of $Q_p(G)$ is defined as $\sum_{e \in E_p} |S_e|$, where $|S_e|$ is the number of elements in $S_e$.
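The three conditions can be read as a greatest-fixpoint computation. The sketch below is a naive illustration of that reading, not the evaluation algorithm from the paper. It starts from $S_e = Q_e(G)$, which enforces condition (1), repeatedly discards pairs that violate condition (2), and returns $\emptyset$ if condition (3) fails. The RQ answers are taken as precomputed input (`rq_results`, a hypothetical name), so the sketch is independent of how each $Q_e(G)$ is evaluated. Pairs are only ever removed, so the loop terminates. Any family satisfying (1) and (2) stays inside the current sets at every round, so the loop ends at the maximum such family.

```python
def eval_pq(edges, rq_results):
    """Maximum {e: S_e} satisfying conditions (1)-(3) of the PQ semantics.

    edges:      pattern edges (u1, u2) of Q_p
    rq_results: {e: Q_e(G)}, the answers of the embedded RQs (precomputed)
    """
    S = {e: set(rq_results.get(e, ())) for e in edges}  # condition (1)
    out = {}
    for e in edges:
        out.setdefault(e[0], []).append(e)

    def supported(u, v):
        # v needs a partner in S_e2 for every pattern edge e2 leaving u.
        return all(any(x == v for x, _ in S[e2]) for e2 in out.get(u, ()))

    changed = True
    while changed:  # condition (2): prune until nothing changes
        changed = False
        for e in edges:
            u1, u2 = e
            keep = {(v1, v2) for v1, v2 in S[e]
                    if supported(u1, v1) and supported(u2, v2)}
            if keep != S[e]:
                S[e], changed = keep, True
    if any(not pairs for pairs in S.values()):  # condition (3)
        return {}
    return S


def result_size(result):
    """Size of Q_p(G): the sum over e in E_p of |S_e|."""
    return sum(len(pairs) for pairs in result.values())


# Hypothetical RQ answers for a two-edge cycle B -> C -> B.
edges = [("B", "C"), ("C", "B")]
rq_results = {("B", "C"): {("b1", "c1"), ("b2", "c2")},
              ("C", "B"): {("c1", "b1")}}
res = eval_pq(edges, rq_results)
print(res, result_size(res))  # {('B', 'C'): {('b1', 'c1')}, ('C', 'B'): {('c1', 'b1')}} 2
```

The pair `("b2", "c2")` is pruned because `c2` has no partner along the pattern edge `(C, B)`, which is what condition (2)(b) requires.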

Intuitively, $Q_p(G)$ defines a relation $R \subseteq V_p \times V$. To see this, for each edge $e = (u_1, u_2)$ in $Q_p$, denote by $Q_e = (u_1, u_2, f_V(u_1), f_V(u_2), f_e)$ its associated RQ embedded in $Q_p$. Then for a node $u_1 \in V_p$ and a node $v_1 \in V$, $(u_1, v_1)$ is in $R$ if for each edge $e = (u_1, u_2)$ emanating from $u_1$ in $Q_p$, there exists a nonempty path $\rho$ from $v_1$ to $v_2$ in $G$ such that (1) the node $v_2$ satisfies the search conditions specified by $f_V(u_2)$ in the RQ $Q_e$; (2) the path $\rho$ is constrained by the regular expression $f_e$; and (3) $(u_2, v_2)$ is also in $R$. In addition, $R$ covers all the nodes in $V_p$, and moreover, it is maximum, i.e., $R' \subseteq R$ for all such relations $R'$. The result $Q_p(G)$ is simply $R$ grouped by the edges in $E_p$. In particular, if condition (3) above is not satisfied, $Q_p(G)$ is empty.
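The relational view suggests a node-level computation. Start from the candidates $v \sim u$. Repeatedly drop $(u_1, v_1)$ from $R$ when some edge $e = (u_1, u_2)$ has no path matching $f_e$ to a $v_2$ with $(u_2, v_2) \in R$. Finally, group $R$ by edges. The Python sketch below is our own illustration of that description; the functions `max_relation` and `group_by_edges` are hypothetical. It takes the candidates and the node pairs satisfying each $f_e$ as precomputed input.

```python
def max_relation(pattern_nodes, edges, candidates, reach):
    """Greatest R ⊆ V_p × V, stored as {u: set of data nodes v with (u, v) in R}.

    candidates: {u: {v | v ~ u}}                     search conditions already checked
    reach:      {e: {(v1, v2) | (v1, v2) ≈ f_e}}     regex-constrained paths
    """
    R = {u: set(candidates.get(u, ())) for u in pattern_nodes}
    changed = True
    while changed:
        changed = False
        for e in edges:
            u1, u2 = e
            # (u1, v1) survives only if a path matching f_e leads to some v2 with (u2, v2) in R.
            ok = {v1 for v1, v2 in reach.get(e, ()) if v2 in R[u2]}
            if not R[u1] <= ok:
                R[u1] &= ok
                changed = True
    return R


def group_by_edges(R, edges, reach):
    """Q_p(G) as R grouped by pattern edges; empty if R misses a node or some S_e is empty."""
    if any(not vs for vs in R.values()):
        return {}
    S = {e: {(v1, v2) for v1, v2 in reach.get(e, ())
             if v1 in R[e[0]] and v2 in R[e[1]]} for e in edges}
    return {} if any(not pairs for pairs in S.values()) else S


# Hypothetical inputs for a two-edge cycle B -> C -> B.
nodes_p = ["B", "C"]
edges = [("B", "C"), ("C", "B")]
candidates = {"B": {"b1", "b2"}, "C": {"c1", "c2"}}
reach = {("B", "C"): {("b1", "c1"), ("b2", "c2")}, ("C", "B"): {("c1", "b1")}}
R = max_relation(nodes_p, edges, candidates, reach)
print(R)                               # {'B': {'b1'}, 'C': {'c1'}}
print(group_by_edges(R, edges, reach))  # {('B', 'C'): {('b1', 'c1')}, ('C', 'B'): {('c1', 'b1')}}
```

Both this node-level view and conditions (1) to (3) describe a greatest fixpoint. The only difference is that here the iteration runs over pairs $(u, v)$ rather than over node pairs attached to each pattern edge.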

From this one can see that PQs are defined in terms of an extension of graph simulation [15], by (a) imposing search conditions on the contents of nodes; (b) mapping an edge in a pattern to a nonempty path in a data graph (i.e., the child $u_2$ of $u_1$ is mapped to a descendant $v_2$ of $v_1$); and (c) constraining the edges on the path with a regular expression. This differs from the traditional notions of graph pattern matching defined in terms of subgraph isomorphism [12] and graph simulation [15].

318 Front. Comput. Sci., 2012, 6(3): 313–338

Example 2.3 The query $Q_p$ given in Fig. 1 is a PQ. In $Q_p$, each node carries search conditions, and each edge has an associated regular expression, as shown in Fig. 1.

When the query $Q_p$ is posed on the data graph $G$ of Fig. 1, the query result $Q_p(G)$ is depicted in Fig. 2 and is shown in the table below:

edge      matches
(B, C)    {(B_?, C_?), (B_?, C_?)}
(B, D)    {(B_?, D_?), (B_?, D_?)}
(C, B)    {(C_?, B_?), (C_?, B_?)}
(C, C)    {(C_?, C_?)}
(C, D)    {(C_?, D_?)}

Indeed, one can verify that $B_i \sim B$ ($i \in [1, 2]$), $C_j \sim C$ ($j \in [1, 3]$) and $D_? \sim D$. In addition, the edge from $C$ to $D$ (labeled with $fa^{\le 2}$) in $Q_p$ is mapped to a path $C_? \xrightarrow{fa} C_? \xrightarrow{fa} D_?$ in $G$; similarly for the other edges in $Q_p$.

Observe that the node pair $(C_?, B_?)$ in $G$ is not a match of the edge $(C, B)$ in $Q_p$, since there exists no path in $G$ from $C_?$ to $B_?$ that satisfies $fn$. For a similar reason, $(C_?, D_?)$ in $G$ is not a match of the edge $(C, D)$ in $Q_p$, although there exists a path $C_? \xrightarrow{fa} C_? \xrightarrow{fa} D_?$ in $G$ that satisfies $fa^{\le 2}$.

Remark. (1) RQs are a special case of PQs, namely those consisting of two nodes and a single edge.

(2) Bounded simulation [34] is a special case of PQs, obtained by only allowing patterns in which (a) there is only a single symbol $c$ in $\Sigma$, i.e., only a single edge type is allowed, and (b) all edges are labeled with either $c^{\le k}$ or $c^{+}$, where $k$ is a positive integer.
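Remark (2) can be checked mechanically. The sketch below uses the same hypothetical `(color, lo, hi)` block encoding for regular expressions, with `hi=None` for $c^+$ and with $c^{\le k}$ assumed to mean 1 to $k$ copies. It tests whether every edge carries a single block $c^{\le k}$ or $c^+$ over one common color $c$. If so, it returns the corresponding edge bounds, writing `"*"` for $c^+$. This follows, as we understand it, the bounded-simulation convention that an edge bound is either a positive integer or $*$.

```python
def bounded_simulation_bounds(edge_blocks):
    """Return {e: k or "*"} if the pattern is a bounded-simulation pattern, else None.

    edge_blocks: {e: [(color, lo, hi), ...]}, hypothetical encoding with hi=None for c^+.
    """
    colors, bounds = set(), {}
    for e, blocks in edge_blocks.items():
        if len(blocks) != 1:
            return None  # concatenations such as fa^{<=2} fn are not allowed
        color, lo, hi = blocks[0]
        if lo != 1 or (hi is not None and hi < 1):
            return None  # only c^{<=k} with k >= 1, or c^+
        colors.add(color)
        bounds[e] = "*" if hi is None else hi
    return bounds if len(colors) <= 1 else None  # a single edge type overall


print(bounded_simulation_bounds({("A", "B"): [("c", 1, 3)], ("B", "A"): [("c", 1, None)]}))
# {('A', 'B'): 3, ('B', 'A'): '*'}
```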

One can readily verify the following, which confirms that the semantics of PQs is well defined.

Proposition 2.1 For any data graph $G$ and any graph pattern query $Q_p$, there is a unique result $Q_p(G)$.

Proof (i) We first show that there exists a query result. We consider all possible sets of the form $\{(e, S_e) \mid S_e \text{ is a set of node pairs in } G \text{ for each edge } e \text{ in } Q_p\}$ that satisfy conditions (1) and (2) of the semantics of PQs. Such a set always exists, since taking $S_e = \emptyset$ for every edge $e$ satisfies (1) and (2) trivially. Note that these sets are not necessarily maximum, and since $G$ and $Q_p$ are finite, the number of such sets is finite.

We define the query result to be a set with the maximum size, which, as will be seen shortly, is unique. If there exists an edge $e$ such that $S_e = \emptyset$ in that set, the query result is $\emptyset$ by condition (3) of the semantics of PQs.

(ii) We then show uniqueness by contradiction. Assume that there exist two distinct maximum query results $Q_p^1(G)$ and $Q_p^2(G)$. We show that there then exists a result larger than both $Q_p^1(G)$ and $Q_p^2(G)$. Given two such sets $S^1 = \{(e, S_e^1) \mid e \text{ is an edge in } Q_p\}$ and $S^2 = \{(e, S_e^2) \mid e \text{ is an edge in } Q_p\}$, we define the union of $S^1$ and $S^2$ as $\{(e, S_e^1 \cup S_e^2) \mid e \text{ is an edge in } Q_p\}$, denoted by $S^1 \cup S^2$. Observe that $Q_p^i(G)$ is possibly empty when $S_e^i$ is empty for some $e$, where $i \in [1, 2]$. Let $Q_p^3(G) = Q_p^1(G) \cup Q_p^2(G)$. By the definition of PQs, one can readily verify that $Q_p^3(G)$ is a query result larger than both $Q_p^1(G)$ and $Q_p^2(G)$. This contradicts the assumption that both $Q_p^1(G)$ and $Q_p^2(G)$ are maximum.

By (i) and (ii) above, we have the conclusion. □
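The key step in (ii) is that the union of two families satisfying (1) and (2) still satisfies them. Condition (1) holds because $S_e^1 \cup S_e^2 \subseteq Q_e(G)$, and condition (2) holds because every witness pair from $S^1$ or $S^2$ is still present in the union. The check below illustrates this on hypothetical data. The helper names `satisfies_condition_2` and `union` are our own, and one instance demonstrates the argument; it does not prove it.

```python
def satisfies_condition_2(edges, S):
    """Condition (2): for every (v1, v2) in S_e with e = (u1, u2), v1 (resp. v2) has a
    partner along every pattern edge leaving u1 (resp. u2)."""
    out = {}
    for e in edges:
        out.setdefault(e[0], []).append(e)

    def supported(u, v):
        return all(any(x == v for x, _ in S[e2]) for e2 in out.get(u, ()))

    return all(supported(u1, v1) and supported(u2, v2)
               for (u1, u2) in edges for v1, v2 in S[(u1, u2)])


def union(S1, S2):
    """S^1 ∪ S^2 = {(e, S_e^1 ∪ S_e^2)}, assuming both families share the same edges."""
    return {e: S1[e] | S2[e] for e in S1}


# Two hypothetical families for the pattern cycle U -> W -> U.
edges = [("U", "W"), ("W", "U")]
S1 = {("U", "W"): {("a", "b")}, ("W", "U"): {("b", "a")}}
S2 = {("U", "W"): {("c", "d")}, ("W", "U"): {("d", "c")}}
S3 = union(S1, S2)
print(satisfies_condition_2(edges, S1), satisfies_condition_2(edges, S2),
      satisfies_condition_2(edges, S3))  # True True True
```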

3 Fundamental problems for graph queries

We next investigate containment, equivalence, and minimization of graph queries. As remarked earlier, these problems are important for any query language [36]. We focus on graph pattern queries (PQs), but also state the relevant results for reachability queries (RQs).

3.1 Containment and equivalence

We first study containment and equivalence of PQs.

Containment. Given two PQs $Q_1 = (V_1, E_1, f_{V_1}, f_{E_1})$ and $Q_2 = (V_2, E_2, f_{V_2}, f_{E_2})$, we say that $Q_1$ is contained in $Q_2$, denoted by $Q_1 \sqsubseteq Q_2$, if there exists a mapping $\lambda$ from $E_1$ to $E_2$ such that for any data graph $G$ and any edge $e$ in $Q_1$, $S_e \subseteq S_{\lambda(e)}$, where $(e, S_e) \in Q_1(G)$, $(\lambda(e), S_{\lambda(e)}) \in Q_2(G)$, and $Q_1(G)$, $Q_2(G)$ are the query results of $Q_1$, $Q_2$ on $G$, respectively.
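Because the definition quantifies over every data graph $G$, containment cannot be established by testing graphs one at a time. A single graph can only refute a candidate mapping $\lambda$. The sketch below is a hypothetical helper, `lambda_holds_on`, that performs this one-graph test of $S_e \subseteq S_{\lambda(e)}$ on precomputed results stored as `{edge: set of node pairs}`, with an empty result `{}` standing for $\emptyset$.

```python
def lambda_holds_on(lam, result1, result2):
    """Check S_e ⊆ S_λ(e) for every edge e of Q1 on one data graph G.

    lam:     {e in E1: λ(e) in E2}
    result1: Q1(G) as {e: S_e};  result2: Q2(G) as {e': S_e'}
    A missing edge (e.g. an empty query result {}) contributes the empty set.
    True is only evidence for Q1 ⊑ Q2; False refutes this particular λ.
    """
    return all(result1.get(e, set()) <= result2.get(lam[e], set()) for e in lam)


# Hypothetical results of two one-edge queries on the same graph.
lam = {("B", "C"): ("B2", "C2")}
r1 = {("B", "C"): {("b1", "c1")}}
r2 = {("B2", "C2"): {("b1", "c1"), ("b2", "c2")}}
print(lambda_holds_on(lam, r1, r2))                         # True
print(lambda_holds_on({("B2", "C2"): ("B", "C")}, r2, r1))  # False
```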

Intuitively, the mapping $\lambda$ serves as a renaming function such that $Q_1(G)$ is mapped into $Q_2(G)$ after the renaming. For an edge $e = (u_1, u_2)$ in $Q_1$, let $\lambda(e) = (w_1, w_2)$. Then $Q_1 \sqsubseteq Q_2$ as long as for any data graph $G$ and any node $v$ in $G$, (1) if $v \sim u_1$, then $v \sim w_1$, denoted as $u_1 \trianglelefteq w_1$; and (2) $u_2 \trianglelefteq w_2$. Moreover, (3) $L(f_e) \subseteq L(f_{\lambda(e)})$, denoted as $e \models \lambda(e)$.

Example 3.1 Consider the three PQs given in Fig. 3, in which all $B_i$'s ($i \in [1, 3]$) carry the same predicates; similarly for all $C_j$'s ($j \in [1, 6]$). Denote by $\lambda_{i,j}$ a mapping from $Q_i$ to $Q_j$.

(1) $Q_2 \sqsubseteq Q_1$: there exists a mapping $\lambda_{2,1}$, where $\lambda_{2,1}((B_?, C_?)) = (B_?, C_?)$. Note that the mapping is not unique, e.g., both $\lambda_{2,1}((B_?, C_?)) = (B_?, C_?)$ and $\lambda_{2,1}((B_?, C_?)) = (B_?, C_?)$ are valid mappings.

(2) $Q_2 \sqsubseteq Q_3$, by letting $\lambda_{2,3}((B_?, C_?)) = (B_?, C_?$ …
