318 Front. Comput. Sci., 2012, 6(3): 313–338
from the traditional notion of graph pattern matching defined
in terms of subgraph isomorphism [12] and graph simulation
[15].
Example 2.3 The query Q
2
given in Fig. 1 is a PQ.InQ
2
each node carries search conditions, and each edge has an
associated regular expression, as shown in Fig. 1.
When the query Q
2
is posed on the data graph G of Fig. 1,
the query result Q
2
(G) is depicted in Fig. 2 and is shown in
the table below:
edge matches edge matches
(B, C) {(B
1
, C
3
), (B
2
, C
3
)} (C, C) {(C
3
, C
3
)}
(B, D) {(B
1
, D
1
), (B
2
, D
1
)} (C, D) {(C
3
, D
1
)}
(C, B) {(C
3
, B
1
), (C
3
, B
2
)}
Indeed, one can verify that B
i
∼ B (i ∈ [1, 2]), C
j
∼ C ( j ∈
[1, 3]) and D
1
∼ D. In addition, the edge from C to D (labeled
with fa
2
sa
2
)inQ
2
is mapped to a path C
3
fa
−→ C
1
sa
−−→ D
1
in
G; similarly for other edges in Q
2
.
Observe that the node pair (C
1
, B
1
)inG is not a match of
the edge (C, B)inQ
2
, since there exists no path in G from C
1
to B
1
that satisfies fn. In light of a similar reason, (C
1
, D
1
)in
G is not a match of the edge (C, D)inQ
2
, although there ex-
ists a path C
1
fa
−→C
2
fa
−→C
1
sa
−−→ D
1
in G that satisfies fa
2
sa
2
.
Remark.(1)RQs are a special case of PQs, which consist of
two nodes and a single edge.
(2) Bounded simulation [34] is a special case of PQs,by
only allowing patterns in which (a) there is only a single sym-
bol c in Σ, i.e., only a single edge type is allowed, and (b) all
edges are labeled with either c
k
or c
+
,wherek is a positive
integer.
One can readily verify the following, which confirms that
the semantics of PQs is well defined.
Proposition 2.1 For any data graph G and any graph pattern
query Q
p
, there is a unique result Q
p
(G).
Proof (i) We first show that there exists a query result. We
consider all possible sets of {(e, S
e
) | S
e
is a set of node pairs
in G for each edge e in Q
p
}, which satisfy conditions (1) and
(2) of the semantics of PQs. Note that those sets are not nec-
essarily maximum, and the number of such possible sets is
finite.
We define the query result to be a set with the maximum
size, which, as will be seen shortly, is unique. If there exists
an edge e such that S
e
= ∅ in the set, the query result is ∅ by
condition (3) of the semantics of PQs.
(ii) We then show the uniquenessby contradiction. Assume
that there exist two distinct maximum query results Q
1
p
(G)
and Q
2
p
(G). We then show that there exists a result larger than
both Q
1
p
(G)andQ
2
p
(G). Given two such sets S
1
= {(e, S
1
e
) | e
is an edge in Q
p
} and S
2
= {(e, S
2
e
) | e is an edge in Q
p
},
we define the union of S
1
and S
2
as {(e, S
1
e
∪ S
2
e
) | e is an
edge in Q
p
}, denoted by S
1
∪ S
2
. Observe that Q
i
p
is possibly
empty when S
i
e
is empty for some e,wherei ∈ [1, 2]. Let
Q
p
(G) = Q
1
p
(G) ∪ Q
2
p
(G). By the definition of PQs, one can
readily verify that Q
p
(G) is a query result larger than both
Q
1
p
(G)andQ
2
p
(G). This contradicts the assumption that both
Q
1
p
(G)andQ
2
p
(G) are maximum.
By (i) and (ii) above, we have the conclusion.
3 Fundamental graph queries problems
We next investigate containment, equivalence, and minimiza-
tion of graph queries. As remarked earlier, these problems are
important for any query language [36]. We focus on graph
pattern queries (PQs), but state the relevant results for reach-
ability queries (RQs).
3.1 Containment and equivalence
We first study containment and equivalence of PQs.
Containment Given two PQs Q
1
= (V
1
p
, E
1
p
, f
1
v
, f
1
e
)and
Q
2
= (V
2
p
, E
2
p
, f
2
v
, f
2
e
), we say that Q
1
is contained in Q
2
,
denoted by Q
1
Q
2
, if there exists a mapping λ from E
1
p
to E
2
p
such that for any data graph G and any edge e in Q
1
,
S
e
⊆ S
λ(e)
,where(e, S
e
) ∈ Q
1
(G), (λ(e), S
λ(e)
) ∈ Q
2
(G), and
Q
1
(G), Q
2
(G) are the query results of Q
1
, Q
2
on G, respec-
tively.
Intuitively, the mapping λ serves as a renaming function
such that Q
1
(G) is mapped to Q
2
(G) after the renaming. For
an edge e = (u
1
, u
2
)inQ
1
,letλ(e) = (w
1
, w
2
). Then Q
1
Q
2
as long as for any data graph G and any node v in G,(1)if
v ∼ u
1
,thenv ∼ w
1
, denoted as u
1
w
1
;and(2)u
2
w
2
.
Moreover, (3) L( f
e
) ⊆ L( f
λ
(e)), denoted as e |= λ(e).
Example 3.1 Consider three PQs giveninFig.3,inwhich
all B
i
’s (i ∈ [1, 3]) carry the same predicates; similarly for all
C
j
’s ( j ∈ [1, 6]). Denote by λ
i, j
a mapping from Q
i
to Q
j
.
(1) Q
2
Q
1
: there exists a mapping λ
2,1
,whereλ
2,1
(B
2
,
C
4
) = (B
1
, C
1
). Note that the mapping is not unique,
e.g., both λ
2,1
(B
2
, C
4
) = (B
1
, C
2
)andλ
2,1
(B
2
, C
4
) =
(B
1
, C
3
) are valid mappings.
(2) Q
2
Q
3
, by letting λ
2,3
(B
2
, C
4
) = (B
3
, C
5
).