Graph-based SPARQL query engine
Table 1 Frequently used notations
Notation Description
G A RDF graph (Definition 1)
Q A SPARQL query graph (Definition 2)
v Avertexv in a SPARQL query graph (Definition 3)
u A vertex in RDF graph G (Definition 3)
eSig(e) An edge Signature (Definition 5)
vSig(u) A vertex signature (Definition 6)
RS The answer set of the SPARQL query matches
CL The candidate set of the SPARQL query matches
G
∗
A data signature graph (Definition 7)
Q
∗
A query signature graph
A(u) A transaction of vertex u (Definition 16)
O A node in T-index (Definition 16)
O.L The corresponding vertex list of node O
MS(O) The corresponding aggregate set of node O
3. E ={
−−−→
u
1
, u
2
} is a collection of directed edges that con-
nect the corresponding subjects and objects.
4. L
E
is a collection of edge labels. Given an edge e ∈ E,
its edge label is its corresponding property.
An edge
−−−→
u
1
, u
2
is an attribute property edge if u
2
∈ V
l
;
otherwise, it is a link edge.
Figure 1b shows an example of an RDF graph. The vertices
that are denoted by boxes are entity or class vertices, and
the others are literal vertices. A SPARQL query Q is also a
collection of triples. Some triples in Q have parameters.In
Q
2
(in Sect. 1), “?m” and “?bd” are parameters, and “?bd”
has a wildcard filter: FILTER(regx(str(?bd),“1976”)). Figure
2 shows the query graph that corresponds to Q
2
.
Definition 2 A query graph is a five-tuple Q =V
Q
, L
Q
V
,
E
Q
, L
Q
E
, FL, where
1. V
Q
= V
Q
c
∪ V
Q
e
∪ V
Q
l
∪ V
Q
p
is a collection of vertices
that correspond to all subjects and objects in a SPARQL
query, where V
Q
p
is a collection of parameter vertices,
and V
Q
c
and V
Q
e
and V
Q
l
are collections of class vertices,
entity vertices, and literal vertices in the query graph Q,
respectively.
2. L
Q
V
is a collection of vertex labels in Q. The label of
avertexv ∈ V
Q
p
is φ, that of a vertex v ∈ V
Q
l
is its
literal value and that of a vertex v ∈ V
Q
c
∪ V
Q
e
is its
corresponding URI.
3. E
Q
is a collection of edges that correspond to properties
in a SPARQL query. L
Q
E
is the edge labels in E
Q
.An
edge label can be a property or an edge parameter.
4. FL are constraint filters, such as a wildcard constraint.
Fig. 4 Three components in aggregate queries
Note that, in this paper, we do not consider SPARQL
queries that involve type reasoning/inferencing. Thus, the
match of a query is defined as follows.
Definition 3 Consider an RDF graph G and a query graph
Q that has n vertices {v
1
,...,v
n
}.Asetofn distinct vertices
{u
1
,...,u
n
} in G is said to be a match of Q, if and only if
the following conditions hold:
1. If v
i
is a literal vertex, v
i
and u
i
have the same literal
value;
2. If v
i
is an entity or class vertex, v
i
and u
i
have the same
URI;
3. If v
i
is a parameter vertex, u
i
should satisfy the filter con-
straint over parameter vertex v
i
if any; otherwise, there
is no constraint over u
i
;
4. If there is an edge from v
i
to v
j
in Q, there is also an
edge from u
i
to u
j
in G. If the edge label in Q is p
(i.e., property), the edge from u
i
to u
j
in G has the same
label. If the edge label in Q is a parameter, the edge
label should satisfy the corresponding filter constraint;
otherwise, there is no constraint over the edge label from
u
i
to u
j
in G.
Given Q
2
’s query graph in Fig. 2, vertices (005,006,
020,023,024) in RDF graph G of Fig. 1bformamatchof
Q
2
. Answering a SPARQL query is equivalent to finding all
matches of its corresponding query graph in RDF graph.
Definition 4 An aggregate SPARQL query Q consists of
three components:
1. Query pattern P
Q
is a set of triple statements that form
one query graph.
2. Group-by dimensions and measure dimensions are pre-
defined object variables in query pattern P
Q
.
3. (Optional) HAVING condition specifies the condition(s)
that each result group must satisfy.
Figure 4 demonstrates these three components. This
example shows the case where all group-by dimensions cor-
respond to attribute property edges (e.g., “?g”,“?t” and “?y”
in Fig. 4). This is not necessary, and in Sect. 9.3, we discuss
more general cases.
123