146 M. Rodríguez-Muro, M. Rezk / Web Semantics: Science, Services and Agents on the World Wide Web 33 (2015) 141–169
the two relations must have the same set of attributes. The result
includes all tuples that are in r
1
but not in r
2
.
r
1
\ r
2
= {t | t ∈ r
1
and t ∈ r
2
}.
Selection (σ ): This operator is used to choose a subset of the tuples
(rows) from a relation that satisfies a selection condition, acting as
a filter to retain only tuples that fulfills a qualifying requirement.
σ
p
(r) = {t | t ∈ r and p(t)}.
Rename (ρ): This is a unary operation written as, ρ
c
1
/c
2
(r), where
the result is identical to r except that the c
1
attribute in all tuples
is renamed to a c
2
attribute.
Projection (Π): This operator is used to reorder, select and filter
out attributes from a table.
Π
c
1
...c
k
(r) = {v
1
. . . v
k
| v
k
. . . v
n
∈ r}
In order to ease the presentation, we will often mimic SQL and
include the renaming in the projection using AS statements. Thus,
we write Π
c
1
AS c
2
(r) to denote ρ
c
1
/c
2
(r). We will also overload
the projection with statements of the form Π
constant AS c
2
(r) where
constant is null, or an string, or a concatenation of an string and
an attribute. Observe that this second operation can be easily
encoded in relational algebra using auxiliary tables. For instance,
Π
constant AS c
2
(r), can be encoded as ρ
nullttr/c
2
Π
attr(r )\ c
2
r × NullTable
where NullTable is a table with a single attribute nullttr and a single
null record.
Natural join (on): This is a binary operator written as, r
1
on r
2
, where
the result is the set of all combinations of tuples in r
1
and r
2
that
are equal on their common attribute names.
r
1
on
jn
r
2
= Π
sc
(σ
jn
(r
1
× r
2
)).
Left join ( ): This is a binary operator written as, r
1
r
2
, where
the result is the set of all combinations of tuples in R and S that
are equal on their common attribute names, in addition (loosely
speaking) to tuples in r
1
that have no matching tuples in r
2
.
r
1 jn
r
2
= (r
1
on
jn
r
2
)
∪((r
1
\ Π
col(r
1
)
(r
1
on
jn
r
2
)) × NullTable
attr(r
2
)\attr(r
1
)
)
where NullTable
attr(r
2
)\attr(r
1
)
is a table with a attributes attr(r
2
) \
attr(r
1
) and a single record consisting only on null values.
Recall that every relational algebra expression is equivalent to
a SQL query. Further details can be found in [28].
3.3. SPARQL
For formal purposes we will use the algebraic syntax of SPARQL
similar to the ones in [10,11] and defined in the standard.
6
How-
ever, to ease the understanding, we will often use graph patterns
(the usual SPARQL syntax) in the examples. It is worth noticing,
that although in this paper we restrict ourselves to SELECT queries,
in -ontop- we also allow ASK, DESCRIBE and CONSTRUCT queries,
which can be reduced or implemented using SELECT queries.
The SPARQL language that we consider contains the following
pairwise disjoint countably infinite sets of symbols: I, denoting
the IRIs, B, denoting blank nodes, L, denoting RDF literals; and V,
denoting variables.
The SPARQL algebra is constituted by the following graph
pattern operators (written using prefix notation): BGP (basic graph
pattern), Join, LeftJoin, Filter, and Union. A basic graph pattern is a
statement of the form:
BGP(s, p, o)
6
http://www.w3.org/TR/rdf-sparql-query/#sparqlAlgebra.
where s ∈ I ∪ B ∪ V, p ∈ I ∪ V, and o ∈ I ∪ B ∪ L ∪ V. In the
standard, a BGP can contain several triples, but since we include
here the join operator, it suffices to view BGPs as the result of ◃▹
of its constituent triple patterns. Observe that the only difference
between blank nodes and variables in BGPs, is that the former do
not occur in solutions. So, to ease the presentation, we assume that
BGPs contain no blank nodes. The remaining algebra operators are:
• Join(pattern, pattern)
• LeftJoin(pattern, pattern, expression)
• Union(pattern, pattern)
• Filter(pattern, expression)
and can be nested freely. Each of these operators returns the result
of the sub-query it describes. Details on how to translate SPARQL
queries into SPARQL algebra can be found in the W3C specification,
and, in addition, several examples will be presented along the
paper.
Note. Converting Graph Patterns. It is critical to notice that
graph patterns are not translated straightforwardly into algebra
expressions. There is a pre-processing of the graph patterns where
filter expressions are either moved to the top of graph, or absorbed
by LeftJoin expressions. Details can be found in the SPARQL 1.0
specification.
7
A SPARQL query is a graph pattern P with a solution modifier,
which specifies the answer variables, that is, the variables in P
whose values should be in the output. In this work we ignore this
solution modifiers for simplicity.
Definition 16 (SPARQL Query). Let P be a SPARQL algebra expres-
sion, V a set of variables occurring in P, and G a set of RDF triples.
Then a query is a triple of the form (V , P, G).
We will often omit specifying V and G when they are not relevant
to the problem at hand.
Example 2. Consider the following SPARQL query Q :
This query is then translated into an SPARQL algebra expression
that has the following tree shape:
where T
1
, T
2
and T
3
represent (x,
′
knows
′
, y), (x,
′
email
′
, z), and
(x,
′
site
′
, w) respectively.
Semantics. Now we briefly introduce the formal set semantics of
SPARQL as specified in [10] with the difference that we updated
the definition of the LeftJoin to match the published standard
specifications. The result is a semantic which is more strict as the
one in [9] and the standard W3C semantics in the sense that:
1. We do not allow joins through null values.
2. We work with set semantics opposed to bag semantics.
7
http://www.w3.org/TR/rdf-sparql-query#convertGraphPattern.