the indexing and querying of the graph, to guide the integration of data graphs, and so forth. We
refer to the survey by Čebirić et al. [80] for further details.
3.2 Identity
In Figure 1, we use nodes like Santiago, but to which Santiago does this node refer? Do we refer to
Santiago de Chile, Santiago de Cuba, Santiago de Compostela, or do we perhaps refer to the indie
rock band Santiago? Based on edges such as Santa Lucía –city→ Santiago, we may deduce that it is
one of the three cities mentioned (not the rock band), and based on the fact that the graph describes
tourist attractions in Chile, we may further deduce that it refers to Santiago de Chile. Without
further details, however, disambiguating nodes of this form may rely on heuristics prone to error
in more difficult cases. To help avoid such ambiguity, first we may use globally unique identifiers
to avoid naming clashes when the knowledge graph is extended with external data, and second we
may add external identity links to disambiguate a node with respect to an external source.
3.2.1 Global identifiers. Assume we wished to compare tourism in Chile and Cuba, and we have
acquired an appropriate knowledge graph for Cuba. Part of the benefit of using graphs to model
data is that we can merge two graphs by taking their union. However, as shown in Figure 15, using
an ambiguous node like Santiago may result in a naming clash: the node refers to a different
real-world city in each graph, yet the merged graph indicates that Santiago is a city in both
Chile and Cuba (rather than two different cities).⁸
A practical way to avoid such naming clashes would be to use namespaces like chile: or cuba: in
the corresponding graphs, such that nodes like chile:Santiago and cuba:Santiago will not clash so
long as distinct namespaces are used.
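The naming clash and the namespace fix can be sketched in a few lines of Python. This is only an illustration: graphs are modelled as sets of (subject, edge label, object) triples, and the specific triples, the edge labels type and country, and the helper names are assumptions made for the example rather than the contents of Figure 15.

```python
# Each graph is a set of (subject, edge label, object) triples.
# The triples below are illustrative assumptions, not the data of Figure 15.
chile = {("Santiago", "type", "City"), ("Santiago", "country", "Chile")}
cuba = {("Santiago", "type", "City"), ("Santiago", "country", "Cuba")}


def merge(*graphs):
    """Merge graphs by taking the union of their triples."""
    return set().union(*graphs)


def with_namespace(graph, prefix, local_terms):
    """Prefix the terms in local_terms; leave shared terms untouched."""
    def rename(term):
        return f"{prefix}:{term}" if term in local_terms else term
    return {(rename(s), p, rename(o)) for s, p, o in graph}


def countries_of(graph, node):
    """Return the objects of all country edges leaving the given node."""
    return {o for s, p, o in graph if s == node and p == "country"}


# Without namespaces, one node ends up located in two countries.
clash = merge(chile, cuba)
print(countries_of(clash, "Santiago"))

# With distinct namespaces, the two cities remain separate nodes.
safe = merge(
    with_namespace(chile, "chile", {"Santiago"}),
    with_namespace(cuba, "cuba", {"Santiago"}),
)
print(countries_of(safe, "chile:Santiago"))
print(countries_of(safe, "cuba:Santiago"))
```

Only the ambiguous term is prefixed here, so shared terms such as City still align across the two graphs after the merge.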
In the context of the Semantic Web, the RDF data model goes one step further and recommends
that global Web identifiers be used for nodes and edge labels. However, rather than adopt the
Uniform Resource Locators (URLs) used to identify the location of information resources such as
webpages, RDF 1.1 proposes to use Internationalised Resource Identifiers (IRIs) to identify
non-information resources such as cities or events.⁹
Hence, for example, in the RDF representation of Wikidata [514] – a knowledge graph proposed to
complement Wikipedia, discussed in more detail in Section 10 – while the URL
https://www.wikidata.org/wiki/Q2887 refers to a webpage that can be loaded in a browser providing
human-readable metadata about Santiago, the IRI http://www.wikidata.org/entity/Q2887 refers to the
city itself. Distinguishing the identifiers for both resources (the webpage and the city itself)
avoids naming clashes; for example, if we use the URL to identify both the webpage and the city,
we may end up with an edge in our graph, such as (with readable labels below the edge):
http://www.wikidata.org/wiki/Q2887 –http://www.wikidata.org/wiki/Property:P112→ https://www.wikidata.org/wiki/Q203534
[Santiago (URL)] –[founded by (URL)]→ [Pedro de Valdivia (URL)]
Such an edge leaves ambiguity: was Pedro de Valdivia the founder of the webpage, or the city?
Using IRIs for entities distinct from the URLs for the webpages that describe them avoids such
ambiguous cases; Wikidata thus instead defines the previous edge as follows:
http://www.wikidata.org/entity/Q2887 –http://www.wikidata.org/prop/direct/P112→ http://www.wikidata.org/entity/Q203534
[Santiago (IRI)] –[founded by (IRI)]→ [Pedro de Valdivia (IRI)]
using IRIs for the city, the person, and the founded-by relation that are distinct from the URLs of
the webpages describing them.
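As a rough illustration of why separate identifiers help, the following Python sketch keeps the city and its webpage as two distinct nodes, so that the founded-by edge can only attach to the city. The Wikidata identifiers are those quoted above; the describes edge label under example.com and the helper objects are hypothetical, introduced only for this example.

```python
# Prefixes taken from the identifiers quoted in the text.
ENTITY = "http://www.wikidata.org/entity/"
DIRECT = "http://www.wikidata.org/prop/direct/"
PAGE = "https://www.wikidata.org/wiki/"

santiago = ENTITY + "Q2887"        # the city itself
valdivia = ENTITY + "Q203534"      # the person
founded_by = DIRECT + "P112"       # the founded-by edge label
santiago_page = PAGE + "Q2887"     # the webpage about the city

# Hypothetical edge label, used here only to link a webpage to its topic.
describes = "http://example.com/describes"

graph = {
    (santiago, founded_by, valdivia),      # a claim about the city
    (santiago_page, describes, santiago),  # a claim about the webpage
}


def objects(graph, subject, label):
    """Return the targets of edges with the given label leaving subject."""
    return {o for s, p, o in graph if s == subject and p == label}


print(objects(graph, santiago, founded_by))       # the city has a founder
print(objects(graph, santiago_page, founded_by))  # the webpage does not
```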
If HTTP IRIs are used to identify the graph’s entities, when the IRI is looked up (via HTTP),
the web-server can return (or redirect to) a description of that entity in formats such as RDF. This
⁸ Such a naming clash is not unique to graphs, but could also occur if merging tables, trees, etc.
⁹ Uniform Resource Identifiers (URIs) can be Uniform Resource Locators (URLs), used to locate information resources, and
Uniform Resource Names (URNs), used to name non-information resources. Internationalised Resource Identifiers (IRIs)
generalise URIs by allowing Unicode characters. For example, http://example.com/Ñam is an IRI, but not a URI, due to the
use of “Ñ”. Percent-encoding – http://example.com/%C3%91am – can encode an IRI as a URI (but reduces readability).
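The percent-encoding in this footnote can be reproduced with Python's standard library. The sketch below is a simplification: it encodes non-ASCII characters as UTF-8 bytes and leaves ":" and "/" untouched, which suffices for this example but does not cover the full IRI-to-URI mapping (for instance, internationalised host names). The function name to_uri is an assumption of the example.

```python
from urllib.parse import quote, unquote


def to_uri(iri):
    """Percent-encode non-ASCII characters, keeping ':' and '/' as they are."""
    return quote(iri, safe=":/")


iri = "http://example.com/Ñam"
uri = to_uri(iri)
print(uri)           # http://example.com/%C3%91am
print(unquote(uri))  # http://example.com/Ñam
```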