INVITED PAPER
A Review of Relational Machine
Learning for Knowledge Graphs
This paper reviews how statistical models can be ‘‘trained’’ on large knowledge graphs
and then used to predict new facts about the world.
By Maximilian Nickel, Kevin Murphy, Volker Tresp, and Evgeniy Gabrilovich
ABSTRACT | Relational machine learning studies methods for the statistical analysis of relational, or graph-structured, data.
In this paper, we provide a review of how such statistical
models can be ‘‘trained’’ on large knowledge graphs, and then
used to predict new facts about the world (which is equivalent
to predicting new edges in the graph). In particular, we discuss
two fundamentally different kinds of statistical relational
models, both of which can scale to massive data sets. The first
is based on latent feature models such as tensor factorization
and multiway neural networks. The second is based on mining
observable patterns in the graph. We also show how to
combine these latent and observable models to get improved
modeling power at decreased computational cost. Finally, we
discuss how such statistical models of graphs can be combined
with text-based information extraction methods for automat-
ically constructing knowledge graphs from the Web. To this
end, we also discuss Google’s knowledge vault project as an
example of such combination.
KEYWORDS | Graph-based models; knowledge extraction; knowledge graphs; latent feature models; statistical relational learning
I. INTRODUCTION
‘‘I am convinced that the crux of the problem of learning is recognizing relationships and being able to use them'' (Christopher Strachey in a letter to Alan Turing, 1954).
Traditional machine learning algorithms take as input a
feature vector, which represents an object in terms of
numeric or categorical attributes. The main learning task
is to learn a mapping from this feature vector to an
output prediction of some form. This could be class
labels, a regression score, or an unsupervised cluster id
or latent vector (embedding). In statistical relational
learning (SRL), the representation of an object can
contain its relationships to other objects. Thus the data is
in the form of a graph, consisting of nodes (entities) and
labeled edges (relationships between entities). The main
goals of SRL include prediction of missing edges,
prediction of properties of nodes, and clustering nodes
based on their connectivity patterns. These tasks arise in
many settings such as analysis of social networks and
biological pathways. For further information on SRL,
see [1]–[3].
In this paper, we review a variety of techniques from
the SRL community and explain how they can be applied to
large-scale knowledge graphs (KGs), i.e., graph-structured knowledge bases (KBs) that store factual information in the form of relationships between entities. Recently, a large
number of knowledge graphs have been created, including
YAGO [4], DBpedia [5], NELL [6], Freebase [7], and the
Google Knowledge Graph [8]. As we discuss in Section II,
these graphs contain millions of nodes and billions of
edges. This causes us to focus on scalable SRL techniques,
which take time that is (at most) linear in the size of
the graph.
We can apply SRL methods to existing KGs to learn a
model that can predict new facts (edges) given existing
facts. We can then combine this approach with informa-
tion extraction methods that extract ‘‘noisy’’ facts from the
Web (see, e.g., [9] and [10]). For example, suppose an
information extraction method returns a fact claiming that
Barack Obama was born in Kenya, and suppose (for
illustration purposes) that the true place of birth of Obama
was not already stored in the knowledge graph. An SRL
model can use related facts about Obama (such as his
Manuscript received April 8, 2015; revised August 14, 2015; accepted September 16,
2015. Date of current version December 18, 2015. The work of M. Nickel was supported by
the Center for Brains, Minds and Machines (CBMM) under NSF STC award CCF-1231216.
The work of V. Tresp was supported by the German Federal Ministry for Economic Affairs
and Energy under the ‘‘Smart Data’’ technology program (Grant 01MT14001).
M. Nickel is with the Laboratory for Computational and Statistical Learning (LCSL),
Massachusetts Institute of Technology, Cambridge, MA 02139 USA and also with the
Istituto Italiano di Tecnologia, 16163 Genova, Italy (e-mail: mnick@mit.edu).
K. Murphy and E. Gabrilovich are with Google Inc., Mountain View, CA 94043 USA.
V. Tresp is with Siemens AG, Corporate Technology, Munich 81739, Germany and also
with the Ludwig Maximilian University of Munich, Munich 80539, Germany.
Digital Object Identifier: 10.1109/JPROC.2015.2483592
0018-9219
2015 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
Vol. 104, No. 1, January 2016 | Proceedings of the IEEE 11
profession being U.S. President) to infer that this new fact
is unlikely to be true and should be discarded. This
provides us a way to ‘‘grow'' a KG automatically, as we explain in more detail in Section IX.
The remainder of this paper is structured as follows. In
Section II, we introduce knowledge graphs and some of
their properties. Section III discusses SRL and how it can
be applied to knowledge graphs. There are two main
classes of SRL techniques: those that capture the
correlation between the nodes/edges using latent vari-
ables, and those that capture the correlation directly using
statistical models based on the observable properties of the
graph. We discuss these two families in Sections IV and V, respectively. Section VI describes methods for combining these two approaches, in order to get the best of both worlds. Section VII discusses how such models can be trained on KGs. In Section VIII we discuss relational learning using Markov Random Fields. In Section IX, we
describe how SRL can be used in automated knowledge
base construction projects. In Section X, we discuss
extensions of the presented methods, and Section XI
presents our conclusions.
II. KNOWLEDGE GRAPHS
In this section, we introduce knowledge graphs, and
discuss how they are represented, constructed, and used.
A. Knowledge Representation
Knowledge graphs model information in the form of
entities and relationships between them. This kind of
relational knowledge representation has a long history in
logic and artificial intelligence [11], for example, in semantic
networks [12] and frames [13]. More recently, it has been
used in the Semantic Web community with the purpose of
creating a ‘‘web of data’’ that is readable by machines [14].
While this vision of the Semantic Web remains to be fully
realized, parts of it have been achieved. In particular, the
concept of linked data [15], [16] has gained traction, as it
facilitates publishing and interlinking data on the Web in
relational form using the W3C Resource Description
Framework (RDF) [17], [18]. (For an introduction to
knowledge representation, see, e.g., [11], [19], and [20].)
In this paper, we will loosely follow the RDF standard
and represent facts in the form of binary relationships, in
particular (subject, predicate, object) (SPO) triples, where
subject and object are entities and predicate is the relation
between them. (We discuss how to represent higher arity
relations in Section X-A.) The existence of a particular
SPO triple indicates an existing fact, i.e., that the
respective entities are in a relationship of the given type.
For instance, the information
‘‘Leonard Nimoy was an actor who played the
character Spock in the science-fiction movie
Star Trek’’
can be expressed via the following set of SPO triples:

    (LeonardNimoy, profession, Actor)
    (LeonardNimoy, starredIn, StarTrek)
    (LeonardNimoy, played, Spock)
    (Spock, characterIn, StarTrek)
    (StarTrek, genre, ScienceFiction)

We can combine all the SPO triples together to form a
multigraph, where nodes represent entities (all subjects
and objects), and directed edges represent relationships.
The direction of an edge indicates whether entities occur
as subjects or objects, i.e., an edge points from the subject
to the object. Different relations are represented via
different types of edges (also called edge labels). This
construction is called a knowledge graph (KG), or
sometimes a heterogeneous information network [21].
See Fig. 1 for an example.
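To make the construction concrete, here is a minimal sketch in Python (our own illustration, using the Nimoy example from above; the dictionary-of-adjacency-lists encoding is just one possible representation, not a standard):

```python
from collections import defaultdict

# A knowledge graph as a set of (subject, predicate, object) triples.
triples = {
    ("LeonardNimoy", "played", "Spock"),
    ("LeonardNimoy", "starredIn", "StarTrek"),
    ("Spock", "characterIn", "StarTrek"),
}

# Build a labeled directed multigraph: map each node to its
# outgoing (edge label, target) pairs.
graph = defaultdict(list)
for s, p, o in triples:
    graph[s].append((p, o))  # directed edge s --p--> o

# All outgoing edges of Leonard Nimoy, i.e., all facts with him as subject.
print(sorted(graph["LeonardNimoy"]))
# [('played', 'Spock'), ('starredIn', 'StarTrek')]
```

Edge direction encodes the subject/object roles, and multiple differently labeled edges between the same pair of nodes are naturally supported.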
In addition to being a collection of facts, knowledge
graphs often provide type hierarchies (Leonard Nimoy is
an actor, which is a person, which is a living thing) and
type constraints (e.g., a person can only marry another
person, not a thing).
B. Open Versus Closed World Assumption
While existing triples always encode known true
relationships (facts), there are different paradigms for
the interpretation of nonexisting triples.
• Under the closed world assumption (CWA),
nonexisting triples indicate false relationships.
For example, the fact that in Fig. 1 there is no
starredIn edge from Leonard Nimoy to Star Wars is
interpreted to mean that Nimoy definitely did not
star in this movie.
• Under the open world assumption (OWA), a
nonexisting triple is interpreted as unknown, i.e.,
the corresponding relationship can be either true
or false. Continuing with the above example, the
missing edge is not interpreted to mean that
Nimoy did not star in Star Wars. This more cautious approach is justified, since KGs are known to be very incomplete. For example, sometimes just the main actors in a movie are listed, not the complete cast. As another example, note that even the place of birth attribute, which you might think would be typically known, is missing for 71% of all people included in Freebase [22].

Fig. 1. Sample knowledge graph. Nodes represent entities, edge labels represent types of relations, and edges represent existing relationships.
RDF and the Semantic Web make the open-world
assumption. In Section VII-B we also discuss the local
closed world assumption (LCWA), which is often used for
training relational models.
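The three assumptions can be contrasted with a small sketch (our own toy code, not from the paper; the LCWA variant follows the local closed world heuristic discussed later in Section VII-B):

```python
def triple_status(triple, observed, assumption="owa"):
    """Interpret a triple under different world assumptions.

    Returns True (known fact), False (assumed false), or None (unknown).
    """
    if triple in observed:
        return True          # existing triples always encode known facts
    if assumption == "cwa":
        return False         # closed world: absence implies falsehood
    if assumption == "lcwa":
        s, p, _ = triple
        # local closed world: if the KG lists *some* object for (s, p),
        # assume other objects are false; otherwise remain agnostic.
        if any(s2 == s and p2 == p for (s2, p2, _) in observed):
            return False
        return None
    return None              # open world: absence means unknown

kg = {("LeonardNimoy", "starredIn", "StarTrek")}
query = ("LeonardNimoy", "starredIn", "StarWars")
print(triple_status(query, kg, "cwa"))   # False
print(triple_status(query, kg, "owa"))   # None
print(triple_status(query, kg, "lcwa"))  # False (some starredIn fact is known)
```

Under OWA, the missing Star Wars edge stays unknown; under CWA it is taken as false regardless of how incomplete the KG is.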
C. Knowledge Base Construction
Completeness, accuracy, and data quality are important
parameters that determine the usefulness of knowledge
bases and are influenced by the way knowledge bases are
constructed. We can classify KB construction methods into
four main groups:
• in curated approaches, triples are created manually
by a closed group of experts;
• in collaborative approaches, triples are created
manually by an open group of volunteers;
• in automated semistructured approaches, triples
are extracted automatically from semistructured
text (e.g., infoboxes in Wikipedia) via hand-crafted
rules, learned rules, or regular expressions;
• in automated unstructured approaches, triples are
extracted automatically from unstructured text via
machine learning and natural language processing
techniques (see, e.g., [9] for a review).
Construction of curated knowledge bases typically
leads to highly accurate results, but this technique does not
scale well due to its dependence on human experts.
Collaborative knowledge base construction, which was
used to build Wikipedia and Freebase, scales better but still
has some limitations. For instance, as mentioned previ-
ously, the place of birth attribute is missing for 71% of all
people included in Freebase, even though this is a
mandatory property of the schema [22]. Also, a recent
study [35] found that the growth of Wikipedia has been
slowing down. Consequently, automatic knowledge base
construction methods have been gaining more attention.
Such methods can be grouped into two main
approaches. The first approach exploits semistructured
data, such as Wikipedia infoboxes, which has led to large,
highly accurate knowledge graphs such as YAGO [4], [27]
and DBpedia [5]. The accuracy (trustworthiness) of facts in
such automatically created KGs is often still very high. For
instance, the accuracy of YAGO2 has been estimated^1 to be over 95% through manual inspection of sample facts [36], and the accuracy of Freebase [7] was estimated to be 99%.^2
However, semistructured text still covers only a small
fraction of the information stored on the Web, and
completeness (or coverage) is another important aspect of
KGs. Hence the second approach tries to ‘‘read the Web,’’
extracting facts from the natural language text of Web
pages. Example projects in this category include NELL [6]
and the knowledge vault [28]. In Section IX, we show how
we can reduce the level of ‘‘noise’’ in such automatically
extracted facts by using the knowledge from existing, high-
quality repositories.
KGs, and more generally KBs, can also be classified
based on whether they employ a fixed or open lexicon of
entities and relations. In particular, we distinguish two
main types of KBs.
• In schema-based approaches, entities and relations are represented via globally unique identifiers and all possible relations are predefined in a
fixed vocabulary. For example, Freebase might
represent the fact that Barack Obama was born in
Hawaii using the triple (/m/02mjmr,/people/
person/born-in,/m/03gh4), where /m/02mjmr is
the unique machine ID for Barack Obama.
• In schema-free approaches, entities and relations
are identified using open information extraction
(OpenIE) techniques [37], and represented via
normalized but not disambiguated strings (also
referred to as surface names). For example, an
OpenIE system may contain triples such as
(‘‘Obama,’’ ‘‘born in,’’ ‘‘Hawaii’’), (‘‘Barack Obama,’’
‘‘place of birth,’’ ‘‘Honolulu’’), etc. Note that it is not
clear from this representation whether the first
triple refers to the same person as the second triple,
nor whether ‘‘born in’’ means the same thing as
‘‘place of birth.’’ This is the main disadvantage of
OpenIE systems.
Table 1 lists current knowledge base construction projects classified by their creation method and data schema. In this paper, we will only focus on schema-based KBs. Table 2 shows a selection of such KBs and their sizes.
D. Uses of Knowledge Graphs
^1 For detailed statistics, see http://www.mpi-inf.mpg.de/departments/databases-and-information-systems/research/yago-naga/yago/statistics/.
^2 http://thenoisychannel.com/2011/11/15/cikm-2011-industry-event-john-giannandrea-on-freebase-a-rosetta-stone-for-entities

Table 1. Knowledge Base Construction Projects

Knowledge graphs provide semantically structured information that is interpretable by computers, a property that is regarded as an important ingredient to build more intelligent machines [38]. Consequently, knowledge graphs are already powering multiple ‘‘Big Data''
applications in a variety of commercial and scientific
domains. A prime example is the integration of Google’s
Knowledge Graph, which currently stores 18 billion facts
about 570 million entities, into the results of Google’s search
engine [8]. The Google Knowledge Graph is used to identify
and disambiguate entities in text, to enrich search results
with semantically structured summaries, and to provide
links to related entities in exploratory search. (Microsoft has
a similar KB, called Satori, integrated with its Bing search
engine [39].)
Enhancing search results with semantic information
from knowledge graphs can be seen as an important step to
transform text-based search engines into semantically
aware question answering services. Another prominent
example demonstrating the value of knowledge graphs is
IBM’s question answering system Watson, which was able
to beat human experts in the game of Jeopardy!. Among
others, this system used YAGO, DBpedia, and Freebase as
its sources of information [40]. Repositories of structured
knowledge are also an indispensable component of digital
assistants such as Siri, Cortana, or Google Now.
Knowledge graphs are also used in several specialized
domains. For instance, Bio2RDF [41], Neurocommons
[42], and LinkedLifeData [43] are knowledge graphs that
integrate multiple sources of biomedical information.
These have been used for question answering and decision
support in the life sciences.
E. Main Tasks in Knowledge Graph Construction and
Curation
In this section, we review a number of typical KG tasks.
Link prediction is concerned with predicting the existence
(or probability of correctness) of (typed) edges in the
graph (i.e., triples). This is important since existing
knowledge graphs are often missing many facts, and
some of the edges they contain are incorrect [44]. In the
context of knowledge graphs, link prediction is also
referred to as knowledge graph completion. For example,
in Fig. 1, suppose the characterIn edge from Obi-Wan
Kenobi to Star Wars were missing; we might be able to
predict this missing edge, based on the structural similarity
between this part of the graph and the part involving Spock
and Star Trek. It has been shown that relational models
that take the relationships of entities into account can
significantly outperform nonrelational machine learning
methods for this task (e.g., see [45] and [46]).
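As a toy illustration of the kind of structural regularity such a prediction can exploit (a hand-written rule in the spirit of the observable-pattern models discussed in Section V, not a method from the paper):

```python
def predict_character_in(triples):
    """Toy rule: if an actor played a character and starred in a movie,
    predict that the character appears in that movie."""
    played = {(s, o) for (s, p, o) in triples if p == "played"}
    starred = {(s, o) for (s, p, o) in triples if p == "starredIn"}
    predicted = set()
    for actor, character in played:
        for actor2, movie in starred:
            if actor == actor2:
                predicted.add((character, "characterIn", movie))
    return predicted

kg = {
    ("AlecGuinness", "played", "ObiWanKenobi"),
    ("AlecGuinness", "starredIn", "StarWars"),
}
print(predict_character_in(kg))
# {('ObiWanKenobi', 'characterIn', 'StarWars')}
```

Note that such a hard rule overgenerates for actors with many roles and movies; the statistical models reviewed later instead assign probabilities to candidate edges.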
Entity resolution (also known as record linkage [47],
object identification [48], instance matching [49], and
deduplication [50]) is the problem of identifying which
objects in relational data refer to the same underlying
entities. See Fig. 2 for a small example. In a relational
setting, the decisions about which objects are assumed to
be identical can propagate through the graph, so that
matching decisions are made collectively for all objects in a
domain rather than independently for each object pair
(see, for example, [51]–[53]). In schema-based automated
knowledge base construction, entity resolution can be used
to match the extracted surface names to entities stored in
the knowledge graph.
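The relational intuition behind Fig. 2 can be sketched as follows (our own illustration; the name-compatibility heuristic and the attribute layout are invented for this example):

```python
def names_compatible(a, b):
    """'A. Guinness' is compatible with 'Alec Guinness' (same last name,
    matching first initial) - a deliberately crude heuristic."""
    first_a, last_a = a.split()
    first_b, last_b = b.split()
    return last_a == last_b and first_a[0] == first_b[0]

def relational_match(i, j, kg):
    """Match two nodes only if surface names are compatible AND their
    outgoing (predicate, object) links overlap - the relational signal."""
    if not names_compatible(kg[i]["name"], kg[j]["name"]):
        return False
    return bool(set(kg[i]["links"]) & set(kg[j]["links"]))

kg = {
    1: {"name": "A. Guinness", "links": [("profession", "Actor")]},
    2: {"name": "A. Guinness", "links": [("founderOf", "GuinnessBrewery")]},
    3: {"name": "Alec Guinness", "links": [("profession", "Actor")]},
}
print(relational_match(1, 3, kg))  # True: compatible names, shared links
print(relational_match(2, 3, kg))  # False: no shared relational context
```

The surface name alone cannot separate nodes 1 and 2, but the links in the graph, standing in for occupation, disambiguate them.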
Fig. 2. Example of entity resolution in a toy knowledge graph. In this example, nodes 1 and 3 refer to the identical entity, the actor Alec Guinness.
Node 2, on the other hand, refers to Arthur Guinness, the founder of the Guinness brewery. The surface name of node 2 (‘‘A. Guinness’’) alone
would not be sufficient to perform a correct matching as it could refer to both Alec Guinness and Arthur Guinness. However, since links in the
graph reveal the occupations of the persons, a relational approach can perform the correct matching.
Table 2 Size of Some Schema-Based Knowledge Bases
Link-based clustering extends feature-based clustering
to a relational learning setting and groups entities in
relational data based on their similarity. However, in link-
based clustering, entities are not only grouped by the
similarity of their features but also by the similarity of their
links. As in entity resolution, the similarity of entities can
propagate through the knowledge graph, such that
relational modeling can add important information for
this task. In social network analysis, link-based clustering
is also known as community detection [54].
III. STATISTICAL RELATIONAL
LEARNING FOR KNOWLEDGE GRAPHS
Statistical relational learning is concerned with the
creation of statistical models for relational data. In the
following sections we discuss how statistical relational
learning can be applied to knowledge graphs. We will
assume that all the entities and (types of) relations in a
knowledge graph are known. (We discuss extensions of
this assumption in Section X-C). However, triples are
assumed to be incomplete and noisy; entities and relation
types may contain duplicates.
Notation: Before proceeding, let us define our mathematical notation. (Variable names will be introduced later in the appropriate sections.) We denote scalars by lower case letters, such as $a$; column vectors (of size $N$) by bold lower case letters, such as $\mathbf{a}$; matrices (of size $N_1 \times N_2$) by bold upper case letters, such as $\mathbf{A}$; and tensors (of size $N_1 \times N_2 \times N_3$) by bold upper case letters with an underscore, such as $\underline{\mathbf{A}}$. We denote the $k$th ‘‘frontal slice'' of a tensor $\underline{\mathbf{A}}$ by $\mathbf{A}_k$ (which is a matrix of size $N_1 \times N_2$), and the $(i,j,k)$th element by $a_{ijk}$ (which is a scalar). We use $[\mathbf{a}; \mathbf{b}]$ to denote the vertical stacking of vectors $\mathbf{a}$ and $\mathbf{b}$, i.e., $[\mathbf{a}; \mathbf{b}] = \binom{\mathbf{a}}{\mathbf{b}}$. We can convert a matrix $\mathbf{A}$ of size $N_1 \times N_2$ into a vector $\mathbf{a}$ of size $N_1 N_2$ by stacking all columns of $\mathbf{A}$, denoted $\mathbf{a} = \operatorname{vec}(\mathbf{A})$. The inner (scalar) product of two vectors (both of size $N$) is defined by $\mathbf{a}^\top \mathbf{b} = \sum_{i=1}^{N} a_i b_i$. The tensor (Kronecker) product of two vectors (of size $N_1$ and $N_2$) is a vector of size $N_1 N_2$ with entries

$$\mathbf{a} \otimes \mathbf{b} = \begin{pmatrix} a_1 \mathbf{b} \\ \vdots \\ a_{N_1} \mathbf{b} \end{pmatrix}.$$

Matrix multiplication is denoted by $\mathbf{A}\mathbf{B}$ as usual. We denote the $\ell_2$ norm of a vector by $\|\mathbf{a}\|_2 = \sqrt{\sum_i a_i^2}$, and the Frobenius norm of a matrix by $\|\mathbf{A}\|_F = \sqrt{\sum_i \sum_j a_{ij}^2}$. We denote the vector of all ones by $\mathbf{1}$, and the identity matrix by $\mathbf{I}$.
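For readers who want to experiment, this notation maps directly onto standard NumPy operations (a quick sketch of ours; the paper itself does not prescribe any implementation):

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])           # matrix of size N1 x N2
a = np.array([1.0, 2.0])             # vector of size N1
b = np.array([3.0, 4.0, 5.0])        # vector of size N2

vec_A = A.flatten(order="F")         # vec(A): stack columns -> [1, 3, 2, 4]
inner = a @ a                        # a^T a = 1*1 + 2*2 = 5
kron = np.kron(a, b)                 # a (x) b, vector of size N1*N2
stacked = np.concatenate([a, b])     # [a; b], vertical stacking
l2 = np.linalg.norm(a)               # ||a||_2 = sqrt(5)
fro = np.linalg.norm(A, ord="fro")   # ||A||_F = sqrt(30)
```

Note that `vec` stacks columns, so column-major (`order="F"`) flattening is required, not NumPy's default row-major order.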
A. Probabilistic Knowledge Graphs
We now introduce some mathematical background so
we can more formally define statistical models for
knowledge graphs.
Let $\mathcal{E} = \{e_1, \ldots, e_{N_e}\}$ be the set of all entities and $\mathcal{R} = \{r_1, \ldots, r_{N_r}\}$ be the set of all relation types in a knowledge graph. We model each possible triple $x_{ijk} = (e_i, r_k, e_j)$ over this set of entities and relations as a binary random variable $y_{ijk} \in \{0, 1\}$ that indicates its existence. All possible triples in $\mathcal{E} \times \mathcal{R} \times \mathcal{E}$ can be grouped naturally in a third-order tensor (three-way array) $\underline{\mathbf{Y}} \in \{0,1\}^{N_e \times N_e \times N_r}$, whose entries are set such that

$$y_{ijk} = \begin{cases} 1, & \text{if the triple } (e_i, r_k, e_j) \text{ exists} \\ 0, & \text{otherwise.} \end{cases}$$

We will refer to this construction as an adjacency tensor (cf., Fig. 3). Each possible realization of $\underline{\mathbf{Y}}$ can be interpreted as a possible world. To derive a model for the entire knowledge graph, we are then interested in estimating the joint distribution $P(\underline{\mathbf{Y}})$ from a subset $\mathcal{D} \subseteq \mathcal{E} \times \mathcal{R} \times \mathcal{E} \times \{0,1\}$ of observed triples. In doing so, we are estimating a probability distribution over possible worlds, which allows us to predict the probability of triples based on the state of the entire knowledge graph.

While $y_{ijk} = 1$ in adjacency tensors indicates the existence of a triple, the interpretation of $y_{ijk} = 0$ depends on whether the open world, closed world, or local-closed world assumption is made. For details, see Section VII-B.
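The adjacency tensor construction can be sketched at small scale as follows (our own illustration with NumPy; for a real KG one would use a sparse representation, since $\underline{\mathbf{Y}}$ is far too large to store densely):

```python
import numpy as np

entities = ["LeonardNimoy", "Spock", "StarTrek"]
relations = ["played", "starredIn", "characterIn"]
triples = [
    ("LeonardNimoy", "played", "Spock"),
    ("LeonardNimoy", "starredIn", "StarTrek"),
    ("Spock", "characterIn", "StarTrek"),
]

e_idx = {e: i for i, e in enumerate(entities)}
r_idx = {r: k for k, r in enumerate(relations)}

Ne, Nr = len(entities), len(relations)
Y = np.zeros((Ne, Ne, Nr), dtype=np.int8)   # adjacency tensor
for s, p, o in triples:
    Y[e_idx[s], e_idx[o], r_idx[p]] = 1     # y_ijk = 1 iff (e_i, r_k, e_j) exists

# The k-th frontal slice Y[:, :, k] is the adjacency matrix of relation r_k.
played_slice = Y[:, :, r_idx["played"]]
```

Observed triples fill in a handful of ones; all remaining entries are the zeros whose interpretation depends on the world assumption discussed above.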
Note that the size of $\underline{\mathbf{Y}}$ can be enormous for large knowledge graphs. For instance, in the case of Freebase, which currently consists of over 40 million entities and 35 000 relations, the number of possible triples $|\mathcal{E} \times \mathcal{R} \times \mathcal{E}|$ exceeds $10^{19}$ elements. Of course, type constraints reduce this number considerably.

Even amongst the syntactically valid triples, only a tiny fraction are likely to be true. For example, there are over 450 000 actors and over 250 000 movies stored in Freebase. But each actor stars only in a small number of movies. Therefore, an important issue for SRL on knowledge graphs is how to deal with the large number of possible relationships while efficiently exploiting the sparsity of relationships. Ideally, a relational model for large-scale knowledge graphs should scale at most linearly with the data size, i.e., linearly in the number of entities
Fig. 3. Tensor representation of binary relational data.