INVITED PAPER
A Review of Relational Machine
Learning for Knowledge Graphs
This paper reviews how statistical models can be ‘‘trained’’ on large knowledge graphs
and then used to predict new facts about the world.
By Maximilian Nickel, Kevin Murphy, Volker Tresp, and Evgeniy Gabrilovich
ABSTRACT | Relational machine learning studies methods for the statistical analysis of relational, or graph-structured, data.
In this paper, we provide a review of how such statistical
models can be ‘‘trained’’ on large knowledge graphs, and then
used to predict new facts about the world (which is equivalent
to predicting new edges in the graph). In particular, we discuss
two fundamentally different kinds of statistical relational
models, both of which can scale to massive data sets. The first
is based on latent feature models such as tensor factorization
and multiway neural networks. The second is based on mining
observable patterns in the graph. We also show how to
combine these latent and observable models to get improved
modeling power at decreased computational cost. Finally, we
discuss how such statistical models of graphs can be combined
with text-based information extraction methods for automat-
ically constructing knowledge graphs from the Web. To this
end, we also discuss Google’s knowledge vault project as an
example of such combination.
KEYWORDS | Graph-based models; knowledge extraction; knowledge graphs; latent feature models; statistical relational learning
I. INTRODUCTION
‘‘I am convinced that the crux of the problem of learning is recognizing relationships and being able to use them'' (Christopher Strachey in a letter to Alan Turing, 1954).
Traditional machine learning algorithms take as input a
feature vector, which represents an object in terms of
numeric or categorical attributes. The main learning task
is to learn a mapping from this feature vector to an
output prediction of some form. This could be class
labels, a regression score, or an unsupervised cluster id
or latent vector (embedding). In statistical relational
learning (SRL), the representation of an object can
contain its relationships to other objects. Thus the data is
in the form of a graph, consisting of nodes (entities) and
labeled edges (relationships between entities). The main
goals of SRL include prediction of missing edges,
prediction of properties of nodes, and clustering nodes
based on their connectivity patterns. These tasks arise in
many settings such as analysis of social networks and
biological pathways. For further information on SRL,
see [1]–[3].
In this paper, we review a variety of techniques from
the SRL community and explain how they can be applied to
large-scale knowledge graphs (KGs), i.e., graph-structured knowledge bases (KBs) that store factual information in the form of relationships between entities. Recently, a large
number of knowledge graphs have been created, including
YAGO [4], DBpedia [5], NELL [6], Freebase [7], and the
Google Knowledge Graph [8]. As we discuss in Section II,
these graphs contain millions of nodes and billions of
edges. This causes us to focus on scalable SRL techniques,
which take time that is (at most) linear in the size of
the graph.
We can apply SRL methods to existing KGs to learn a
model that can predict new facts (edges) given existing
facts. We can then combine this approach with informa-
tion extraction methods that extract ‘‘noisy’’ facts from the
Web (see, e.g., [9] and [10]). For example, suppose an
information extraction method returns a fact claiming that
Barack Obama was born in Kenya, and suppose (for
illustration purposes) that the true place of birth of Obama
was not already stored in the knowledge graph. An SRL
model can use related facts about Obama (such as his
Manuscript received April 8, 2015; revised August 14, 2015; accepted September 16,
2015. Date of current version December 18, 2015. The work of M. Nickel was supported by
the Center for Brains, Minds and Machines (CBMM) under NSF STC award CCF-1231216.
The work of V. Tresp was supported by the German Federal Ministry for Economic Affairs
and Energy under the ‘‘Smart Data’’ technology program (Grant 01MT14001).
M. Nickel is with the Laboratory for Computational and Statistical Learning (LCSL),
Massachusetts Institute of Technology, Cambridge, MA 02139 USA and also with the
Istituto Italiano di Tecnologia, 16163 Genova, Italy (e-mail: mnick@mit.edu).
K. Murphy and E. Gabrilovich are with Google Inc., Mountain View, CA 94043 USA.
V. Tresp is with Siemens AG, Corporate Technology, Munich 81739, Germany and also
with the Ludwig Maximilian University of Munich, Munich 80539, Germany.
Digital Object Identifier: 10.1109/JPROC.2015.2483592
0018-9219
2015 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
Vol. 104, No. 1, January 2016 | Proceedings of the IEEE 11
profession being U.S. President) to infer that this new fact
is unlikely to be true and should be discarded. This
provides us a way to ‘‘grow'' a KG automatically, as we explain in more detail in Section IX.
The remainder of this paper is structured as follows. In
Section II, we introduce knowledge graphs and some of
their properties. Section III discusses SRL and how it can
be applied to knowledge graphs. There are two main
classes of SRL techniques: those that capture the
correlation between the nodes/edges using latent vari-
ables, and those that capture the correlation directly using
statistical models based on the observable properties of the
graph. We discuss these two families in Sections IV and V, respectively. Section VI describes methods for combining these two approaches, in order to get the best of both worlds. Section VII discusses how such models can be trained on KGs. In Section VIII we discuss relational learning using Markov Random Fields. In Section IX, we
describe how SRL can be used in automated knowledge
base construction projects. In Section X, we discuss
extensions of the presented methods, and Section XI
presents our conclusions.
II. KNOWLEDGE GRAPHS
In this section, we introduce knowledge graphs, and
discuss how they are represented, constructed, and used.
A. Knowledge Representation
Knowledge graphs model information in the form of
entities and relationships between them. This kind of
relational knowledge representation has a long history in
logic and artificial intelligence [11], for example, in semantic
networks [12] and frames [13]. More recently, it has been
used in the Semantic Web community with the purpose of
creating a ‘‘web of data’’ that is readable by machines [14].
While this vision of the Semantic Web remains to be fully
realized, parts of it have been achieved. In particular, the
concept of linked data [15], [16] has gained traction, as it
facilitates publishing and interlinking data on the Web in
relational form using the W3C Resource Description
Framework (RDF) [17], [18]. (For an introduction to
knowledge representation, see, e.g., [11], [19], and [20].)
In this paper, we will loosely follow the RDF standard
and represent facts in the form of binary relationships, in
particular (subject, predicate, object) (SPO) triples, where
subject and object are entities and predicate is the relation
between them. (We discuss how to represent higher arity
relations in Section X-A.) The existence of a particular
SPO triple indicates an existing fact, i.e., that the
respective entities are in a relationship of the given type.
For instance, the information
‘‘Leonard Nimoy was an actor who played the
character Spock in the science-fiction movie
Star Trek’’
can be expressed via the following set of SPO triples:

    (LeonardNimoy, profession, Actor)
    (LeonardNimoy, starredIn, StarTrek)
    (LeonardNimoy, played, Spock)
    (Spock, characterIn, StarTrek)
    (StarTrek, genre, ScienceFiction)

We can combine all the SPO triples together to form a
multigraph, where nodes represent entities (all subjects
and objects), and directed edges represent relationships.
The direction of an edge indicates whether entities occur
as subjects or objects, i.e., an edge points from the subject
to the object. Different relations are represented via
different types of edges (also called edge labels). This
construction is called a knowledge graph (KG), or
sometimes a heterogeneous information network [21].
See Fig. 1 for an example.
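To make the construction concrete, here is a minimal sketch in Python (our own illustration, using the Nimoy example from above; the dictionary-of-adjacency-lists encoding is just one possible representation, not a standard):

```python
from collections import defaultdict

# A knowledge graph as a set of (subject, predicate, object) triples.
triples = {
    ("LeonardNimoy", "played", "Spock"),
    ("LeonardNimoy", "starredIn", "StarTrek"),
    ("Spock", "characterIn", "StarTrek"),
}

# Build a labeled directed multigraph: map each node to its
# outgoing (edge label, target) pairs.
graph = defaultdict(list)
for s, p, o in triples:
    graph[s].append((p, o))  # directed edge s --p--> o

# All outgoing edges of Leonard Nimoy, i.e., all facts with him as subject.
print(sorted(graph["LeonardNimoy"]))
# [('played', 'Spock'), ('starredIn', 'StarTrek')]
```

Edge direction encodes the subject/object roles, and multiple differently labeled edges between the same pair of nodes are naturally supported.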
In addition to being a collection of facts, knowledge
graphs often provide type hierarchies (Leonard Nimoy is
an actor, which is a person, which is a living thing) and
type constraints (e.g., a person can only marry another
person, not a thing).
B. Open Versus Closed World Assumption
While existing triples always encode known true
relationships (facts), there are different paradigms for
the interpretation of nonexisting triples.
• Under the closed world assumption (CWA),
nonexisting triples indicate false relationships.
For example, the fact that in Fig. 1 there is no
starredIn edge from Leonard Nimoy to Star Wars is
interpreted to mean that Nimoy definitely did not
star in this movie.
• Under the open world assumption (OWA), a
nonexisting triple is interpreted as unknown, i.e.,
the corresponding relationship can be either true
or false. Continuing with the above example, the
missing edge is not interpreted to mean that
Nimoy did not star in Star Wars. This more cautious approach is justified, since KGs are known to be very incomplete. For example, sometimes just the main actors in a movie are listed, not the complete cast. As another example, note that even the place of birth attribute, which you might think would be typically known, is missing for 71% of all people included in Freebase [22].

Fig. 1. Sample knowledge graph. Nodes represent entities, edge labels represent types of relations, and edges represent existing relationships.
RDF and the Semantic Web make the open-world
assumption. In Section VII-B we also discuss the local
closed world assumption (LCWA), which is often used for
training relational models.
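The three assumptions can be contrasted with a small sketch (our own toy code, not from the paper; the LCWA variant follows the local closed world heuristic discussed later in Section VII-B):

```python
def triple_status(triple, observed, assumption="owa"):
    """Interpret a triple under different world assumptions.

    Returns True (known fact), False (assumed false), or None (unknown).
    """
    if triple in observed:
        return True          # existing triples always encode known facts
    if assumption == "cwa":
        return False         # closed world: absence implies falsehood
    if assumption == "lcwa":
        s, p, _ = triple
        # local closed world: if the KG lists *some* object for (s, p),
        # assume other objects are false; otherwise remain agnostic.
        if any(s2 == s and p2 == p for (s2, p2, _) in observed):
            return False
        return None
    return None              # open world: absence means unknown

kg = {("LeonardNimoy", "starredIn", "StarTrek")}
query = ("LeonardNimoy", "starredIn", "StarWars")
print(triple_status(query, kg, "cwa"))   # False
print(triple_status(query, kg, "owa"))   # None
print(triple_status(query, kg, "lcwa"))  # False (some starredIn fact is known)
```

Under OWA, the missing Star Wars edge stays unknown; under CWA it is taken as false regardless of how incomplete the KG is.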
C. Knowledge Base Construction
Completeness, accuracy, and data quality are important
parameters that determine the usefulness of knowledge
bases and are influenced by the way knowledge bases are
constructed. We can classify KB construction methods into
four main groups:
• in curated approaches, triples are created manually
by a closed group of experts;
• in collaborative approaches, triples are created
manually by an open group of volunteers;
• in automated semistructured approaches, triples
are extracted automatically from semistructured
text (e.g., infoboxes in Wikipedia) via hand-crafted
rules, learned rules, or regular expressions;
• in automated unstructured approaches, triples are
extracted automatically from unstructured text via
machine learning and natural language processing
techniques (see, e.g., [9] for a review).
Construction of curated knowledge bases typically
leads to highly accurate results, but this technique does not
scale well due to its dependence on human experts.
Collaborative knowledge base construction, which was
used to build Wikipedia and Freebase, scales better but still
has some limitations. For instance, as mentioned previ-
ously, the place of birth attribute is missing for 71% of all
people included in Freebase, even though this is a
mandatory property of the schema [22]. Also, a recent
study [35] found that the growth of Wikipedia has been
slowing down. Consequently, automatic knowledge base
construction methods have been gaining more attention.
Such methods can be grouped into two main
approaches. The first approach exploits semistructured
data, such as Wikipedia infoboxes, which has led to large,
highly accurate knowledge graphs such as YAGO [4], [27]
and DBpedia [5]. The accuracy (trustworthiness) of facts in
such automatically created KGs is often still very high. For
instance, the accuracy of YAGO2 has been estimated^1 to be over 95% through manual inspection of sample facts [36], and the accuracy of Freebase [7] was estimated to be 99%.^2
However, semistructured text still covers only a small
fraction of the information stored on the Web, and
completeness (or coverage) is another important aspect of
KGs. Hence the second approach tries to ‘‘read the Web,’’
extracting facts from the natural language text of Web
pages. Example projects in this category include NELL [6]
and the knowledge vault [28]. In Section IX, we show how
we can reduce the level of ‘‘noise’’ in such automatically
extracted facts by using the knowledge from existing, high-
quality repositories.
KGs, and more generally KBs, can also be classified
based on whether they employ a fixed or open lexicon of
entities and relations. In particular, we distinguish two
main types of KBs.
• In schema-based approaches, entities and relations are represented via globally unique identifiers and all possible relations are predefined in a
fixed vocabulary. For example, Freebase might
represent the fact that Barack Obama was born in
Hawaii using the triple (/m/02mjmr,/people/
person/born-in,/m/03gh4), where /m/02mjmr is
the unique machine ID for Barack Obama.
• In schema-free approaches, entities and relations
are identified using open information extraction
(OpenIE) techniques [37], and represented via
normalized but not disambiguated strings (also
referred to as surface names). For example, an
OpenIE system may contain triples such as
(‘‘Obama,’’ ‘‘born in,’’ ‘‘Hawaii’’), (‘‘Barack Obama,’’
‘‘place of birth,’’ ‘‘Honolulu’’), etc. Note that it is not
clear from this representation whether the first
triple refers to the same person as the second triple,
nor whether ‘‘born in’’ means the same thing as
‘‘place of birth.’’ This is the main disadvantage of
OpenIE systems.
Table 1 lists current knowledge base construction projects classified by their creation method and data schema. In this paper, we will only focus on schema-based KBs. Table 2 shows a selection of such KBs and their sizes.
D. Uses of Knowledge Graphs
^1 For detailed statistics, see http://www.mpi-inf.mpg.de/departments/databases-and-information-systems/research/yago-naga/yago/statistics/.
^2 http://thenoisychannel.com/2011/11/15/cikm-2011-industry-event-john-giannandrea-on-freebase-a-rosetta-stone-for-entities

Table 1. Knowledge Base Construction Projects

Knowledge graphs provide semantically structured information that is interpretable by computers, a property that is regarded as an important ingredient to build more intelligent machines [38]. Consequently, knowledge graphs are already powering multiple ‘‘Big Data''
applications in a variety of commercial and scientific
domains. A prime example is the integration of Google’s
Knowledge Graph, which currently stores 18 billion facts
about 570 million entities, into the results of Google’s search
engine [8]. The Google Knowledge Graph is used to identify
and disambiguate entities in text, to enrich search results
with semantically structured summaries, and to provide
links to related entities in exploratory search. (Microsoft has
a similar KB, called Satori, integrated with its Bing search
engine [39].)
Enhancing search results with semantic information
from knowledge graphs can be seen as an important step to
transform text-based search engines into semantically
aware question answering services. Another prominent
example demonstrating the value of knowledge graphs is
IBM’s question answering system Watson, which was able
to beat human experts in the game of Jeopardy!. Among
others, this system used YAGO, DBpedia, and Freebase as
its sources of information [40]. Repositories of structured
knowledge are also an indispensable component of digital
assistants such as Siri, Cortana, or Google Now.
Knowledge graphs are also used in several specialized
domains. For instance, Bio2RDF [41], Neurocommons
[42], and LinkedLifeData [43] are knowledge graphs that
integrate multiple sources of biomedical information.
These have been used for question answering and decision
support in the life sciences.
E. Main Tasks in Knowledge Graph Construction and
Curation
In this section, we review a number of typical KG tasks.
Link prediction is concerned with predicting the existence
(or probability of correctness) of (typed) edges in the
graph (i.e., triples). This is important since existing
knowledge graphs are often missing many facts, and
some of the edges they contain are incorrect [44]. In the
context of knowledge graphs, link prediction is also
referred to as knowledge graph completion. For example,
in Fig. 1, suppose the characterIn edge from Obi-Wan
Kenobi to Star Wars were missing; we might be able to
predict this missing edge, based on the structural similarity
between this part of the graph and the part involving Spock
and Star Trek. It has been shown that relational models
that take the relationships of entities into account can
significantly outperform nonrelational machine learning
methods for this task (e.g., see [45] and [46]).
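As a toy illustration of the kind of structural regularity such a prediction can exploit (a hand-written rule in the spirit of the observable-pattern models discussed in Section V, not a method from the paper):

```python
def predict_character_in(triples):
    """Toy rule: if an actor played a character and starred in a movie,
    predict that the character appears in that movie."""
    played = {(s, o) for (s, p, o) in triples if p == "played"}
    starred = {(s, o) for (s, p, o) in triples if p == "starredIn"}
    predicted = set()
    for actor, character in played:
        for actor2, movie in starred:
            if actor == actor2:
                predicted.add((character, "characterIn", movie))
    return predicted

kg = {
    ("AlecGuinness", "played", "ObiWanKenobi"),
    ("AlecGuinness", "starredIn", "StarWars"),
}
print(predict_character_in(kg))
# {('ObiWanKenobi', 'characterIn', 'StarWars')}
```

Note that such a hard rule overgenerates for actors with many roles and movies; the statistical models reviewed later instead assign probabilities to candidate edges.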
Entity resolution (also known as record linkage [47],
object identification [48], instance matching [49], and
deduplication [50]) is the problem of identifying which
objects in relational data refer to the same underlying
entities. See Fig. 2 for a small example. In a relational
setting, the decisions about which objects are assumed to
be identical can propagate through the graph, so that
matching decisions are made collectively for all objects in a
domain rather than independently for each object pair
(see, for example, [51]–[53]). In schema-based automated
knowledge base construction, entity resolution can be used
to match the extracted surface names to entities stored in
the knowledge graph.
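The relational intuition behind Fig. 2 can be sketched as follows (our own illustration; the name-compatibility heuristic and the attribute layout are invented for this example):

```python
def names_compatible(a, b):
    """'A. Guinness' is compatible with 'Alec Guinness' (same last name,
    matching first initial) - a deliberately crude heuristic."""
    first_a, last_a = a.split()
    first_b, last_b = b.split()
    return last_a == last_b and first_a[0] == first_b[0]

def relational_match(i, j, kg):
    """Match two nodes only if surface names are compatible AND their
    outgoing (predicate, object) links overlap - the relational signal."""
    if not names_compatible(kg[i]["name"], kg[j]["name"]):
        return False
    return bool(set(kg[i]["links"]) & set(kg[j]["links"]))

kg = {
    1: {"name": "A. Guinness", "links": [("profession", "Actor")]},
    2: {"name": "A. Guinness", "links": [("founderOf", "GuinnessBrewery")]},
    3: {"name": "Alec Guinness", "links": [("profession", "Actor")]},
}
print(relational_match(1, 3, kg))  # True: compatible names, shared links
print(relational_match(2, 3, kg))  # False: no shared relational context
```

The surface name alone cannot separate nodes 1 and 2, but the links in the graph, standing in for occupation, disambiguate them.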
Fig. 2. Example of entity resolution in a toy knowledge graph. In this example, nodes 1 and 3 refer to the identical entity, the actor Alec Guinness.
Node 2, on the other hand, refers to Arthur Guinness, the founder of the Guinness brewery. The surface name of node 2 (‘‘A. Guinness’’) alone
would not be sufficient to perform a correct matching as it could refer to both Alec Guinness and Arthur Guinness. However, since links in the
graph reveal the occupations of the persons, a relational approach can perform the correct matching.
Table 2 Size of Some Schema-Based Knowledge Bases
Link-based clustering extends feature-based clustering
to a relational learning setting and groups entities in
relational data based on their similarity. However, in link-
based clustering, entities are not only grouped by the
similarity of their features but also by the similarity of their
links. As in entity resolution, the similarity of entities can
propagate through the knowledge graph, such that
relational modeling can add important information for
this task. In social network analysis, link-based clustering
is also known as community detection [54].
III. STATISTICAL RELATIONAL
LEARNING FOR KNOWLEDGE GRAPHS
Statistical relational learning is concerned with the
creation of statistical models for relational data. In the
following sections we discuss how statistical relational
learning can be applied to knowledge graphs. We will
assume that all the entities and (types of) relations in a
knowledge graph are known. (We discuss extensions of
this assumption in Section X-C). However, triples are
assumed to be incomplete and noisy; entities and relation
types may contain duplicates.
Notation: Before proceeding, let us define our mathematical notation. (Variable names will be introduced later in the appropriate sections.) We denote scalars by lower case letters, such as $a$; column vectors (of size $N$) by bold lower case letters, such as $\mathbf{a}$; matrices (of size $N_1 \times N_2$) by bold upper case letters, such as $\mathbf{A}$; and tensors (of size $N_1 \times N_2 \times N_3$) by bold upper case letters with an underscore, such as $\underline{\mathbf{A}}$. We denote the $k$th ‘‘frontal slice'' of a tensor $\underline{\mathbf{A}}$ by $\mathbf{A}_k$ (which is a matrix of size $N_1 \times N_2$), and the $(i,j,k)$th element by $a_{ijk}$ (which is a scalar). We use $[\mathbf{a}; \mathbf{b}]$ to denote the vertical stacking of vectors $\mathbf{a}$ and $\mathbf{b}$, i.e., $[\mathbf{a}; \mathbf{b}] = \binom{\mathbf{a}}{\mathbf{b}}$. We can convert a matrix $\mathbf{A}$ of size $N_1 \times N_2$ into a vector $\mathbf{a}$ of size $N_1 N_2$ by stacking all columns of $\mathbf{A}$, denoted $\mathbf{a} = \operatorname{vec}(\mathbf{A})$. The inner (scalar) product of two vectors (both of size $N$) is defined by $\mathbf{a}^\top \mathbf{b} = \sum_{i=1}^{N} a_i b_i$. The tensor (Kronecker) product of two vectors (of size $N_1$ and $N_2$) is a vector of size $N_1 N_2$ with entries

$$\mathbf{a} \otimes \mathbf{b} = \begin{pmatrix} a_1 \mathbf{b} \\ \vdots \\ a_{N_1} \mathbf{b} \end{pmatrix}.$$

Matrix multiplication is denoted by $\mathbf{A}\mathbf{B}$ as usual. We denote the $\ell_2$ norm of a vector by $\|\mathbf{a}\|_2 = \sqrt{\sum_i a_i^2}$, and the Frobenius norm of a matrix by $\|\mathbf{A}\|_F = \sqrt{\sum_i \sum_j a_{ij}^2}$. We denote the vector of all ones by $\mathbf{1}$, and the identity matrix by $\mathbf{I}$.
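For readers who want to experiment, this notation maps directly onto standard NumPy operations (a quick sketch of ours; the paper itself does not prescribe any implementation):

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])           # matrix of size N1 x N2
a = np.array([1.0, 2.0])             # vector of size N1
b = np.array([3.0, 4.0, 5.0])        # vector of size N2

vec_A = A.flatten(order="F")         # vec(A): stack columns -> [1, 3, 2, 4]
inner = a @ a                        # a^T a = 1*1 + 2*2 = 5
kron = np.kron(a, b)                 # a (x) b, vector of size N1*N2
stacked = np.concatenate([a, b])     # [a; b], vertical stacking
l2 = np.linalg.norm(a)               # ||a||_2 = sqrt(5)
fro = np.linalg.norm(A, ord="fro")   # ||A||_F = sqrt(30)
```

Note that `vec` stacks columns, so column-major (`order="F"`) flattening is required, not NumPy's default row-major order.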
A. Probabilistic Knowledge Graphs
We now introduce some mathematical background so
we can more formally define statistical models for
knowledge graphs.
Let $\mathcal{E} = \{e_1, \ldots, e_{N_e}\}$ be the set of all entities and $\mathcal{R} = \{r_1, \ldots, r_{N_r}\}$ be the set of all relation types in a knowledge graph. We model each possible triple $x_{ijk} = (e_i, r_k, e_j)$ over this set of entities and relations as a binary random variable $y_{ijk} \in \{0, 1\}$ that indicates its existence. All possible triples in $\mathcal{E} \times \mathcal{R} \times \mathcal{E}$ can be grouped naturally in a third-order tensor (three-way array) $\underline{\mathbf{Y}} \in \{0,1\}^{N_e \times N_e \times N_r}$, whose entries are set such that

$$y_{ijk} = \begin{cases} 1, & \text{if the triple } (e_i, r_k, e_j) \text{ exists} \\ 0, & \text{otherwise.} \end{cases}$$

We will refer to this construction as an adjacency tensor (cf., Fig. 3). Each possible realization of $\underline{\mathbf{Y}}$ can be interpreted as a possible world. To derive a model for the entire knowledge graph, we are then interested in estimating the joint distribution $P(\underline{\mathbf{Y}})$ from a subset $\mathcal{D} \subseteq \mathcal{E} \times \mathcal{R} \times \mathcal{E} \times \{0,1\}$ of observed triples. In doing so, we are estimating a probability distribution over possible worlds, which allows us to predict the probability of triples based on the state of the entire knowledge graph.

While $y_{ijk} = 1$ in adjacency tensors indicates the existence of a triple, the interpretation of $y_{ijk} = 0$ depends on whether the open world, closed world, or local-closed world assumption is made. For details, see Section VII-B.
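The adjacency tensor construction can be sketched at small scale as follows (our own illustration with NumPy; for a real KG one would use a sparse representation, since $\underline{\mathbf{Y}}$ is far too large to store densely):

```python
import numpy as np

entities = ["LeonardNimoy", "Spock", "StarTrek"]
relations = ["played", "starredIn", "characterIn"]
triples = [
    ("LeonardNimoy", "played", "Spock"),
    ("LeonardNimoy", "starredIn", "StarTrek"),
    ("Spock", "characterIn", "StarTrek"),
]

e_idx = {e: i for i, e in enumerate(entities)}
r_idx = {r: k for k, r in enumerate(relations)}

Ne, Nr = len(entities), len(relations)
Y = np.zeros((Ne, Ne, Nr), dtype=np.int8)   # adjacency tensor
for s, p, o in triples:
    Y[e_idx[s], e_idx[o], r_idx[p]] = 1     # y_ijk = 1 iff (e_i, r_k, e_j) exists

# The k-th frontal slice Y[:, :, k] is the adjacency matrix of relation r_k.
played_slice = Y[:, :, r_idx["played"]]
```

Observed triples fill in a handful of ones; all remaining entries are the zeros whose interpretation depends on the world assumption discussed above.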
Note that the size of $\underline{\mathbf{Y}}$ can be enormous for large knowledge graphs. For instance, in the case of Freebase, which currently consists of over 40 million entities and 35 000 relations, the number of possible triples $|\mathcal{E} \times \mathcal{R} \times \mathcal{E}|$ exceeds $10^{19}$ elements. Of course, type constraints reduce this number considerably.

Even amongst the syntactically valid triples, only a tiny fraction are likely to be true. For example, there are over 450 000 actors and over 250 000 movies stored in Freebase. But each actor stars only in a small number of movies. Therefore, an important issue for SRL on knowledge graphs is how to deal with the large number of possible relationships while efficiently exploiting the sparsity of relationships. Ideally, a relational model for large-scale knowledge graphs should scale at most linearly with the data size, i.e., linearly in the number of entities
Fig. 3. Tensor representation of binary relational data.