Z. Chen et al.: KGC: A Review
the specific relationships that may share a common path, and
divides the relationships into different groups. In this way,
it improves on the one-to-one modeling of PRA, which trains
a separate classifier for each relationship. Graph-based
knowledge graph completion methods have three problems.
First, scalability is poor and memory usage is high, because
for a group of entity pairs these algorithms must enumerate
paths to determine whether any of the possible relationships
holds between the entity pairs. Second, the number of paths
is large, and using paths as model training features may cause
feature explosion. Finally, like completion methods based on
probabilistic graphical models, graph-based models also face
the high computational complexity of large-scale data.
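The path-enumeration step behind these problems can be illustrated with a small sketch. The graph, entity names, and relation names below are invented for the example and are not from the survey; the point is that every distinct relation path between an entity pair becomes a candidate feature, and the number of such paths grows quickly with the allowed path length.

```python
from collections import defaultdict

# Toy PRA-style path enumeration (entities and relations are illustrative).
edges = defaultdict(list)  # head entity -> list of (relation, tail entity)
triples = [
    ("alice", "born_in", "paris"),
    ("alice", "works_in", "paris"),
    ("paris", "capital_of", "france"),
    ("france", "located_in", "europe"),
]
for h, r, t in triples:
    edges[h].append((r, t))

def relation_paths(src, dst, max_len):
    """Enumerate all relation paths of length <= max_len from src to dst."""
    paths = []
    def dfs(node, path):
        if node == dst and path:
            paths.append(tuple(path))
        if len(path) == max_len:
            return
        for rel, nxt in edges[node]:
            dfs(nxt, path + [rel])
    dfs(src, [])
    return paths

# Each returned path is one training feature for the pair (alice, france).
print(relation_paths("alice", "france", 3))
```

On a real large-scale graph with high node degrees, the number of enumerated paths grows exponentially in `max_len`, which is the feature explosion and memory cost described above.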
Traditional knowledge graph completion methods rely on
reasoning rules and the network structure of the knowledge
graph. As knowledge graphs expand, the defects of such
methods gradually become apparent. First, expansion makes
the data increasingly sparse and rule extraction more
difficult, and entities in the long-tail distribution have
little associated knowledge, so the above methods are greatly
limited for knowledge graph completion. Second, a knowledge
graph is in essence a semantic network in which entities and
relationships carry rich semantic information [50]; however,
traditional knowledge graph representation methods cannot
encode this semantic information, which makes it difficult to
obtain a high-quality knowledge graph. Finally, traditional
knowledge graph completion methods suffer from low
computational efficiency, high algorithmic complexity, and
poor portability and scalability. For these reasons, the
study of knowledge graph completion has shifted to the stage
of knowledge representation learning.
B. MAIN METHODS OF KNOWLEDGE GRAPH COMPLETION BASED ON
REPRESENTATION LEARNING
Because a knowledge graph is a multi-relational graph
composed of entities (nodes) and relationships (different
types of edges), it is usually organized as a network. For
example, knowledge graphs stored in the Resource Description
Framework (RDF) [51] are represented as triples. However,
network-based knowledge graph representations suffer from
two main problems in application. First, computational
efficiency: in a network-based representation, entities are
expressed as distinct nodes, so computing semantic similarity
or reasoning about the relationships between entities
requires designing a special graph algorithm for each
specific application. Such methods have poor flexibility and
scalability and cannot meet the demands of computation over
today's large-scale knowledge graphs. Second, data sparsity:
like other types of large-scale data, large-scale knowledge
graphs follow a long-tail distribution, and the entities and
relationships in the long tail face a serious data sparsity
problem [28]. To address this problem, extensive attention
has turned to knowledge representation learning [52]-[57] in
recent years.
Through machine learning, knowledge representation learning
aims to express entities, relationships, and their semantic
information as dense low-dimensional real-valued vectors in
a continuous vector space, which both preserves the inherent
graph structure of the knowledge graph and simplifies
operations. Typical knowledge representation learning
techniques generally consist of the following three parts:
1) represent relationships and entities in a continuous
space; 2) define a score function f_r(h, t) to judge the
probability that a triple (h, r, t) holds (the main
difference between models lies in the choice of score
function); 3) learn the representations of entities and
relationships by solving the optimization problem of
maximizing the plausibility of observed facts. Through
efficient computation of semantic relations between entities
and relationships in a low-dimensional space, the data
sparsity problem is effectively alleviated and the
performance of knowledge graph completion is significantly
improved. The following introduces knowledge graph
completion methods based on different representation
learning models.
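The three parts above can be sketched in a few lines. This is a minimal illustration, not any specific published model: the entity and relation names, the dimension, and the particular score function (a TransE-style distance is assumed here for concreteness) are all invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 50
entities = ["beijing", "china", "paris", "france"]
relations = ["capital_of"]

# 1) Represent entities and relations as vectors in a continuous space.
E = {e: rng.normal(size=dim) for e in entities}
R = {r: rng.normal(size=dim) for r in relations}

# 2) Define a pluggable score function f_r(h, t); models differ mainly here.
def score(h, r, t):
    return -np.linalg.norm(E[h] + R[r] - E[t])

# 3) Training would adjust E and R so observed triples such as
#    ("beijing", "capital_of", "china") score higher than corrupted ones;
#    the untrained random vectors here just show the interface.
print(score("beijing", "capital_of", "china"))
```

Swapping in a different `score` function while keeping parts 1) and 3) fixed is exactly how the model families surveyed below differ from one another.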
1) KNOWLEDGE GRAPH COMPLETION METHOD BASED ON
TRANSLATION MODEL
The translation model is the most representative classical
approach in knowledge representation learning. In 2013,
Mikolov et al. proposed the Word2Vec [54] algorithm and
observed the translation-invariance phenomenon of word
vectors, e.g., titanic - leonardodicaprio ≈ 2012 - johncusack;
that is, distribution-based word representations capture the
same kind of semantic relationship. Inspired by this
translation invariance, Bordes et al. proposed the most
representative classical translation model, TransE [55],
which led a large number of researchers into the study of the
Trans series of models, with representative improved models
including TransH [56], TransR [6], and TransD [57]. The main
idea behind translation models is to treat finding valid
triples as the translation of entities through relationships,
define a corresponding score function, and then minimize a
loss function to learn the representations of entities and
relationships.
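The "minimize a loss function" step is commonly realized in TransE-style models with a margin-based ranking loss: a valid triple should be scored closer than a corrupted one by at least a margin γ. The sketch below uses toy two-dimensional vectors and an assumed margin; the negative-sampling scheme that produces the corrupted triple is not shown.

```python
import numpy as np

def distance(h, r, t):
    # Translation distance: how far h + r lands from t.
    return np.linalg.norm(h + r - t)

def margin_loss(pos, neg, gamma=1.0):
    """Hinge loss on (h, r, t) vector triples: positive triples should
    beat corrupted ones by at least the margin gamma."""
    return max(0.0, gamma + distance(*pos) - distance(*neg))

h = np.array([1.0, 0.0])
r = np.array([0.0, 1.0])
t = np.array([1.0, 1.0])          # valid tail: h + r == t exactly
t_corrupt = np.array([5.0, 5.0])  # corrupted tail, far from h + r

print(margin_loss((h, r, t), (h, r, t_corrupt)))  # → 0.0
```

The loss is zero here because the corrupted triple is already separated from the valid one by more than the margin; during training, gradient descent on this loss over many sampled pairs is what shapes the embeddings.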
Given a training set S consisting of triples (h, r, t), where
the head and tail entities h, t ∈ E (the entity set) and
r ∈ R (the relationship set), the main idea of TransE is
that if the triple (h, r, t) is true, then the sum of the
vector representations of the head entity and the relation is
close to the vector representation of the tail entity, and
otherwise it is far away; that is, when the triple holds,
h + r ≈ t, as shown in FIGURE 1. From this idea, the score
function of the TransE model can be obtained as
f_r(h, t) = −‖h + r − t‖_{L1/L2} [55], which is the negative
of the L1 or L2 distance between the translated head entity
h + r and the tail entity t in the low-dimensional continuous
space.
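The h + r ≈ t intuition can be checked numerically. The two-dimensional vectors below are invented for illustration; the score f_r(h, t) = −‖h + r − t‖ is near zero for a tail that satisfies the translation and strongly negative otherwise.

```python
import numpy as np

def f_r(h, r, t, p=2):
    # p=1 or p=2 selects the L1 or L2 norm used in the TransE score.
    return -np.linalg.norm(h + r - t, ord=p)

h = np.array([0.2, 0.5])       # head entity embedding (toy values)
r = np.array([0.3, -0.1])      # relation embedding
t_good = np.array([0.5, 0.4])  # tail satisfying h + r = t
t_bad = np.array([-2.0, 3.0])  # unrelated tail

print(f_r(h, r, t_good))  # ≈ 0: triple judged plausible
print(f_r(h, r, t_bad))   # strongly negative: triple judged implausible
```

Ranking candidate tails by this score is exactly how TransE is used for completion: for a query (h, r, ?), the entity whose vector best matches h + r is predicted as the missing tail.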
The TransE model is efficient and concise and has good
predictive performance, but it has two problems: 1) TransE uses the
VOLUME 8, 2020 192439