distance functions d_L1, d_L2 and d_eL2, corresponding to the L1, L2 and squared L2 norm respectively (we report in Table 1 the scoring function with the extended form of d_L1).
RotatE [55] represents relations as rotations in a complex latent space, with h, r and t all belonging to C^d. The r embedding is a rotation vector: in each of its elements, the phase conveys the rotation along that axis, whereas the modulus is always equal to 1. The rotation r is applied to h by operating an element-wise product (once again noted with ◦ in Table 1). The L1 norm is used for measuring the distance from t. The authors demonstrate that rotation allows correctly modelling numerous relational patterns, such as symmetry/anti-symmetry, inversion and composition.
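As a minimal illustration of this scoring scheme, the following sketch (assuming PyTorch; the function name, the toy dimension d = 4 and the margin-free form are our own illustrative choices) builds the unit-modulus rotation from its phases, rotates h element-wise and returns the negative L1 distance from t:

```python
import torch

def rotate_score(h, r_phase, t):
    """Sketch of RotatE-style scoring (illustrative names, no margin term).

    h, t    : complex embeddings of shape (d,)
    r_phase : real rotation phases of shape (d,); each relation element is
              the unit-modulus complex number exp(i * phase)
    Returns the negative L1 distance between the rotated head and the tail,
    so higher values indicate more plausible facts.
    """
    r = torch.polar(torch.ones_like(r_phase), r_phase)  # modulus 1, phase = rotation
    return -torch.abs(h * r - t).sum()                  # element-wise product, L1 norm

# Toy usage with random embeddings of dimension d = 4.
d = 4
h = torch.randn(d, dtype=torch.cfloat)
t = torch.randn(d, dtype=torch.cfloat)
phase = torch.rand(d) * 2 * torch.pi
print(rotate_score(h, phase, t))
```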
3.3 Deep Learning Models
Deep Learning Models use deep neural networks to perform the LP task. Neural networks learn parameters such as weights and biases, that they combine with the input data in order to recognize significant patterns. Deep neural networks usually organize parameters into separate layers, generally interspersed with non-linear activation functions.
In time, numerous types of layers have been developed, applying very different operations to the input data. Dense layers, for instance, will just combine the input data X with weights W and add a bias B: W × X + B. For the sake of simplicity, in the following formulas we will not mention the use of bias, keeping it implicit. More advanced layers perform more complex operations, such as convolutional layers, that learn convolution kernels to apply to the input data, or recurrent layers, that handle sequential inputs in a recursive fashion.
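As a minimal sketch of the dense layer just described (assuming PyTorch; the dimensions are arbitrary), torch.nn.Linear stores the weights W and the bias B and combines them with the input:

```python
import torch

# Dense layer: combine the input X with weights W and add a bias B.
# torch.nn.Linear stores W (50 x 200) and B (50) and computes X @ W^T + B.
dense = torch.nn.Linear(in_features=200, out_features=50)

x = torch.randn(1, 200)      # a single 200-dimensional input vector
out = torch.relu(dense(x))   # non-linear activation interspersed between layers
print(out.shape)             # torch.Size([1, 50])
```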
In the LP field, KG embeddings are usually learned jointly with the weights and biases of the layers; these shared parameters make these models more expressive, but potentially heavier, harder to train, and more prone to overfitting.
We identify three groups in this family, based on the neural architecture they employ: (i) Convolutional Neural Networks,
(ii) Capsule Neural Networks, and (iii) Recurrent Neural Networks.
3.3.1 Convolutional Neural Networks. These models use one or multiple convolutional layers [33]: each of these layers performs convolution on the input data (e.g. the embeddings of the KG elements in a training fact) applying low-dimensional filters ω. The result is a feature map that is usually then passed to additional dense layers in order to compute the fact score.
ConvE [11] represents entities and relations as one-dimensional d-sized embeddings. When computing the score of a fact, it concatenates and reshapes the head and relation embeddings h and r into a unique input [h; r]; we dub the resulting dimensions d_m × d_n. This input is let through a convolutional layer with a set ω of m × n filters, and then through a dense layer with d neurons and a set of weights W. The output is finally combined with the tail embedding t using dot product, resulting in the fact score. When using the entire matrix of entity embeddings instead of the embedding of just the one target entity t, this architecture can be seen as a classifier with |E| classes.
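A hedged PyTorch-style sketch of this pipeline follows; the dimensions (d = 200 reshaped to d_m × d_n = 10 × 20), the 3 × 3 filter shape, the number of filters and the class name are illustrative choices rather than the original hyper-parameters, and the sketch scores a single ⟨h, r, t⟩ triple rather than all |E| candidate tails:

```python
import torch

class ConvEScorer(torch.nn.Module):
    """Illustrative ConvE-style scorer; hyper-parameters are assumptions."""

    def __init__(self, d=200, d_m=10, d_n=20, n_filters=32):
        super().__init__()
        assert d_m * d_n == d
        self.d_m, self.d_n = d_m, d_n
        # Set omega of convolution filters (3x3 here, chosen for illustration).
        self.conv = torch.nn.Conv2d(1, n_filters, kernel_size=3)
        conv_out = n_filters * (2 * d_m - 2) * (d_n - 2)
        # Dense layer with weights W projecting the feature map back to d neurons.
        self.dense = torch.nn.Linear(conv_out, d)

    def forward(self, h, r, t):
        # Reshape h and r to d_m x d_n and stack them into the input [h; r].
        stacked = torch.cat([h.view(-1, self.d_m, self.d_n),
                             r.view(-1, self.d_m, self.d_n)], dim=1).unsqueeze(1)
        feature_map = torch.relu(self.conv(stacked))
        projected = torch.relu(self.dense(feature_map.flatten(start_dim=1)))
        # Dot product with the tail embedding t gives the fact score.
        return (projected * t).sum(dim=-1)

scorer = ConvEScorer()
h, r, t = (torch.randn(1, 200) for _ in range(3))
print(scorer(h, r, t))
```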
ConvKB [42] models entities and relations as same-sized one-dimensional embeddings. Differently from ConvE, given any fact ⟨h, r, t⟩, it concatenates all their embeddings h, r and t into a d × 3 input matrix [h; r; t]. This input is passed to a convolutional layer with a set ω of T filters of shape 1 × 3, resulting in a T × d feature map. The feature map is let through a dense layer with only one neuron and weights W, resulting in the fact score. This architecture can be seen as a binary classifier, yielding the probability that the input fact is valid.
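Below is a comparable hedged sketch of this architecture; d = 100 and T = 64 are illustrative values, and the final sigmoid is added only to expose the binary-classifier reading of the score:

```python
import torch

class ConvKBScorer(torch.nn.Module):
    """Illustrative ConvKB-style scorer; d and T are assumed toy values."""

    def __init__(self, d=100, T=64):
        super().__init__()
        # Set omega of T filters of shape 1x3, slid over the d x 3 input matrix.
        self.conv = torch.nn.Conv2d(1, T, kernel_size=(1, 3))
        # Dense layer with a single neuron and weights W producing the score.
        self.dense = torch.nn.Linear(T * d, 1)

    def forward(self, h, r, t):
        # Concatenate h, r and t column-wise into the d x 3 input matrix [h; r; t].
        matrix = torch.stack([h, r, t], dim=-1).unsqueeze(1)   # (batch, 1, d, 3)
        feature_map = torch.relu(self.conv(matrix))            # (batch, T, d, 1)
        score = self.dense(feature_map.flatten(start_dim=1))   # (batch, 1)
        # Sigmoid (added here for illustration) reads the score as the
        # probability that the input fact is valid.
        return torch.sigmoid(score)

scorer = ConvKBScorer()
h, r, t = (torch.randn(2, 100) for _ in range(3))
print(scorer(h, r, t).shape)  # torch.Size([2, 1])
```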
ConvR [25] represents entity and relation embeddings as one-dimensional vectors of different dimensions d_e and d_r. For any fact ⟨h, r, t⟩, h is first reshaped into a matrix of shape d_e^m × d_e^n, where d_e^m × d_e^n = d_e. r is then reshaped