Method | Entity/Relation embeddings | Energy function
TransE (Bordes et al., 2013) | $e, r \in \mathbb{R}^d$ | $f(e_i, r_k, e_j) = \|e_i + r_k - e_j\|_{\ell_1/\ell_2}$
SME (lin) (Bordes et al., 2014) | $e, r \in \mathbb{R}^d$ | $f(e_i, r_k, e_j) = (W_{u1} r_k + W_{u2} e_i + b_u)^{T} (W_{v1} r_k + W_{v2} e_j + b_v)$
SME (bilin) (Bordes et al., 2014) | $e, r \in \mathbb{R}^d$ | $f(e_i, r_k, e_j) = \big((W_u \bar{\times}_3 r_k)\, e_i + b_u\big)^{T} \big((W_v \bar{\times}_3 r_k)\, e_j + b_v\big)$
SE (Bordes et al., 2011) | $e \in \mathbb{R}^d$, $R^u, R^v \in \mathbb{R}^{d \times d}$ | $f(e_i, r_k, e_j) = \|R^u_k e_i - R^v_k e_j\|_{\ell_1}$
Table 1: Existing KG embedding models.
function definition. Three state-of-the-art embedding models, namely TransE (Bordes et al., 2013), SME (Bordes et al., 2014), and SE (Bordes et al., 2011), are detailed below. Please refer to (Jenatton et al., 2012; Socher et al., 2013; Wang et al., 2014b; Lin et al., 2015) for other methods.
TransE (Bordes et al., 2013) represents both entities and relations as vectors in the embedding space. For a given triple $\langle e_i, r_k, e_j \rangle$, the relation is interpreted as a translation vector $r_k$ so that the embedded entities $e_i$ and $e_j$ can be connected by $r_k$ with low error. The energy function is defined as $f(e_i, r_k, e_j) = \|e_i + r_k - e_j\|_{\ell_1/\ell_2}$, where $\|\cdot\|_{\ell_1/\ell_2}$ denotes the $\ell_1$-norm or $\ell_2$-norm.
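The translational form is straightforward to compute. Below is a minimal NumPy sketch of the TransE energy function; the function name and toy embeddings are illustrative and not part of the original formulation.

```python
import numpy as np

def transe_energy(e_i, r_k, e_j, norm=1):
    """TransE energy: the norm of (e_i + r_k - e_j).

    e_i, r_k, e_j are d-dimensional embedding vectors; an observed
    triple that fits the model well should receive a low energy.
    """
    return np.linalg.norm(e_i + r_k - e_j, ord=norm)

# Toy usage with random 50-dimensional embeddings.
d = 50
e_i, r_k, e_j = np.random.randn(d), np.random.randn(d), np.random.randn(d)
print(transe_energy(e_i, r_k, e_j))          # l1-norm
print(transe_energy(e_i, r_k, e_j, norm=2))  # l2-norm
```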
SME (Bordes et al., 2014) also represents entities and relations as vectors, but models triples in a more expressive way. Given a triple $\langle e_i, r_k, e_j \rangle$, it first employs a function $g_u(\cdot, \cdot)$ to combine $r_k$ and $e_i$, and $g_v(\cdot, \cdot)$ to combine $r_k$ and $e_j$. Then, the energy function is defined as matching $g_u(\cdot, \cdot)$ and $g_v(\cdot, \cdot)$ by their dot product, i.e., $f(e_i, r_k, e_j) = g_u(r_k, e_i)^{T} g_v(r_k, e_j)$. There are two versions of SME, linear and bilinear (denoted as SME (lin) and SME (bilin) respectively), obtained by defining different $g_u(\cdot, \cdot)$ and $g_v(\cdot, \cdot)$.
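As an illustration, the SME (lin) combination functions from Table 1 can be sketched as follows; the weight matrices and bias vectors here are randomly initialized placeholders rather than trained parameters.

```python
import numpy as np

d = 50
# Placeholder parameters; in practice these are learned.
W_u1, W_u2 = np.random.randn(d, d), np.random.randn(d, d)
W_v1, W_v2 = np.random.randn(d, d), np.random.randn(d, d)
b_u, b_v = np.random.randn(d), np.random.randn(d)

def sme_lin_energy(e_i, r_k, e_j):
    """SME (lin): dot product of two linear combinations of (relation, entity)."""
    g_u = W_u1 @ r_k + W_u2 @ e_i + b_u   # combines r_k with the head entity
    g_v = W_v1 @ r_k + W_v2 @ e_j + b_v   # combines r_k with the tail entity
    return g_u @ g_v

e_i, r_k, e_j = np.random.randn(d), np.random.randn(d), np.random.randn(d)
print(sme_lin_energy(e_i, r_k, e_j))
```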
SE (Bordes et al., 2011) represents entities as vectors but relations as matrices. Each relation is modeled by a left matrix $R^u_k$ and a right matrix $R^v_k$, acting as independent projections to head and tail entities respectively. If a triple $\langle e_i, r_k, e_j \rangle$ holds, $R^u_k e_i$ and $R^v_k e_j$ should be close to each other. The energy function is $f(e_i, r_k, e_j) = \|R^u_k e_i - R^v_k e_j\|_{\ell_1}$.
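A corresponding sketch of the SE energy, again with placeholder relation matrices rather than learned ones:

```python
import numpy as np

d = 50
# Placeholder relation-specific projection matrices (learned in practice).
R_u_k, R_v_k = np.random.randn(d, d), np.random.randn(d, d)

def se_energy(e_i, e_j, R_u_k, R_v_k):
    """SE: l1 distance between the projected head and tail entity vectors."""
    return np.linalg.norm(R_u_k @ e_i - R_v_k @ e_j, ord=1)

e_i, e_j = np.random.randn(d), np.random.randn(d)
print(se_energy(e_i, e_j, R_u_k, R_v_k))
```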
Table 1 summarizes the entity/relation representations and energy functions used in these models.
3 Semantically Smooth Embedding
The methods introduced above perform the embedding task based solely on observed facts. The only requirement is that the learned embeddings should be compatible within each individual fact. However, they fail to discover the intrinsic geometric structure of the embedding space. To deal with this limitation, we introduce Semantically Smooth Embedding (SSE), which constrains the embedding task by incorporating geometrically based regularization terms, constructed by using additional semantic categories of entities.
3.1 Problem Formulation
Suppose we are given a KG consisting of $n$ entities and $m$ relations. The facts observed are stored as a set of triples $O = \{\langle e_i, r_k, e_j \rangle\}$. A triple $\langle e_i, r_k, e_j \rangle$ indicates that entity $e_i$ and entity $e_j$ are connected by relation $r_k$. In addition, the entities are classified into multiple semantic categories. Each entity $e$ is associated with a label $c_e$ indicating the category to which it belongs. SSE aims to embed the entities and relations into a continuous vector space which is compatible with the observed facts, and at the same time semantically smooth.
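For concreteness, the inputs to SSE can be thought of as two simple structures: the triple set O and a category label for each entity. The Python sketch below only illustrates this setup; the entity and relation names are made up for the example.

```python
# Observed facts: a set of (head entity, relation, tail entity) triples.
O = {
    ("einstein", "born_in", "ulm"),
    ("ulm", "located_in", "germany"),
}

# Semantic category labels: each entity e is mapped to its category c_e.
category = {
    "einstein": "person",
    "ulm": "city",
    "germany": "country",
}
```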
To make the embedding space compatible with
the observed facts, we make use of the triple set O
and follow the same strategy adopted in previous
methods. That is, we define an energy function
on each candidate triple (e.g. the energy functions
listed in Table 1), and require observed triples to
have lower energies than unobserved ones (i.e. the
margin-based ranking loss defined in Eq. (1)).
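Although Eq. (1) appears earlier in the paper, the margin-based ranking loss standardly used in this line of work has roughly the following form (a sketch, not necessarily the paper's exact notation), with margin $\gamma > 0$, observed triples $O$, and corrupted triples $N_t$ obtained by replacing the head or tail of $t$:

$$\mathcal{L} = \sum_{t \in O} \sum_{t' \in N_t} \big[\gamma + f(t) - f(t')\big]_{+},$$

which drives each observed triple $t$ to receive an energy at least $\gamma$ lower than its corrupted counterparts $t'$.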
To make the embedding space semantically smooth, we further leverage the entity category information $\{c_e\}$, and assume that entities within the same semantic category should lie close to each other in the embedding space. This smoothness assumption is similar to the local invariance assumption exploited in manifold learning theory (i.e. nearby points are likely to have similar embeddings or labels). We therefore employ two manifold learning algorithms, Laplacian Eigenmaps (Belkin and Niyogi, 2001) and Locally Linear Embedding (Roweis and Saul, 2000), to model such semantic smoothness, referred to as LE and LLE for short, respectively.
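As a rough illustration of this assumption (not the exact regularizer used by SSE, which is defined in the following subsections), one can picture a graph-Laplacian style penalty with adjacency weights $w_{ij}$ that are large when $e_i$ and $e_j$ belong to the same category:

$$\mathcal{R} = \frac{1}{2} \sum_{i,j} \|e_i - e_j\|^2 \, w_{ij},$$

which grows whenever same-category entities are embedded far apart.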
3.2 Modeling Semantic Smoothness by LE
Laplacian Eigenmaps (LE) is a manifold learning
algorithm that preserves local invariance between