Encoding Knowledge Graph Entity Aliases with an Attentive Neural Network for Wikidata Entity Linking

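"Encoding Knowledge Graph Entity Aliases in Attentive Neural Network for Wikidata Entity Linking": this paper examines how an attentive neural network (ANN) can encode the aliases of Wikidata entities for entity linking (EL) over a knowledge graph. Wikidata is a large collaborative knowledge graph whose content is created by the crowd, so entity titles are often non-standard, noisy, and sometimes overly long or inappropriate. These non-standard, implicit, and non-canonical entity representations make it hard to link entities with high precision and recall.

The authors, Isaiah Onando Mulang and colleagues, are affiliated with Fraunhofer IAIS, the University of Bonn, Cerence GmbH, Zerotha Research, and TIB Hannover in Germany, and the University of Dayton in the United States. They propose a new method that aims to improve entity linking performance by encoding entity aliases in an ANN. Knowledge graphs usually serve as the source of target entities for entity linking methods, but they also contain other relevant information such as aliases. For example, the entity "Obama" may have several aliases, such as "Barack Obama" and "44th President of the United States".

Entity linking is a key task in natural language processing: it identifies named entities in text and maps them to the correct entities in a knowledge graph. Aliases matter in this process because they provide additional contextual clues that help a system recognise mentions more accurately. Traditional entity linking methods may rely only on exact matches against entity names and ignore aliases, which lowers linking accuracy.

The attention mechanism discussed in the paper lets a neural model focus on the important information when processing long sequences, which is especially useful when handling entity aliases. By encoding aliases into the model, the system can better interpret entity mentions in text and is more likely to link them to the correct knowledge graph entity. This improves precision and may also improve recall, so that more genuine entity mentions are linked correctly.

The study may also cover strategies for training and optimising the neural network, such as specific loss functions and optimisation algorithms, and possibly data augmentation techniques for handling the diverse entity representations in a knowledge graph. It may further discuss evaluation, including the standard precision, recall, and F1 measures, and how to handle difficult cases such as polysemous words and homographs.

The work is an innovative contribution at the intersection of deep learning and knowledge graphs: it combines neural attention with the rich information in a knowledge graph, entity aliases in particular, to improve entity linking. The approach is relevant to improving the quality of automatic knowledge graph construction and maintenance, and to knowledge-graph-based information retrieval and question answering systems.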
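The contrast drawn above between exact name matching and alias-aware matching can be illustrated with a small lookup sketch. This is not the authors' neural model; it only shows why aliases widen the set of mentions that can reach the right entity. The records, the placeholder IDs `E1` and `E2`, and the helper names `normalise`, `build_index`, and `candidates` are assumptions made for illustration.

```python
from collections import defaultdict


def normalise(text: str) -> str:
    """Lower-case and collapse whitespace so surface forms compare loosely."""
    return " ".join(text.lower().split())


# Toy records based on the "Obama" example in the summary.
# The IDs are placeholders, not real Wikidata QIDs.
ENTITIES = [
    {
        "id": "E1",
        "label": "Obama",
        "aliases": ["Barack Obama", "44th President of the United States"],
    },
    {"id": "E2", "label": "University of Dayton", "aliases": []},
]


def build_index(entities, use_aliases: bool):
    """Map each normalised name to the set of entity IDs that carry it."""
    index = defaultdict(set)
    for entity in entities:
        names = [entity["label"]]
        if use_aliases:
            names.extend(entity["aliases"])
        for name in names:
            index[normalise(name)].add(entity["id"])
    return index


def candidates(mention: str, index) -> list:
    """Return the sorted candidate entity IDs for a mention (may be empty)."""
    return sorted(index.get(normalise(mention), set()))


label_only = build_index(ENTITIES, use_aliases=False)
with_aliases = build_index(ENTITIES, use_aliases=True)

mention = "44th president of the United States"
print(candidates(mention, label_only))    # [] : label-only matching misses it
print(candidates(mention, with_aliases))  # ['E1'] : the alias recovers the entity
```

A dictionary lookup of this kind only handles names that appear verbatim in the knowledge graph; the paper's premise is that a learned model can use the same alias information more flexibly.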

Although simple, our approach is empirically powerful and shows ≈8% improvement over the baseline. We also release the source code and all utilised data for reproducibility and reusability on GitHub (https://github.com/mulangonando/Arjun). The remainder of the article is structured as follows: Section 2 motivates our work by discussing Wikidata-specific entity linking challenges. Section 3 discusses related work. This is followed by the formulation of the problem in Section 4. Section 5 describes the approach. In Section 6 we discuss the experimental setup and the results of the evaluation. We conclude in Section 7.

2 Motivating Examples

We motivate our work by highlighting some challenges associated with linking entities in text to Wikidata. Wikidata is a community effort to collect and provide open, structured encyclopedic data. The total number of entities described in Wikidata is over 54.1 million [23]. Wikidata entities are represented by unique IDs known as QIDs, and QIDs are associated with entity labels. Figure 1 shows three sentences extracted from the dataset released by ElSahar et al. [9], which aligns 6.2 million Wikipedia sentences to associated Wikidata triples (<subject, predicate, object>).
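The data model described in this paragraph (entities identified by QIDs with labels, and sentences aligned to triples) can be sketched as a few small types. This is a minimal illustration, not the format of the ElSahar et al. dataset: the class names, field names, and the example sentence are assumptions, and the example values (Q42, P31, Q5) are well-known Wikidata identifiers chosen only for illustration.

```python
import re
from dataclasses import dataclass

# Wikidata item IDs are "Q" plus a positive integer; property IDs use "P".
QID_PATTERN = re.compile(r"Q[1-9][0-9]*")
PID_PATTERN = re.compile(r"P[1-9][0-9]*")


def is_qid(value: str) -> bool:
    """Check that a string has the shape of a Wikidata item ID."""
    return QID_PATTERN.fullmatch(value) is not None


def is_pid(value: str) -> bool:
    """Check that a string has the shape of a Wikidata property ID."""
    return PID_PATTERN.fullmatch(value) is not None


@dataclass(frozen=True)
class Entity:
    """An entity: a unique QID associated with a human-readable label."""
    qid: str
    label: str

    def __post_init__(self):
        if not is_qid(self.qid):
            raise ValueError(f"not a QID: {self.qid!r}")


@dataclass(frozen=True)
class Triple:
    """A <subject, predicate, object> statement over Wikidata IDs."""
    subject: str
    predicate: str
    obj: str


@dataclass(frozen=True)
class AlignedSentence:
    """A sentence paired with the triples it is aligned to (hypothetical layout)."""
    text: str
    triples: tuple


# Illustrative values only; the sentence is invented for this sketch.
adams = Entity(qid="Q42", label="Douglas Adams")
example = AlignedSentence(
    text="Douglas Adams was a human being.",
    triples=(Triple(subject=adams.qid, predicate="P31", obj="Q5"),),
)

print(is_qid("Q217302"), is_qid("ASIC"))
print(example.triples[0])
```

Keeping the QID separate from the label reflects the point made in the text: the identifier is stable and unique, while the label is free text that may not match the surface form in a sentence.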

Fig. 1. Wikidata entity linking challenges: besides the challenge of capitalisation of surface forms and the implicit nature of entities, Wikidata has several specific challenges, such as very long entity labels and user-created entities.

In the first sentence S1, the surface form ASIC is linked to the Wikidata entity wiki:Q217302, and the entity is implicit (i.e. there is no exact string match between the surface form and the entity label). However, ASIC is also known as 'Application Specific Integrated Circuit' or 'Custom Chip'. Therefore, to disambiguate this entity, background information about the surface form will be useful. Please note that we will use this sentence as a running example, "Sentence S1". In the second sentence S2, the surface form Andhra Pradesh High Court is linked to wiki:Q3276107 which

https://github.com/mulangonando/Arjun
