Large-scale Knowledge Base Completion: Inferring via
Grounding Network Sampling over Selected Instances
Zhuoyu Wei 1,2, Jun Zhao 1, Kang Liu 1, Zhenyu Qi 2, Zhengya Sun 1 and Guanhua Tian 2
1 National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences
2 Interactive Digital Media Technology Research, Institute of Automation, Chinese Academy of Sciences
{zhuoyu.wei, zhenyu.qi, zhengya.sun, guanhua.tian}@ia.ac.cn, {jzhao, kliu}@nlpr.ia.ac.cn
ABSTRACT
Constructing large-scale knowledge bases has attracted much
attention in recent years, for which Knowledge Base Com-
pletion (KBC) is a key technique. In general, inferring new
facts in a large-scale knowledge base is not a trivial task.
The large number of candidate facts to be inferred causes the
majority of previous approaches to fail. Inference-based
approaches can achieve high precision when their formulas are
accurate, but they must infer candidate instances one by one,
and extremely large candidate sets bog them down in expensive
calculations. In contrast, the existing
embedding-based methods can easily calculate similarity-
based scores for each candidate instance as opposed to using
inference, so they can handle large-scale data. However, this
type of method does not consider explicit logical semantics
and usually has unsatisfactory precision. To resolve the
limitations of both types of methods, we propose an approach
called Inferring via Grounding Network Sampling over Selected
Instances. We first employ an embedding-based model to perform
instance selection and generate much smaller candidate sets for
subsequent fact inference, which not only narrows the candidate
sets but also filters out some of the noisy instances. Then, we
make inferences only within these candidate sets by running a
data-driven inference algorithm on a Markov Logic Network
(MLN), which we call Inferring via Grounding Network Sampling
(INS). In this process, we incorporate the similarity prior
generated by the embedding-based model into INS to improve
inference precision. The experimental results show that our
approach improves Hits@1 from 32.911% to 71.692% on the FB15K
dataset and achieves much better AP@n scores than
state-of-the-art methods.
Categories and Subject Descriptors
E.1 [Data]: DATA STRUCTURES—Graphs and networks
Keywords
Knowledge Base Completion; Embedding; Inference
Permission to make digital or hard copies of all or part of this work for personal or
classroom use is granted without fee provided that copies are not made or distributed
for profit or commercial advantage and that copies bear this notice and the full cita-
tion on the first page. Copyrights for components of this work owned by others than
ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or re-
publish, to post on servers or to redistribute to lists, requires prior specific permission
and/or a fee. Request permissions from Permissions@acm.org.
CIKM’15 October 19-23, 2015, Melbourne, VIC, Australia
© 2015 ACM. ISBN 978-1-4503-3794-6/15/10 ...$15.00.
DOI: http://dx.doi.org/10.1145/2806416.2806513.
1. INTRODUCTION
Automatically extracting facts from text and constructing
large-scale knowledge bases (KBs) have grown vigorously in
recent years. As a result, several typical knowledge bases
have been built, such as Freebase [2], NELL [6], YAGO [10],
and Knowledge Vault [7]. However, these extracted repositories
are far from complete. According to the conclusion in [7],
using an existing knowledge base to complete itself is an
important complement to automatic knowledge extraction for
increasing the number of facts in KBs, and it cannot be
substituted by other techniques. Therefore, this paper focuses
on large-scale knowledge base completion (KBC) and is committed
to predicting the missing links in existing knowledge bases.
In general, according to the process of KBC, there are
two types of approaches: inference-based approaches and
embedding-based approaches.
First, inference-based approaches [22, 16, 24] usually em-
ploy logic formulas to infer the missing links among exist-
ing entities in a KB. They manually or automatically con-
struct various logic formulas and learn the weight of each
formula by sampling or counting groundings from existing
KBs. These weighted formulas are viewed as the long-range
interactions across several relations. The biggest limitation
of such approaches is the computation complexity. These
methods need to infer knowledge one by one, which implies
the computation complexity is linearly growing with the size
of candidate sets. However, usually, there are extremely
large candidate sets for some specific relations in large-scale
KBC, and in each candidate set, only one or a few are actu-
ally correct. For example, Barack Obama’s mother is miss-
ing in a KB, and we need to find out who Barack Obama’s
mother is. All p ersons or females in the KB are candidates,
but only one is the correct selection. The huge candidate set
brings inference-based approaches to an unacceptable run-
ning time. Although some methods have avoided this issue
through simple operations, such as only adding a small part
of false facts to testing sets [13, 23], this strategy is too coarse
to obtain precise inference results. On the other hand, there
are some noise candidates, which may violate formulas and
mislead inference algorithms. Therefore, existing inference
methods that rely on formulas cannot remove the noises by
themselves, and the noises may result in the decrease of the
performance.
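To make the linear cost concrete, the one-by-one grounding described above can be sketched as follows. This is a minimal illustration with hypothetical predicates and toy entities, not the paper's actual INS algorithm: even a single Horn formula such as parent(h, t) ∧ female(t) ⇒ mother(h, t) must be grounded once per candidate tail entity, so runtime grows with the candidate set.

```python
def infer_mother(head, candidates, parent_facts, female_set):
    """Score every candidate tail by checking the formula body.

    Cost is O(|candidates|): in a large KB the candidate set may
    contain every person (or every female) entity, so this loop
    dominates the running time of formula-based inference.
    """
    scored = []
    for tail in candidates:  # one grounding check per candidate
        body_holds = (head, tail) in parent_facts and tail in female_set
        scored.append((tail, 1.0 if body_holds else 0.0))
    # Rank candidates by score, best first.
    return sorted(scored, key=lambda pair: -pair[1])

# Toy KB (hypothetical facts): only one candidate satisfies the body.
parents = {("obama", "ann_dunham"), ("obama", "barack_sr")}
females = {"ann_dunham", "alice"}
ranking = infer_mother("obama",
                       ["ann_dunham", "barack_sr", "alice"],
                       parents, females)
print(ranking[0][0])  # "ann_dunham" ranks first
```

With millions of entities as candidates, this per-candidate grounding is exactly what makes unrestricted inference intractable, and it is the cost the instance-selection step in the proposed approach is designed to avoid.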
In contrast, embedding-based methods [21, 5, 12, 4, 3, 26,
18] are not affected by huge candidate sets because they can
easily calculate similarity-based scores for each candidate
instance after learning representations of entities and rela-