be evaluated for all latent entity and mention pairs, which is prohibitively expensive. However, by restricting s_t to be an inner product we can implement this efficiently (§2.2).
To highlight the differentiability of the proposed overall scheme, we can represent the computation in Eq. 2 as matrix operations. We pre-compute the TFIDF term for all entities and mentions into a sparse matrix, which we denote as A_{E→M}[e, m] = 1(G(e) · F(m) > ε). Then entity expansion to co-occurring mentions can be done using a sparse-matrix by sparse-vector multiplication between A_{E→M} and z_{t−1}. For the relevance scores, let T_K(s_t(m, z_{t−1}, q)) denote the top-K relevant mentions encoded as a sparse vector in R^{|M|}. Finally, the aggregation of mentions to entities can be formulated as multiplication with another sparse matrix B_{M→E}, which encodes coreference, i.e. mentions corresponding to the same entity. Putting all these together, using ⊙ to denote element-wise product, and defining Z_t = [Pr(z_t = e_1 | q); . . . ; Pr(z_t = e_{|E|} | q)], we can observe that for large K (i.e., as K → |M|), Eq. 2 becomes equivalent to:
Z_t = softmax( [ Z_{t−1}^⊤ A_{E→M} ⊙ T_K(s_t(m, z_{t−1}, q)) ] B_{M→E} ).    (4)
Note that every operation in the above equation is differentiable and involves only sparse matrices and vectors: we will discuss efficient implementations in §2.2. Further, the number of non-zero entries in Z_t is bounded by K, since the element-wise product in Eq. 4 filters the TFIDF-based expansion to the top-K relevant mentions, and since each mention can only point to a single entity in B_{M→E}. This is important, as it prevents the number of entries in Z_t from exploding across hops (which might happen if, for instance, we added the relevance and TFIDF scores instead).
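To make the matrix form concrete, the following is a minimal sketch of one application of Eq. 4 using scipy.sparse. The dense rel_scores array stands in for the learned scores s_t(m, z_{t−1}, q), and the explicit top-K selection and the softmax restricted to non-zero entries are simplifications for illustration, not the paper's actual TensorFlow implementation.

```python
import numpy as np
import scipy.sparse as sp

def textual_follow(Z_prev, A_e2m, rel_scores, B_m2e, K, temperature=1.0):
    """One step of Eq. 4: Z_t = softmax([Z_{t-1}^T A_{E->M} (*) T_K(s_t)] B_{M->E}).

    Z_prev:     sparse 1 x |E| row vector of entity probabilities.
    A_e2m:      sparse |E| x |M| TFIDF co-occurrence matrix.
    rel_scores: dense length-|M| array standing in for s_t(m, z_{t-1}, q).
    B_m2e:      sparse |M| x |E| coreference (mention -> entity) matrix.
    """
    # Entity -> mention expansion: sparse row vector times sparse matrix.
    expanded = Z_prev @ A_e2m                                # 1 x |M|, sparse

    # Encode the top-K relevance scores as a sparse vector T_K(s_t).
    top_k = np.argsort(-rel_scores)[:K]
    t_k = sp.csr_matrix(
        (rel_scores[top_k], (np.zeros(len(top_k), dtype=int), top_k)),
        shape=(1, rel_scores.shape[0]),
    )

    # Element-wise product filters the expansion to the top-K mentions,
    # then mentions are aggregated back to entities via B_{M->E}.
    entity_scores = expanded.multiply(t_k) @ B_m2e           # 1 x |E|, sparse

    # Temperature-scaled softmax, restricted to the non-zero entries
    # (a simplification that preserves sparsity of Z_t).
    scores = entity_scores.toarray().ravel()
    mask = scores != 0
    out = np.zeros_like(scores)
    out[mask] = np.exp(scores[mask] / temperature)
    out[mask] /= out[mask].sum()
    return sp.csr_matrix(out)
```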
We can view Z_{t−1}, Z_t as weighted multisets of entities, and s_t(m, z, q) as implicitly selecting mentions which correspond to a relation R. Then Eq. 4 becomes a differentiable implementation of Z_t = Z_{t−1}.follow(R), i.e. mimicking the graph traversal in a traditional KB. We thus call Eq. 4 a textual follow operation.
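Chaining this operation across hops then mirrors repeated follow operations on a KB. A small sketch, reusing the hypothetical textual_follow above (rel_scores_per_hop is a placeholder for the per-hop scores s_t):

```python
# Multi-hop inference: Z_T = Z_0.follow(R_1)....follow(R_T), implemented by
# repeatedly applying the textual_follow sketch above.
Z = Z_0  # sparse 1 x |E| vector over the entities mentioned in the question
for t in range(T):
    Z = textual_follow(Z, A_e2m, rel_scores_per_hop[t], B_m2e, K)
Z_T = Z  # weighted set of candidate answer entities after T hops
```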
Training and Inference. The model is trained end-to-end by optimizing the cross-entropy loss between Z_T, the weighted set of entities after T hops, and the ground-truth answer set A. We use a temperature coefficient λ when computing the softmax in Eq. 4, since the inner-product scores of the top-K retrieved mentions are typically high values, which would otherwise result in very peaked distributions of Z_t. We also found that taking a maximum over the mention set of an entity, M_{z_t}, in Eq. 2 works better than taking a sum. This corresponds to optimizing only over the most confident mention of each entity, which works for corpora such as Wikipedia that do not have much redundancy. A similar observation was made by Min et al. (2019) in weakly supervised settings.
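To illustrate the sum-versus-max choice, here is a small sketch of pooling mention scores into entity scores followed by the temperature-scaled softmax; the mention_to_entity array and the helper function are hypothetical names, not part of the described system.

```python
import numpy as np

def aggregate_mentions(mention_scores, mention_to_entity, num_entities,
                       pool="max", temperature=1.0):
    """Pool per-mention scores into per-entity scores, then apply the
    temperature-scaled softmax (lambda in the text). Entities with no
    mentions receive zero probability.

    mention_scores:    length-|M| array of relevance scores.
    mention_to_entity: length-|M| array mapping each mention to its entity id.
    """
    entity_scores = np.full(num_entities, -np.inf)
    if pool == "sum":
        # Sum all mention scores per entity.
        sums = np.zeros(num_entities)
        np.add.at(sums, mention_to_entity, mention_scores)
        has_mention = np.zeros(num_entities, dtype=bool)
        has_mention[mention_to_entity] = True
        entity_scores[has_mention] = sums[has_mention]
    else:
        # Max: keep only the most confident mention of each entity.
        np.maximum.at(entity_scores, mention_to_entity, mention_scores)

    logits = entity_scores / temperature
    probs = np.exp(logits - logits[np.isfinite(logits)].max())
    return probs / probs.sum()
```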
2.2 EFFICIENT IMPLEMENTATION
Sparse TFIDF Mention Encoding. To compute the sparse matrix A_{E→M} for entity-mention expansion in Eq. 4, the TFIDF vectors F(m) and G(z_{t−1}) are constructed over unigrams and bigrams, hashed to a vocabulary of 16M buckets. While F computes the vector from the whole passage around m, G only uses the surface form of z_{t−1}. This corresponds to retrieving all mentions in a document using z_{t−1} as the query. We limit the number of retrieved mentions per entity to a maximum of µ, which leads to a |E| × |M| sparse matrix.
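As a rough sketch of how such a matrix could be assembled offline, the snippet below uses scikit-learn's HashingVectorizer (hashed bag-of-n-grams as a stand-in for the TFIDF weighting; the 2**24 ≈ 16M bucket size matches the text, while the threshold epsilon, the cap mu, and scoring an entity against all mentions rather than within a single document are simplifying assumptions).

```python
import numpy as np
import scipy.sparse as sp
from sklearn.feature_extraction.text import HashingVectorizer

# Hashed unigram + bigram vectors with ~16M buckets, as described above.
vectorizer = HashingVectorizer(ngram_range=(1, 2), n_features=2**24,
                               alternate_sign=False, norm="l2")

def build_A_e2m(entity_names, mention_passages, epsilon=0.1, mu=100):
    """Build the sparse |E| x |M| matrix A[e, m] = 1(G(e) . F(m) > epsilon),
    keeping at most mu mentions per entity."""
    G = vectorizer.transform(entity_names)       # |E| x V: surface form of e
    F = vectorizer.transform(mention_passages)   # |M| x V: passage around m

    rows, cols = [], []
    for e in range(G.shape[0]):
        sims = (G[e] @ F.T).toarray().ravel()    # G(e) . F(m) for every m
        hits = np.where(sims > epsilon)[0]
        hits = hits[np.argsort(-sims[hits])[:mu]]  # cap at mu mentions
        rows.extend([e] * len(hits))
        cols.extend(hits)

    data = np.ones(len(rows))
    return sp.csr_matrix((data, (rows, cols)), shape=(G.shape[0], F.shape[0]))
```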
Figure 2: Runtime on a single K80 GPU when using ragged representations for implementing sparse-matrix vector product, vs the default sparse-matrix times dense vector product available in TensorFlow. |E| > 10^5 leads to OOM for the latter. (x-axis: |E| = number of entities; y-axis: milliseconds; curves: sparse × sparse vs. sparse × dense.)
Efficient Entity-Mention Expansion. The expansion from a set of entities to mentions occurring around them can be computed using the sparse-matrix by sparse-vector product Z_{t−1}^⊤ A_{E→M}. A simple lower bound for multiplying a sparse |E| × |M| matrix, with a maximum of µ non-zeros in each row, by a sparse |E| × 1 vector with K non-zeros is Ω(Kµ). Note that this lower bound is independent of the size of the matrix A_{E→M}, or in other words, independent of the number of entities or mentions. To attain the lower bound, the multiplication algorithm must be vector-driven, because any matrix-driven algorithm needs to at least iterate over all the rows. Instead, we slice out the relevant rows from A_{E→M}.
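A minimal sketch of such a vector-driven product, assuming a ragged row-wise storage of A_{E→M} (the names and exact data layout are illustrative, not the paper's TensorFlow implementation):

```python
import numpy as np

def vector_driven_spmv(z_indices, z_values, row_cols, row_vals, num_mentions):
    """Sparse-vector times sparse-matrix product that only touches the rows
    of A_{E->M} selected by the K non-zeros of z_{t-1}, giving O(K * mu) work.

    z_indices, z_values:      the K non-zero entity ids and weights of z_{t-1}.
    row_cols[e], row_vals[e]: column ids and values of the non-zeros in row e
                              of A_{E->M} (a ragged, row-wise representation).
    """
    out = np.zeros(num_mentions)
    for e, w in zip(z_indices, z_values):
        # Slice out row e; at most mu entries, independent of |E| and |M|.
        out[row_cols[e]] += w * row_vals[e]
    return out
```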
To enable this, our solution is to represent the sparse matrix A_{E→M} as two row-wise lists of variable-sized lists of the indices and