Node $\tau$ calculates the attention weight on its neighbor $\eta$ using hop query $\hat{q}_{\tau,0}$ and key $\hat{k}_{\eta,0}$. Then it uses the weights to combine its neighbors' values $\hat{v}_{\eta,0}$ and forms a globalized representation $\hat{h}^l_{\tau,0}$.
The two attention mechanisms are combined to form the new representation of layer $l$:
$\tilde{h}^l_{\tau,0} = \text{Linear}([h^l_{\tau,0} \circ \hat{h}^l_{\tau,0}]),$  (8)
$\tilde{h}^l_{\tau,i} = h^l_{\tau,i}; \quad \forall i \neq 0.$  (9)
Note that the non-hub tokens ($i \neq 0$) still have access to the hop attention in the previous layer through Eqn. (6).
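To make the update concrete, below is a minimal single-head sketch of the hub-token combination in Eqns. (8)-(9), assuming node hidden states of shape (num_nodes, seq_len, dim) with the hub [CLS] token at position 0 and a 0/1 edge matrix; the module and tensor names here are illustrative, not the paper's implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class HopAttentionCombine(nn.Module):
    """Minimal sketch of the hub-token update in Eqns. (8)-(9).

    h: (num_nodes, seq_len, dim) hidden states of all node sequences, hub token
    at position 0. edges: (num_nodes, num_nodes) 0/1 torch tensor where
    edges[i, j] = 1 means information flows from node i to node j.
    Single-head and unbatched for clarity.
    """

    def __init__(self, dim):
        super().__init__()
        self.q = nn.Linear(dim, dim)            # hop query  \hat{q}
        self.k = nn.Linear(dim, dim)            # hop key    \hat{k}
        self.v = nn.Linear(dim, dim)            # hop value  \hat{v}
        self.combine = nn.Linear(2 * dim, dim)  # Linear([h \circ \hat{h}]) in Eqn. (8)

    def forward(self, h, edges):
        hub = h[:, 0, :]                                     # h^l_{tau,0} for every node tau
        q, k, v = self.q(hub), self.k(hub), self.v(hub)
        scores = q @ k.t() / hub.size(-1) ** 0.5             # (num_nodes, num_nodes)
        scores = scores.masked_fill(edges.t() == 0, float("-inf"))  # only attend to neighbors
        attn = torch.nan_to_num(F.softmax(scores, dim=-1))   # hop weights; 0 for isolated nodes
        h_hat = attn @ v                                     # globalized \hat{h}^l_{tau,0}
        h_new = h.clone()                                    # Eqn. (9): non-hub tokens unchanged
        h_new[:, 0, :] = self.combine(torch.cat([hub, h_hat], dim=-1))  # Eqn. (8)
        return h_new
```

A batched, multi-head variant follows the same pattern.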
One layer of eXtra Hop attention can be viewed as a single step of information propagation along edges $E$. For example, in Figure 1a, the document node $d_3$ updates its representation by gathering information from its neighbor $d_1$ using the hop attention $d_1 \rightarrow d_3$. When multiple Transformer-XH layers are stacked, this information in $d_1$ includes both $d_1$'s local contexts from its in-sequence attention and cross-sequence information from the hop attention $d_2 \rightarrow d_1$ of the $(l-1)$-th layer. Hence, an $L$-layer Transformer-XH can attend over information from up to $L$ hops away.
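This L-hop reach can be checked with a toy reachability computation; the sketch below uses a hypothetical three-node graph matching the Figure 1a example (edges d1 → d3 and d2 → d1) and shows that after two layers d3 has received information originating at d2.

```python
import numpy as np

# Hypothetical 3-node evidence graph mirroring Figure 1a: edges d1 -> d3 and
# d2 -> d1 (node order d1, d2, d3). A[i, j] = 1 means information flows from
# node i to node j in one hop-attention step.
A = np.array([[0, 0, 1],
              [1, 0, 0],
              [0, 0, 0]])

# reach[i, j] = 1 if node j's hub has (transitively) received information
# originating at node i after the layers applied so far.
reach = np.eye(3, dtype=int)
for layer in range(2):                              # two stacked Transformer-XH layers
    reach = ((reach + reach @ A) > 0).astype(int)

print(reach[:, 2])                                  # -> [1 1 1]: d3 now sees d1, d2, and itself
```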
Together, three main properties equip Transformer-XH to effectively model raw structured text data: the propagation of information (values) along edges, the importance of that information (hop attention weights), and the balance of in-sequence and cross-sequence information (attention combination). The representations learned in H can innately express nuances in structured text that are required for complex reasoning tasks such as multi-hop QA and natural language inference.
3 APPLICATION TO MULTI-HOP QUESTION ANSWERING
This section describes how Transformer-XH applies to multi-hop QA. Given a question q, the task
is to find an answer span a in a large open-domain document corpus, e.g. the first paragraph of
all Wikipedia pages. By design, the questions are complex and often require information from
multiple documents to answer. For example, in the case shown in Figure 1b, the correct answer
“Cambridge” requires combining the information from both the Wikipedia pages “Facebook” and
“Harvard University”. To apply Transformer-XH in the open domain multi-hop QA task, we first
construct an evidence graph and then apply Transformer-XH on the graph to find the answer.
Evidence Graph Construction. The first step is to find the relevant candidate documents D for
the question q and connect them with edges E to form the graph G. Our set D consists of three
sources. The first two sources are from canonical information retrieval and entity linking techniques:
$D^{ir}$: the top 100 documents retrieved by DrQA's TF-IDF on the question (Chen et al., 2017).
$D^{el}$: the Wikipedia documents associated with the entities that appear in the question, annotated by entity linking systems: TagMe (Ferragina & Scaiella, 2010) and CMNS (Hasibi et al., 2017).
For better retrieval quality, we use a BERT ranker (Nogueira & Cho, 2019) on the set $D^{ir} \cup D^{el}$ and keep the top two ranked ones in $D^{ir}$ and the top one per question entity in $D^{el}$. Then the third source $D^{exp}$ includes all documents connected to or from any top ranked documents via Wikipedia hyperlinks (e.g., “Facebook” → “Harvard University”).
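A rough sketch of this candidate construction is given below; the four callables (tfidf_retrieve, entity_link, bert_rank, wiki_links) are hypothetical stand-ins for DrQA's TF-IDF retriever, the TagMe/CMNS linkers, the BERT ranker, and a Wikipedia hyperlink index, and are not interfaces from the paper's code.

```python
def build_candidate_documents(question, tfidf_retrieve, entity_link, bert_rank, wiki_links):
    """Sketch of the three-source candidate set: D^ir, D^el, and D^exp.

    tfidf_retrieve(question, k) -> list of docs; entity_link(question) -> dict
    {entity: candidate docs}; bert_rank(question, docs) -> dict {doc: score};
    wiki_links(doc) -> docs linked to or from doc. All four are hypothetical
    placeholders.
    """
    d_ir = tfidf_retrieve(question, k=100)                     # D^ir: top-100 retrieved docs
    d_el = entity_link(question)                               # D^el, grouped by question entity

    pool = list(dict.fromkeys(d_ir + [d for docs in d_el.values() for d in docs]))
    scores = bert_rank(question, pool)                         # BERT ranker over D^ir ∪ D^el

    top = sorted(d_ir, key=lambda d: -scores[d])[:2]           # keep top two from D^ir
    for docs in d_el.values():                                 # keep top one per question entity
        if docs:
            top.append(max(docs, key=lambda d: scores[d]))

    d_exp = [linked for d in top for linked in wiki_links(d)]  # D^exp: hyperlink expansion
    return list(dict.fromkeys(top + d_exp))                    # deduplicated node set X
```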
The final graph comprises all documents from the three sources as nodes X. The edge matrix E is flexible. We experiment with various edge matrix settings, including directed edges along Wikipedia links, i.e., $e_{ij} = 1$ if there is a hyperlink from document $i$ to $j$; bidirectional edges along Wiki links; and fully-connected graphs, which rely on Transformer-XH to learn the edge importance.
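As an illustration, the three edge settings could be constructed as follows; links_to(a, b) is a hypothetical predicate for "document a hyperlinks to document b".

```python
import numpy as np

def build_edge_matrix(docs, links_to, mode="directed"):
    """Sketch of the three edge settings described above."""
    n = len(docs)
    if mode == "fully_connected":
        return np.ones((n, n), dtype=int)       # let Transformer-XH learn edge importance
    e = np.zeros((n, n), dtype=int)
    for i in range(n):
        for j in range(n):
            if i != j and links_to(docs[i], docs[j]):
                e[i, j] = 1                     # directed edge along the Wikipedia link
    if mode == "bidirectional":
        e = np.maximum(e, e.T)                  # also add the reverse direction
    return e
```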
Similar to previous work (Ding et al., 2019), the textual representation for each node in the graph
is the [SEP]-delimited concatenation of the question, the anchor text (the text of the hyperlink in the parent node that points to the child node), and the paragraph itself. More details on the evidence graph
construction are in Appendix A.1.
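For illustration, one node's input sequence could be assembled with a standard BERT tokenizer as in the sketch below; the exact ordering and truncation policy are assumptions, since the text above only specifies that the three pieces are [SEP]-delimited.

```python
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

def node_inputs(question, anchor_text, paragraph, max_len=512):
    """[CLS] question [SEP] anchor text [SEP] paragraph [SEP] for one graph node."""
    sep = tokenizer.sep_token
    text = f"{question} {sep} {anchor_text} {sep} {paragraph}"
    return tokenizer(text, truncation=True, max_length=max_len, return_tensors="pt")
```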
Transformer-XH on Evidence Graph. Transformer-XH takes the input nodes X and edges E,
and produces the global representation of all text sequences:
$H^L = \text{Transformer-XH}(X, E).$  (10)
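In code, Eqn. (10) amounts to stacking per-node in-sequence attention and the hop-attention hub update over all node sequences; the minimal sketch below reuses the HopAttentionCombine module sketched earlier and substitutes a stock TransformerEncoderLayer for the paper's BERT-initialized Transformer layers.

```python
import torch.nn as nn

class TransformerXHLayer(nn.Module):
    """One sketched layer: per-node in-sequence attention, then the hub update."""
    def __init__(self, dim=768, heads=12):
        super().__init__()
        # A stock encoder layer stands in for the paper's BERT-initialized layer.
        self.in_sequence = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        self.hop = HopAttentionCombine(dim)      # from the earlier sketch

    def forward(self, h, edges):
        return self.hop(self.in_sequence(h), edges)

class TransformerXH(nn.Module):
    """H^L = Transformer-XH(X, E), Eqn. (10): L layers over all node sequences."""
    def __init__(self, dim=768, heads=12, num_layers=3):
        super().__init__()
        self.layers = nn.ModuleList(TransformerXHLayer(dim, heads) for _ in range(num_layers))

    def forward(self, x, edges):
        h = x                                    # x: (num_nodes, seq_len, dim) token embeddings
        for layer in self.layers:
            h = layer(h, edges)
        return h                                 # H^L, used downstream for answer prediction
```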