A Decomposable Attention Model for Natural Language Inference
Ankur P. Parikh
Google
New York, NY
Oscar Täckström
Google
New York, NY
Dipanjan Das
Google
New York, NY
Jakob Uszkoreit
Google
Mountain View, CA
{aparikh,oscart,dipanjand,uszkoreit}@google.com
Abstract
We propose a simple neural architecture for nat-
ural language inference. Our approach uses at-
tention to decompose the problem into subprob-
lems that can be solved separately, thus making
it trivially parallelizable. On the Stanford Natu-
ral Language Inference (SNLI) dataset, we ob-
tain state-of-the-art results with almost an order
of magnitude fewer parameters than previous
work and without relying on any word-order in-
formation. Adding intra-sentence attention that
takes a minimum amount of order into account
yields further improvements.
1 Introduction
Natural language inference (NLI) refers to the prob-
lem of determining entailment and contradiction re-
lationships between a premise and a hypothesis. NLI
is a central problem in language understanding (Katz,
1972; Bos and Markert, 2005; Benthem, 2008; Mac-
Cartney and Manning, 2009) and recently the large
SNLI corpus of 570K sentence pairs was created for
this task (Bowman et al., 2015). We present a new
model for NLI and leverage this corpus for compari-
son with prior work.
A large body of work based on neural networks
for text similarity tasks including NLI has been pub-
lished in recent years (Hu et al., 2014; Rocktäschel
et al., 2016; Wang and Jiang, 2015; Yin et al., 2016,
inter alia). The dominating trend in these models is
to build complex, deep text representation models,
for example, with convolutional networks (LeCun et
al., 1990, CNNs henceforth) or long short-term mem-
ory networks (Hochreiter and Schmidhuber, 1997,
LSTMs henceforth) with the goal of deeper sen-
tence comprehension. While these approaches have
yielded impressive results, they are often computa-
tionally very expensive, and result in models having
millions of parameters (excluding embeddings).
Here, we take a different approach, arguing that
in many cases natural language inference does not
require deep modeling of sentence structure. Mere
comparison of local text substructure followed by ag-
gregation of this information may work equally well
for making global inferences. For example, consider
the following sentences:
• Bob is in his room, but because of the thunder and lightning outside, he cannot sleep.
• Bob is awake.
• It is sunny outside.
The first sentence is complex in structure and it
is challenging to construct a compact representation
that expresses its entire meaning. However, it is fairly
easy to conclude that the second sentence follows
from the first one, by simply aligning Bob with Bob
and cannot sleep with awake, and recognizing that
someone who cannot sleep must be awake. Similarly, one can conclude
that It is sunny outside contradicts the first sentence,
by aligning thunder and lightning with sunny and
recognizing that these are most likely incompatible.
We leverage this intuition to build a simpler and
more lightweight approach to NLI within a neural
framework that with considerably fewer parameters
outperforms more complex existing neural architec-
tures. In contrast to existing approaches, our ap-
proach only relies on alignment and is fully computa-
tionally decomposable with respect to the input text.
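To make the alignment-then-aggregation intuition concrete, the following NumPy sketch walks through an attend, compare, and aggregate pass over toy premise and hypothesis embeddings. It is a minimal illustration only: the random linear-plus-ReLU stand-ins for the feed-forward components, the embedding size, and the token counts are assumptions chosen for the example, not the trained networks of the model described later in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def mlp(dim_in, dim_out):
    # Stand-in for a small feed-forward network: random linear map + ReLU.
    W = rng.normal(scale=0.1, size=(dim_in, dim_out))
    return lambda x: np.maximum(x @ W, 0.0)

d = 50                         # illustrative embedding size
a = rng.normal(size=(7, d))    # premise token embeddings (7 tokens)
b = rng.normal(size=(4, d))    # hypothesis token embeddings (4 tokens)

F = mlp(d, d)
G = mlp(2 * d, d)
H = mlp(2 * d, 3)

# Attend: soft-align every premise token against every hypothesis token.
e = F(a) @ F(b).T                 # (7, 4) unnormalized alignment scores
beta = softmax(e, axis=1) @ b     # (7, d) hypothesis content aligned to each premise token
alpha = softmax(e, axis=0).T @ a  # (4, d) premise content aligned to each hypothesis token

# Compare: score each token jointly with its aligned counterpart,
# independently of all other tokens (hence trivially parallelizable).
v1 = G(np.concatenate([a, beta], axis=1))   # (7, d)
v2 = G(np.concatenate([b, alpha], axis=1))  # (4, d)

# Aggregate: sum the comparison vectors and predict the label
# (entailment / contradiction / neutral).
logits = H(np.concatenate([v1.sum(axis=0), v2.sum(axis=0)])[None, :])
print(logits.shape)   # (1, 3)
```

Because the compare step treats each aligned token pair in isolation, the per-token work can be carried out in parallel, which is what makes the overall model computationally decomposable.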