A Decomposable Attention Model for Natural Language Inference
Ankur P. Parikh
Google
New York, NY
Oscar Täckström
Google
New York, NY
Dipanjan Das
Google
New York, NY
Jakob Uszkoreit
Google
Mountain View, CA
{aparikh,oscart,dipanjand,uszkoreit}@google.com
Abstract
We propose a simple neural architecture for nat-
ural language inference. Our approach uses at-
tention to decompose the problem into subprob-
lems that can be solved separately, thus making
it trivially parallelizable. On the Stanford Natu-
ral Language Inference (SNLI) dataset, we ob-
tain state-of-the-art results with almost an order
of magnitude fewer parameters than previous
work and without relying on any word-order in-
formation. Adding intra-sentence attention that
takes a minimum amount of order into account
yields further improvements.
1 Introduction
Natural language inference (NLI) refers to the prob-
lem of determining entailment and contradiction re-
lationships between a premise and a hypothesis. NLI
is a central problem in language understanding (Katz,
1972; Bos and Markert, 2005; Benthem, 2008; Mac-
Cartney and Manning, 2009) and recently the large
SNLI corpus of 570K sentence pairs was created for
this task (Bowman et al., 2015). We present a new
model for NLI and leverage this corpus for compari-
son with prior work.
A large body of work based on neural networks
for text similarity tasks including NLI has been pub-
lished in recent years (Hu et al., 2014; Rocktäschel
et al., 2016; Wang and Jiang, 2015; Yin et al., 2016,
inter alia). The dominating trend in these models is
to build complex, deep text representation models,
for example, with convolutional networks (LeCun et
al., 1990, CNNs henceforth) or long short-term mem-
ory networks (Hochreiter and Schmidhuber, 1997,
LSTMs henceforth) with the goal of deeper sen-
tence comprehension. While these approaches have
yielded impressive results, they are often computa-
tionally very expensive, and result in models having
millions of parameters (excluding embeddings).
Here, we take a different approach, arguing that
in many cases natural language inference does not
require deep modeling of sentence structure. Mere
comparison of local text substructure followed by ag-
gregation of this information may work equally well
for making global inferences. For example, consider
the following sentences:
• Bob is in his room, but because of the thunder and lightning outside, he cannot sleep.
• Bob is awake.
• It is sunny outside.
The first sentence is complex in structure and it
is challenging to construct a compact representation
that expresses its entire meaning. However, it is fairly
easy to conclude that the second sentence follows
from the first one, by simply aligning Bob with Bob
and cannot sleep with awake, and recognizing that
someone who cannot sleep must be awake. Similarly, one can conclude
that It is sunny outside contradicts the first sentence,
by aligning thunder and lightning with sunny and
recognizing that these are most likely incompatible.
We leverage this intuition to build a simpler and
more lightweight approach to NLI within a neural
framework that with considerably fewer parameters
outperforms more complex existing neural architec-
tures. In contrast to existing approaches, our ap-
proach only relies on alignment and is fully computa-
tionally decomposable with respect to the input text.
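To make the alignment-then-aggregation intuition concrete, the following NumPy sketch walks through an attend, compare, and aggregate pass over toy premise and hypothesis embeddings. It is a minimal illustration only: the random linear-plus-ReLU stand-ins for the feed-forward components, the embedding size, and the token counts are assumptions chosen for the example, not the trained networks of the model described later in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def mlp(dim_in, dim_out):
    # Stand-in for a small feed-forward network: random linear map + ReLU.
    W = rng.normal(scale=0.1, size=(dim_in, dim_out))
    return lambda x: np.maximum(x @ W, 0.0)

d = 50                         # illustrative embedding size
a = rng.normal(size=(7, d))    # premise token embeddings (7 tokens)
b = rng.normal(size=(4, d))    # hypothesis token embeddings (4 tokens)

F = mlp(d, d)
G = mlp(2 * d, d)
H = mlp(2 * d, 3)

# Attend: soft-align every premise token against every hypothesis token.
e = F(a) @ F(b).T                 # (7, 4) unnormalized alignment scores
beta = softmax(e, axis=1) @ b     # (7, d) hypothesis content aligned to each premise token
alpha = softmax(e, axis=0).T @ a  # (4, d) premise content aligned to each hypothesis token

# Compare: score each token jointly with its aligned counterpart,
# independently of all other tokens (hence trivially parallelizable).
v1 = G(np.concatenate([a, beta], axis=1))   # (7, d)
v2 = G(np.concatenate([b, alpha], axis=1))  # (4, d)

# Aggregate: sum the comparison vectors and predict the label
# (entailment / contradiction / neutral).
logits = H(np.concatenate([v1.sum(axis=0), v2.sum(axis=0)])[None, :])
print(logits.shape)   # (1, 3)
```

Because the compare step treats each aligned token pair in isolation, the per-token work can be carried out in parallel, which is what makes the overall model computationally decomposable.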