Siamese Recurrent Architectures for Learning Sentence Similarity

Jonas Mueller
Computer Science & Artificial Intelligence Laboratory
Massachusetts Institute of Technology

Aditya Thyagarajan
Department of Computer Science and Engineering
M. S. Ramaiah Institute of Technology
Abstract
We present a siamese adaptation of the Long Short-Term Memory (LSTM) network for labeled data composed of pairs of variable-length sequences. Our model is applied to assess semantic similarity between sentences, where we exceed the state of the art, outperforming carefully handcrafted features and recently proposed neural network systems of greater complexity. For these applications, we provide word-embedding vectors supplemented with synonymic information to the LSTMs, which use a fixed-size vector to encode the underlying meaning expressed in a sentence (irrespective of the particular wording/syntax). By restricting subsequent operations to rely on a simple Manhattan metric, we compel the sentence representations learned by our model to form a highly structured space whose geometry reflects complex semantic relationships. Our results are the latest in a line of findings that showcase LSTMs as powerful language models capable of tasks requiring intricate understanding.
Introduction
Text understanding and information retrieval are important tasks which may be greatly enhanced by modeling the underlying semantic similarity between sentences/phrases. In particular, a good model should not be susceptible to variations of wording/syntax used to express the same idea. Learning such a semantic textual similarity metric has thus generated a great deal of research interest (Marelli et al. 2014). However, this remains a hard problem: labeled data is scarce, sentences have both variable length and complex structure, and bag-of-words/TF-IDF models, while dominant in natural language processing (NLP), are limited in this context by their inherent term-specificity (cf. Mihalcea, Corley, and Strapparava 2006).
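To make this term-specificity limitation concrete, the minimal sketch below (an illustrative assumption on our part: it uses scikit-learn's TfidfVectorizer, which plays no role in this paper) scores two paraphrases that share almost no vocabulary; the bag-of-words representation assigns them near-zero similarity despite their nearly identical meaning.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Two paraphrases expressing the same idea with (almost) disjoint vocabulary.
s1 = "A child is playing outside."
s2 = "The kid plays in the yard."

# TF-IDF vectors overlap only on shared terms, so the cosine similarity
# between these two sentences is near zero despite their shared meaning.
vectors = TfidfVectorizer().fit_transform([s1, s2])
print(cosine_similarity(vectors[0], vectors[1]))
```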
As an alternative to these ideas, Mikolov et al. (2013) and others have demonstrated the effectiveness of neural word representations for analogies and other NLP tasks. Recently, interest has shifted toward extending these ideas beyond the individual word level to larger bodies of text such as sentences, where a mapping is learned to represent each sentence as a fixed-length vector (Kiros et al. 2015; Tai, Socher, and Manning 2015; Le and Mikolov 2014).
Naturally suited for variable-length inputs like sentences, recurrent neural networks (RNNs), especially the Long Short-Term Memory model of Hochreiter and Schmidhuber (1997), have been particularly successful in this setting for tasks such as text classification (Graves 2012) and language translation (Sutskever, Vinyals, and Le 2014). RNNs adapt standard feedforward neural networks for sequence data $(x_1, \ldots, x_T)$, where at each $t \in \{1, \ldots, T\}$, updates to a hidden-state vector $h_t$ are performed via

$$h_t = \mathrm{sigmoid}\left( W x_t + U h_{t-1} \right) \qquad (1)$$
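As a purely illustrative sketch of this update, the NumPy snippet below applies Eq. (1) over a toy sequence; the dimensions, initialization scale, and the function name rnn_step are assumptions chosen for illustration rather than details taken from the paper.

```python
import numpy as np

def rnn_step(W, U, x_t, h_prev):
    """One basic-RNN hidden-state update, as in Eq. (1)."""
    return 1.0 / (1.0 + np.exp(-(W @ x_t + U @ h_prev)))

# Illustrative sizes: 300-dimensional inputs, 50 hidden units.
d_in, d_hid = 300, 50
rng = np.random.default_rng(0)
W = 0.01 * rng.standard_normal((d_hid, d_in))
U = 0.01 * rng.standard_normal((d_hid, d_hid))

h_t = np.zeros(d_hid)
for x_t in rng.standard_normal((7, d_in)):  # a toy sequence of 7 inputs
    h_t = rnn_step(W, U, x_t, h_t)
```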
While Siegelmann and Sontag (1995) have shown that the basic RNN is Turing-complete, optimization of the weight matrices is difficult because the backpropagated gradients become vanishingly small over long sequences. In practice, the LSTM is superior to basic RNNs for learning long-range dependencies through its use of memory cell units that can store/access information across lengthy input sequences. Like RNNs, the LSTM sequentially updates a hidden-state representation, but these steps also rely on a memory cell containing four components (which are real-valued vectors): a memory state $c_t$, an output gate $o_t$ that determines how the memory state affects other units, as well as an input (and forget) gate $i_t$ (and $f_t$) that controls what gets stored in (and omitted from) memory based on each new input and the current state. Below are the updates performed at each $t \in \{1, \ldots, T\}$ in an LSTM parameterized by weight matrices $W_i, W_f, W_c, W_o, U_i, U_f, U_c, U_o$ and bias vectors $b_i, b_f, b_c, b_o$:
$$i_t = \mathrm{sigmoid}\left( W_i x_t + U_i h_{t-1} + b_i \right) \qquad (2)$$
$$f_t = \mathrm{sigmoid}\left( W_f x_t + U_f h_{t-1} + b_f \right) \qquad (3)$$
$$\widetilde{c}_t = \tanh\left( W_c x_t + U_c h_{t-1} + b_c \right) \qquad (4)$$
$$c_t = i_t \odot \widetilde{c}_t + f_t \odot c_{t-1} \qquad (5)$$
$$o_t = \mathrm{sigmoid}\left( W_o x_t + U_o h_{t-1} + b_o \right) \qquad (6)$$
$$h_t = o_t \odot \tanh(c_t) \qquad (7)$$
A more thorough exposition of the LSTM model and its variants is provided by Graves (2012) and Greff et al. (2015).
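For readers who prefer code, here is a minimal NumPy sketch of a single LSTM step implementing Eqs. (2)-(7), with $\odot$ realized as element-wise multiplication; the dimensions, initialization, and the name lstm_step are illustrative assumptions, not the implementation used in this paper.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(params, x_t, h_prev, c_prev):
    """One LSTM update implementing Eqs. (2)-(7)."""
    Wi, Wf, Wc, Wo, Ui, Uf, Uc, Uo, bi, bf, bc, bo = params
    i_t = sigmoid(Wi @ x_t + Ui @ h_prev + bi)       # input gate, Eq. (2)
    f_t = sigmoid(Wf @ x_t + Uf @ h_prev + bf)       # forget gate, Eq. (3)
    c_tilde = np.tanh(Wc @ x_t + Uc @ h_prev + bc)   # candidate memory, Eq. (4)
    c_t = i_t * c_tilde + f_t * c_prev               # memory state, Eq. (5)
    o_t = sigmoid(Wo @ x_t + Uo @ h_prev + bo)       # output gate, Eq. (6)
    h_t = o_t * np.tanh(c_t)                         # hidden state, Eq. (7)
    return h_t, c_t

# Illustrative sizes only; the paper's hyperparameters are not assumed here.
d_in, d_hid = 300, 50
rng = np.random.default_rng(0)
params = ([0.01 * rng.standard_normal((d_hid, d_in)) for _ in range(4)]
          + [0.01 * rng.standard_normal((d_hid, d_hid)) for _ in range(4)]
          + [np.zeros(d_hid) for _ in range(4)])

h_t, c_t = np.zeros(d_hid), np.zeros(d_hid)
for x_t in rng.standard_normal((7, d_in)):  # a toy sequence of 7 inputs
    h_t, c_t = lstm_step(params, x_t, h_t, c_t)
```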
Although the success of LSTM language models eludes current theoretical understanding, Sutskever, Vinyals, and Le (2014) empirically validate the intuition that an effectively trained network maps each sentence onto a fixed-length vector which encodes the underlying meaning expressed in the text. Recent works have proposed many