The details of augmented residual connections and
other layers are introduced as follows.
2.1 Augmented Residual Connections
To provide richer features for alignment processes, RE2 adopts an augmented version of residual connections to connect consecutive blocks. For a sequence of length $l$, we denote the input and output of the $n$-th block as $x^{(n)} = (x^{(n)}_1, x^{(n)}_2, \dots, x^{(n)}_l)$ and $o^{(n)} = (o^{(n)}_1, o^{(n)}_2, \dots, o^{(n)}_l)$, respectively. Let $o^{(0)}$ be a sequence of zero vectors. The input of the first block $x^{(1)}$, as mentioned before, is the output of the embedding layer (denoted by blank rectangles in Figure 1). The input of the $n$-th block $x^{(n)}$ ($n \ge 2$) is the concatenation of the input of the first block $x^{(1)}$ and the summation of the outputs of the previous two blocks (denoted by rectangles with diagonal stripes in Figure 1):
$$x^{(n)}_i = [x^{(1)}_i;\, o^{(n-1)}_i + o^{(n-2)}_i], \quad (1)$$
where $[\cdot\,;\cdot]$ denotes the concatenation operation.
With augmented residual connections, there are
three parts in the input of alignment and fusion
layers, namely original point-wise features kept
untouched along the way (Embedding vectors),
previous aligned features processed and refined by
previous blocks (Residual vectors), and contextual
features from the encoder layer (Encoded vectors).
Each of these three parts plays a complementary role in the text matching process.
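
As a concrete illustration, the following minimal PyTorch-style sketch shows how the block inputs of Eq. (1) can be assembled. It is not the authors' released implementation: the `blocks` list and its callables are hypothetical placeholders that abstract away the encoder, alignment, and fusion layers inside each block.

```python
import torch

def run_blocks(x1, blocks):
    """Minimal sketch of Eq. (1): x^(n)_i = [x^(1)_i ; o^(n-1)_i + o^(n-2)_i] for n >= 2."""
    # x1: output of the embedding layer, shape (batch, seq_len, d); it is also x^(1).
    # blocks: hypothetical list of callables; blocks[n-1] maps the n-th block's
    #         input to its output o^(n) (encoder + alignment + fusion, abstracted away).
    o_prev = blocks[0](x1)                  # o^(1), computed from x^(1) = x1
    o_prev_prev = torch.zeros_like(o_prev)  # o^(0) is a sequence of zero vectors
    for block in blocks[1:]:
        # Eq. (1): concatenate the first block's input with the sum of the
        # outputs of the two preceding blocks.
        x_n = torch.cat([x1, o_prev + o_prev_prev], dim=-1)
        o_prev_prev, o_prev = o_prev, block(x_n)
    return o_prev                           # output of the last block
```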
2.2 Alignment Layer
A simple form of alignment based on the attention
mechanism is used following Parikh et al. (2016)
with minor modifications. The alignment layer, as
shown in Figure 1, takes features from the two se-
quences as input and computes the aligned repre-
sentations as output. Input from the first sequence
of length $l_a$ is denoted as $a = (a_1, a_2, \dots, a_{l_a})$ and input from the second sequence of length $l_b$ is denoted as $b = (b_1, b_2, \dots, b_{l_b})$. The similarity score $e_{ij}$ between $a_i$ and $b_j$ is computed as the dot product of the projected vectors:
$$e_{ij} = F(a_i)^\top F(b_j). \quad (2)$$
F is an identity function or a single-layer feed-
forward network. The choice is treated as a hyper-
parameter.
The output vectors $a'$ and $b'$ are computed by weighted summation of representations of the other sequence. The summation is weighted by similarity scores between the current position and the corresponding positions in the other sequence:
$$a'_i = \sum_{j=1}^{l_b} \frac{\exp(e_{ij})}{\sum_{k=1}^{l_b} \exp(e_{ik})}\, b_j, \qquad b'_j = \sum_{i=1}^{l_a} \frac{\exp(e_{ij})}{\sum_{k=1}^{l_a} \exp(e_{kj})}\, a_i. \quad (3)$$
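
The alignment layer of Eqs. (2)-(3) can be sketched as follows, assuming batched tensors and omitting the masking of padded positions that a real implementation would need. The `proj` argument stands in for the projection $F$ and is an assumption for illustration, not the authors' code.

```python
import torch

def align(a, b, proj=None):
    """Minimal sketch of Eqs. (2)-(3)."""
    # a: (batch, l_a, d), b: (batch, l_b, d); padding masks are omitted here.
    # proj: the projection F of Eq. (2) -- identity (None) or a single-layer
    #       feed-forward network; the paper treats this choice as a hyperparameter.
    fa = a if proj is None else proj(a)
    fb = b if proj is None else proj(b)
    # Eq. (2): e_ij = F(a_i)^T F(b_j), computed for all pairs (i, j) at once.
    e = torch.matmul(fa, fb.transpose(1, 2))                             # (batch, l_a, l_b)
    # Eq. (3): attention-weighted summaries of the other sequence.
    a_prime = torch.matmul(torch.softmax(e, dim=2), b)                   # (batch, l_a, d)
    b_prime = torch.matmul(torch.softmax(e, dim=1).transpose(1, 2), a)   # (batch, l_b, d)
    return a_prime, b_prime
```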
2.3 Fusion Layer
The fusion layer compares local and aligned representations from three perspectives and then fuses them together. The output of the fusion layer for the first sequence, $\bar{a}$, is computed by
$$\begin{aligned}
\bar{a}^1_i &= G_1([a_i; a'_i]), \\
\bar{a}^2_i &= G_2([a_i; a_i - a'_i]), \\
\bar{a}^3_i &= G_3([a_i; a_i \circ a'_i]), \\
\bar{a}_i &= G([\bar{a}^1_i; \bar{a}^2_i; \bar{a}^3_i]),
\end{aligned} \quad (4)$$
where $G_1$, $G_2$, $G_3$, and $G$ are single-layer feed-forward networks with independent parameters and $\circ$ denotes element-wise multiplication. The subtraction operator highlights the difference between the two vectors, while the multiplication highlights similarity. Formulations for $\bar{b}$ are similar and omitted here.
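
A minimal sketch of Eq. (4) is given below. The ReLU activations and the output dimension `d_out` are assumptions made for illustration and may differ from the released implementation.

```python
import torch
import torch.nn as nn

class Fusion(nn.Module):
    """Minimal sketch of Eq. (4) for one sequence; the formulation for b is identical."""

    def __init__(self, d_in, d_out):
        super().__init__()
        # G1, G2, G3 and G: single-layer feed-forward networks with independent
        # parameters; the ReLU activation and d_out are illustrative assumptions.
        self.g1 = nn.Sequential(nn.Linear(2 * d_in, d_out), nn.ReLU())
        self.g2 = nn.Sequential(nn.Linear(2 * d_in, d_out), nn.ReLU())
        self.g3 = nn.Sequential(nn.Linear(2 * d_in, d_out), nn.ReLU())
        self.g = nn.Sequential(nn.Linear(3 * d_out, d_out), nn.ReLU())

    def forward(self, a, a_prime):
        # Three comparison perspectives: plain concatenation, difference, and
        # element-wise product, each paired with the original vector a_i.
        x1 = self.g1(torch.cat([a, a_prime], dim=-1))
        x2 = self.g2(torch.cat([a, a - a_prime], dim=-1))
        x3 = self.g3(torch.cat([a, a * a_prime], dim=-1))
        return self.g(torch.cat([x1, x2, x3], dim=-1))
```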
2.4 Prediction Layer
The prediction layer takes the vector representa-
tions of the two sequences $v_1$ and $v_2$ from the pooling layers as input and predicts the final target following Mou et al. (2016):
$$\hat{\mathbf{y}} = H([v_1; v_2; v_1 - v_2; v_1 \circ v_2]). \quad (5)$$
H is a multi-layer feed-forward neural network.
In a classification task, $\hat{\mathbf{y}} \in \mathbb{R}^C$ represents the unnormalized predicted scores for all classes, where $C$ is the number of classes. The predicted class is $\hat{y} = \operatorname{argmax}_i \hat{\mathbf{y}}_i$. In a regression task, $\hat{y}$ is the predicted scalar value.
In symmetric tasks like paraphrase identifica-
tion, a symmetric version of the prediction layer
is used for better generalization:
$$\hat{\mathbf{y}} = H([v_1; v_2; |v_1 - v_2|; v_1 \circ v_2]). \quad (6)$$
We also provide a simplified version of the pre-
diction layer. Which version to use is treated as
a hyperparameter. The simplified prediction layer
can be expressed as:
$$\hat{\mathbf{y}} = H([v_1; v_2]). \quad (7)$$
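
For illustration, the following sketch assembles the input features of $H$ for the three prediction-layer variants in Eqs. (5)-(7). The `mode` flag and its names are hypothetical, since the paper simply treats the choice of variant as a hyperparameter.

```python
import torch

def prediction_features(v1, v2, mode="full"):
    """Assembles the input of H for the three prediction-layer variants (Eqs. 5-7)."""
    # v1, v2: pooled vector representations of the two sequences, shape (batch, d).
    # The returned features are fed into H, a multi-layer feed-forward network.
    if mode == "full":           # Eq. (5)
        feats = [v1, v2, v1 - v2, v1 * v2]
    elif mode == "symmetric":    # Eq. (6), e.g. for paraphrase identification
        feats = [v1, v2, torch.abs(v1 - v2), v1 * v2]
    else:                        # Eq. (7), simplified version
        feats = [v1, v2]
    return torch.cat(feats, dim=-1)
```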