Sentence-BERT: An Adaptation of the BERT Network That Speeds Up Semantic Search


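Sentence-BERT (SBERT) was proposed by Nils Reimers and Iryna Gurevych of the Ubiquitous Knowledge Processing Lab (UKP-TUDA), Department of Computer Science, Technische Universität Darmstadt. Published in 2019, the work builds on BERT (Devlin et al., 2018) and RoBERTa (Liu et al., 2019), which achieve state-of-the-art results on sentence-pair regression tasks such as semantic textual similarity (STS). Despite that strength, the original BERT setup is poorly suited to semantic similarity search and to unsupervised tasks such as clustering, because both sentences must be fed into the network together, which causes massive computational overhead.

SBERT's core change is the use of siamese and triplet network structures. In a siamese network, two copies of the network share the same weights, and each copy encodes one input sentence into its own embedding. Because every sentence is encoded independently, its embedding can be computed once and reused, instead of running the full network on every sentence pair, which greatly reduces the computational cost. The triplet structure additionally trains the model on relative distances, so that an anchor sentence ends up closer to a related sentence than to an unrelated one. The resulting sentence embeddings are semantically meaningful and can be compared efficiently with cosine similarity. Compared with running BERT or RoBERTa on every pair, SBERT cuts the time needed to find the most similar pair among 10,000 sentences from about 65 hours (roughly 50 million inference computations) to about 5 seconds, while maintaining the accuracy of BERT. This makes BERT-based models more practical for applications such as information retrieval, text classification, and sentiment analysis.

In short, Sentence-BERT is a modification of the pretrained BERT model. By combining a lightweight network structure with an efficient similarity measure, it greatly reduces the computational burden of processing large volumes of text and makes semantics-based natural language processing tasks more feasible. Beyond its contribution to research on deep learning models for natural language understanding, the work offers a practical route to real-time, scalable use in real-world settings.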
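The gap between the two approaches comes from how many forward passes through the network each one needs. The following is a minimal sketch of that arithmetic; the function names are hypothetical and chosen only for illustration. It counts network inferences only and ignores the cosine comparisons between precomputed embeddings, which are cheap by comparison.

```python
from math import comb


def pairwise_inferences(n: int) -> int:
    """Forward passes when every unordered sentence pair is fed to BERT jointly."""
    return comb(n, 2)  # n * (n - 1) / 2


def embedding_inferences(n: int) -> int:
    """Forward passes when each sentence is embedded once and then reused."""
    return n


n = 10_000
print(pairwise_inferences(n))   # 49995000, i.e. roughly 50 million
print(embedding_inferences(n))  # 10000
```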

[Figure 1: SBERT architecture with classification objective function, e.g., for fine-tuning on SNLI dataset. The two BERT networks have tied weights (siamese network structure). Diagram: Sentence A and Sentence B each pass through BERT and a pooling layer, giving u and v; the concatenation (u, v, |u-v|) is fed to a softmax classifier.]

computed candidate embeddings using attention. This idea works for finding the highest scoring sentence in a larger collection. However, poly-encoders have the drawback that the score function is not symmetric and the computational overhead is too large for use-cases like clustering, which would require $O(n^2)$ score computations.

Previous neural sentence embedding methods started the training from a random initialization. In this publication, we use the pre-trained BERT and RoBERTa network and only fine-tune it to yield useful sentence embeddings. This reduces significantly the needed training time: SBERT can be tuned in less than 20 minutes, while yielding better results than comparable sentence embedding methods.

3 Model

SBERT adds a pooling operation to the output of BERT / RoBERTa to derive a fixed sized sentence embedding. We experiment with three pooling strategies: Using the output of the CLS-token, computing the mean of all output vectors (MEAN-strategy), and computing a max-over-time of the output vectors (MAX-strategy). The default configuration is MEAN.
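The three pooling strategies can be illustrated on a toy matrix of token output vectors. This is a minimal sketch, not the authors' implementation: it assumes the CLS vector sits at position 0 and that there are no padding tokens to mask out, and the function names are hypothetical.

```python
from typing import List

Vector = List[float]


def cls_pooling(token_vectors: List[Vector]) -> Vector:
    """Use the output vector of the first (CLS) token as the sentence embedding."""
    return list(token_vectors[0])


def mean_pooling(token_vectors: List[Vector]) -> Vector:
    """Average all token output vectors, dimension by dimension."""
    count = len(token_vectors)
    return [sum(column) / count for column in zip(*token_vectors)]


def max_pooling(token_vectors: List[Vector]) -> Vector:
    """Take the maximum over all tokens in each dimension (max-over-time)."""
    return [max(column) for column in zip(*token_vectors)]


# Three tokens with two-dimensional outputs; real BERT outputs are much wider.
tokens = [[1.0, 4.0], [3.0, 0.0], [2.0, 2.0]]
print(cls_pooling(tokens))   # [1.0, 4.0]
print(mean_pooling(tokens))  # [2.0, 2.0]
print(max_pooling(tokens))   # [3.0, 4.0]
```

Whatever the sentence length, each strategy returns a vector with the same dimension as a single token output, which is what makes the embedding fixed sized.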

In order to fine-tune BERT / RoBERTa, we create siamese and triplet networks (Schroff et al., 2015) to update the weights such that the produced sentence embeddings are semantically meaningful and can be compared with cosine-similarity.

The network structure depends on the available training data. We experiment with the following structures and objective functions.

[Figure 2: SBERT architecture at inference, for example, to compute similarity scores. This architecture is also used with the regression objective function. Diagram: Sentence A and Sentence B each pass through BERT and a pooling layer, giving u and v; the output is cosine-sim(u, v), a value in the range -1 … 1.]

Classification Objective Function. We concatenate the sentence embeddings u and v with the element-wise difference |u − v| and multiply it with the trainable weight $W_t \in \mathbb{R}^{3n \times k}$:

$o = \mathrm{softmax}(W_t(u, v, |u - v|))$

where n is the dimension of the sentence embeddings and k the number of labels. We optimize cross-entropy loss. This structure is depicted in Figure 1.
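The classification head can be sketched in plain Python as follows. This is an illustration under stated assumptions rather than the authors' code: the weight is stored as a list of 3n rows with k columns to match the shape given above, the logits are computed as the feature vector times that matrix, and the all-zero weights in the usage lines are placeholders, not trained values.

```python
import math
from typing import List

Vector = List[float]


def concat_features(u: Vector, v: Vector) -> Vector:
    """Build (u, v, |u - v|), a vector of length 3n."""
    return list(u) + list(v) + [abs(a - b) for a, b in zip(u, v)]


def softmax(logits: Vector) -> Vector:
    """Numerically stable softmax."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(u: Vector, v: Vector, weight: List[Vector]) -> Vector:
    """Return label probabilities; weight has 3n rows and k columns."""
    features = concat_features(u, v)
    k = len(weight[0])
    logits = [sum(x * row[j] for x, row in zip(features, weight)) for j in range(k)]
    return softmax(logits)


def cross_entropy(probabilities: Vector, label: int) -> float:
    """Negative log-likelihood of the gold label."""
    return -math.log(probabilities[label])


u, v = [1.0, 0.0], [0.0, 1.0]               # n = 2
weight = [[0.0, 0.0, 0.0] for _ in range(6)]  # 3n = 6 rows, k = 3 labels
probabilities = classify(u, v, weight)
print(concat_features(u, v))  # [1.0, 0.0, 0.0, 1.0, 1.0, 1.0]
print(cross_entropy(probabilities, 0))
```

With untrained zero weights every label receives the same probability, so the loss equals log k; training moves the weights and the encoder so that the gold label's probability rises.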

Regression Objective Function. The cosine-similarity between the two sentence embeddings u and v is computed (Figure 2). We use mean-squared-error loss as the objective function.
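A minimal sketch of the regression objective: cosine similarity between two embeddings, then mean squared error against gold scores. The function names are hypothetical, and the sketch assumes the gold scores have already been scaled to the range of the cosine similarity, a detail this section does not specify.

```python
import math
from typing import List

Vector = List[float]


def cosine_similarity(u: Vector, v: Vector) -> float:
    """Cosine of the angle between u and v; both must be non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mse_loss(predictions: List[float], targets: List[float]) -> float:
    """Mean squared error between predicted and gold similarity scores."""
    squared = [(p - t) ** 2 for p, t in zip(predictions, targets)]
    return sum(squared) / len(squared)


pairs = [([1.0, 0.0], [1.0, 0.0]), ([1.0, 0.0], [0.0, 1.0])]
predictions = [cosine_similarity(u, v) for u, v in pairs]
print(predictions)                        # [1.0, 0.0]
print(mse_loss(predictions, [0.5, 0.0]))  # 0.125
```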

Triplet Objective Function. Given an anchor sentence a, a positive sentence p, and a negative sentence n, triplet loss tunes the network such that the distance between a and p is smaller than the distance between a and n. Mathematically, we minimize the following loss function:

$\max(\|s_a - s_p\| - \|s_a - s_n\| + \epsilon, 0)$

with $s_x$ the sentence embedding for a/n/p, $\|\cdot\|$ a distance metric and margin $\epsilon$. Margin $\epsilon$ ensures that $s_p$ is at least $\epsilon$ closer to $s_a$ than $s_n$. As metric we use Euclidean distance and we set $\epsilon = 1$ in our experiments.
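The triplet objective for a single (anchor, positive, negative) example can be sketched as below, using Euclidean distance and a margin of 1 as stated in the text. The function names and toy vectors are illustrative assumptions.

```python
import math
from typing import List

Vector = List[float]


def euclidean(x: Vector, y: Vector) -> float:
    """Euclidean distance between two embeddings."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def triplet_loss(anchor: Vector, positive: Vector, negative: Vector,
                 margin: float = 1.0) -> float:
    """max(d(anchor, positive) - d(anchor, negative) + margin, 0)."""
    gap = euclidean(anchor, positive) - euclidean(anchor, negative)
    return max(gap + margin, 0.0)


anchor = [0.0, 0.0]
near, far = [0.0, 1.0], [3.0, 4.0]  # distances 1 and 5 from the anchor

# The positive is already more than one margin closer: no loss.
print(triplet_loss(anchor, positive=near, negative=far))  # 0.0
# Roles swapped: the negative is closer, so the loss is 5 - 1 + 1.
print(triplet_loss(anchor, positive=far, negative=near))  # 5.0
```

The loss is zero once the positive is at least the margin closer to the anchor than the negative, so only triplets that violate this condition contribute to the update.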

3.1 Training Details

We train SBERT on the combination of the SNLI (Bowman et al., 2015) and the Multi-Genre NLI

