BART: Denoising Sequence-to-Sequence Pre-training for Natural
Language Generation, Translation, and Comprehension
Mike Lewis*, Yinhan Liu*, Naman Goyal*, Marjan Ghazvininejad,
Abdelrahman Mohamed, Omer Levy, Ves Stoyanov, Luke Zettlemoyer
Facebook AI
{mikelewis,yinhanliu,naman}@fb.com
Abstract
We present BART, a denoising autoencoder for pretraining sequence-to-sequence models. BART is trained by (1) corrupting text with an arbitrary noising function, and (2) learning a model to reconstruct the original text. It uses a standard Transformer-based neural machine translation architecture which, despite its simplicity, can be seen as generalizing BERT (due to the bidirectional encoder), GPT (with the left-to-right decoder), and many other more recent pretraining schemes. We evaluate a number of noising approaches, finding the best performance by both randomly shuffling the order of the original sentences and using a novel in-filling scheme, where spans of text are replaced with a single mask token. BART is particularly effective when fine-tuned for text generation but also works well for comprehension tasks. It matches the performance of RoBERTa with comparable training resources on GLUE and SQuAD, and achieves new state-of-the-art results on a range of abstractive dialogue, question answering, and summarization tasks, with gains of up to 6 ROUGE. BART also provides a 1.1 BLEU increase over a back-translation system for machine translation, with only target language pretraining. We also report ablation experiments that replicate other pretraining schemes within the BART framework, to better measure which factors most influence end-task performance.
1 Introduction
Self-supervised methods have achieved remarkable success in a wide range of NLP tasks (Mikolov et al., 2013; Peters et al., 2018; Devlin et al., 2019; Joshi et al., 2019; Yang et al., 2019; Liu et al., 2019). The most successful approaches have been variants of masked language models, which are denoising autoencoders that are trained to reconstruct text where a random subset of the words has been masked out. Recent work has shown gains by improving the distribution of masked tokens (Joshi et al., 2019), the order in which masked tokens are predicted (Yang et al., 2019), and the available context for replacing masked tokens (Dong et al., 2019). However, these methods typically focus on particular types of end tasks (e.g. span prediction, generation, etc.), limiting their applicability.
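To make the masked-language-model setup concrete, the minimal sketch below corrupts a token sequence BERT-style by masking a random subset of tokens; the function name and the 15% masking probability are illustrative choices for this sketch, not details taken from this section.

import random

MASK = "[MASK]"

def mask_tokens(tokens, mask_prob=0.15, rng=random):
    """BERT-style corruption: replace a random subset of tokens with a mask.

    Returns the corrupted sequence plus the reconstruction targets; positions
    that were not masked carry None and are ignored by the loss.
    """
    corrupted, targets = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            corrupted.append(MASK)
            targets.append(tok)    # the denoising objective: recover this token
        else:
            corrupted.append(tok)
            targets.append(None)   # not predicted
    return corrupted, targets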
In this paper, we present BART, which pre-trains a model combining Bidirectional and Auto-Regressive Transformers. BART is a denoising autoencoder built with a sequence-to-sequence model that is applicable to a very wide range of end tasks. Pretraining has two stages: (1) text is corrupted with an arbitrary noising function, and (2) a sequence-to-sequence model is learned to reconstruct the original text. BART uses a standard Transformer-based neural machine translation architecture which, despite its simplicity, can be seen as generalizing BERT (due to the bidirectional encoder), GPT (with the left-to-right decoder), and many other more recent pretraining schemes (see Figure 1).
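The two pretraining stages can be sketched as a single training step; this is an assumed interface for illustration, not the released BART code, with model (bidirectional encoder plus left-to-right decoder returning vocabulary logits), noise_fn, and encode all hypothetical.

import torch
import torch.nn.functional as F

def denoising_step(model, noise_fn, encode, text, optimizer):
    """One BART-style pretraining step (sketch only; interfaces are assumed).

    Stage (1): corrupt the text with an arbitrary noising function.
    Stage (2): train the seq2seq model to reconstruct the original text.
    """
    src = torch.tensor([encode(noise_fn(text))])   # corrupted input -> encoder
    tgt = torch.tensor([encode(text)])             # original text   -> decoder target

    # The bidirectional encoder reads the corrupted text; the left-to-right
    # decoder predicts each original token given the previous ones.
    logits = model(src, tgt[:, :-1])               # (batch, len-1, vocab)
    loss = F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                           tgt[:, 1:].reshape(-1))

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()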
A key advantage of this setup is the noising flexibility; arbitrary transformations can be applied to the original text, including changing its length. We evaluate a number of noising approaches, finding the best performance by both randomly shuffling the order of the original sentences and using a novel in-filling scheme, where arbitrary-length spans of text (including zero length) are replaced with a single mask token. This approach generalizes the original word masking and next sentence prediction objectives in BERT by forcing the model to reason more about overall sentence length and make longer-range transformations to the input, as sketched below.
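The two best-performing transformations can be illustrated roughly as follows; the Poisson span lengths, mask budget, and span-start rate are stand-in parameters for this sketch rather than values quoted from this section.

import math
import random

MASK = "<mask>"

def permute_sentences(sentences, rng=random):
    """Sentence permutation: shuffle the original sentences into random order."""
    shuffled = list(sentences)
    rng.shuffle(shuffled)
    return shuffled

def sample_poisson(lam, rng=random):
    """Tiny Poisson sampler (Knuth's method) so the sketch has no dependencies."""
    threshold, k, p = math.exp(-lam), 0, 1.0
    while p > threshold:
        k += 1
        p *= rng.random()
    return k - 1

def text_infilling(tokens, mask_ratio=0.3, lam=3.0, rng=random):
    """Text infilling: replace token spans (possibly of length zero) with a
    single mask token, so the model must also infer how many tokens are missing.
    """
    corrupted, i = [], 0
    budget = int(mask_ratio * len(tokens))
    masked = 0
    while i < len(tokens):
        if masked < budget and rng.random() < 0.2:
            span = min(sample_poisson(lam, rng), len(tokens) - i)
            corrupted.append(MASK)   # one mask token regardless of span length
            i += span                # span == 0 inserts a mask without removing text
            masked += span
        else:
            corrupted.append(tokens[i])
            i += 1
    return corrupted

Because each span is collapsed to a single mask token, the model cannot read off how many tokens are missing and must predict the span length as part of reconstruction, which is what distinguishes this scheme from per-token masking.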
BART is particularly effective when fine-tuned for text generation but also works well for comprehension tasks. It matches the performance of RoBERTa (Liu et al., 2019) with comparable training resources on GLUE (Wang et al., 2018) and SQuAD (Rajpurkar et al., 2016), and achieves new state-of-the-art results on a range of abstractive dialogue, question answering, and summarization tasks. For example, it improves performance by 6 ROUGE over previous work on XSum (Narayan et al., 2018).
BART also opens up new ways of thinking about fine-tuning. We present a new scheme for machine translation where a BART model is stacked above a few additional transformer layers. These layers are trained to essentially translate the foreign language to noised