2.1 Unsupervised Feature-based Approaches
Learning widely applicable representations of words has been an active area of
research for decades, including non-neural (Brown et al., 1992; Ando and Zhang, 2005;
Blitzer et al., 2006) and neural (Mikolov et al., 2013; Pennington et al., 2014) methods.
Pre-trained word embeddings are an integral part of modern NLP systems, offering
significant improvements over embeddings learned from scratch (Turian et al., 2010). To
pre-train word embedding vectors, left-to-right language modeling objectives have been
used (Mnih and Hinton, 2009), as well as objectives to discriminate correct from
incorrect words in left and right context (Mikolov et al., 2013).
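To make the discriminative objective of Mikolov et al. (2013) concrete, the sketch below illustrates a negative-sampling loss in which a true context word must be scored above randomly drawn "incorrect" words. This is a minimal sketch, not the reference implementation: the vocabulary size, embedding dimension, number of negatives, and the uniform negative sampler are all illustrative assumptions.

```python
import numpy as np

# Sketch of a word2vec-style negative-sampling objective: the model
# learns to score a true context word above K randomly sampled words.
# All sizes and the uniform sampler are illustrative, not values from
# the cited work.

rng = np.random.default_rng(0)
VOCAB, DIM, K = 10_000, 100, 5             # vocab size, embedding dim, negatives

W_in = rng.normal(0, 0.1, (VOCAB, DIM))    # center-word embeddings
W_out = rng.normal(0, 0.1, (VOCAB, DIM))   # context-word embeddings

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def neg_sampling_loss(center_id, context_id):
    """-log sigma(u_c . v_w) - sum_k log sigma(-u_k . v_w)"""
    v = W_in[center_id]
    pos = sigmoid(W_out[context_id] @ v)
    negs = rng.integers(0, VOCAB, size=K)  # uniform negatives (simplified)
    neg = sigmoid(-W_out[negs] @ v)
    return -np.log(pos) - np.log(neg).sum()

print(neg_sampling_loss(center_id=42, context_id=7))
```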
These approaches have been generalized to coarser granularities, such as sentence
embeddings (Kiros et al., 2015; Logeswaran and Lee, 2018) or paragraph embeddings
(Le and Mikolov, 2014). To train sentence representations, prior work has used objectives
to rank candidate next sentences (Jernite et al., 2017; Logeswaran and Lee, 2018), left-to-
right generation of next sentence words given a representation of the previous sentence
(Kiros et al., 2015), or denoising autoencoder derived objectives (Hill et al., 2016).
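The next-sentence ranking objective of Jernite et al. (2017) and Logeswaran and Lee (2018) can be sketched as a softmax over candidate sentences: given an encoding of the current sentence, the model must assign the highest score to the true next sentence. In the sketch below, the mean-of-word-vectors encoder is a hypothetical stand-in (the cited work uses recurrent encoders), and all dimensions are illustrative.

```python
import numpy as np

# Sketch of a next-sentence ranking objective: rank the true next
# sentence above other candidates via a softmax over dot-product
# scores. The averaging encoder is a simplified stand-in.

rng = np.random.default_rng(0)
DIM = 64

def encode(word_vecs):
    """Hypothetical encoder: average the word vectors of a sentence."""
    return word_vecs.mean(axis=0)

def ranking_loss(current, candidates, true_index):
    """Cross-entropy over scores of candidate next sentences."""
    scores = np.array([encode(current) @ encode(c) for c in candidates])
    m = scores.max()                                    # logsumexp trick
    log_probs = scores - (m + np.log(np.exp(scores - m).sum()))
    return -log_probs[true_index]

cur = rng.normal(size=(5, DIM))                         # 5 toy word vectors
cands = [rng.normal(size=(7, DIM)) for _ in range(4)]   # 4 candidate sentences
print(ranking_loss(cur, cands, true_index=2))
```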
ELMo and its predecessor (Peters et al., 2017, 2018a) generalize traditional word
embedding research along a different dimension. They extract context-sensitive features
from a left-to-right and a right-to-left language model. The contextual representation of
each token is the concatenation of the left-to-right and right-to-left representations. When
integrating contextual word embeddings with existing task-specific architectures, ELMo
advances the state of the art for several major NLP benchmarks (Peters et al., 2018a)
including question answering (Rajpurkar et al., 2016), sentiment analysis (Socher et al.,
2013), and named entity recognition (Tjong Kim Sang and De Meulder, 2003). Melamud
et al. (2016) proposed learning contextual representations through a task to predict a
single word from both left and right context using LSTMs. Similar to ELMo, their model
is feature-based and not deeply bidirectional. Fedus et al. (2018) show that the Cloze
task can be used to improve the robustness of text generation models.
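The concatenation step that defines ELMo's contextual features can be illustrated with a small sketch. The recurrent cells below are simplified tanh stand-ins for ELMo's LSTM language models, and all dimensions are illustrative; the point is only that each token's representation is the concatenation of a left-to-right state and a right-to-left state.

```python
import numpy as np

# Sketch of ELMo-style feature extraction: a token's contextual
# representation is [h_forward ; h_backward]. The simple tanh
# recurrences are stand-ins for the biLM's LSTMs.

rng = np.random.default_rng(0)
DIM, HID = 32, 16
W_f = rng.normal(0, 0.1, (DIM + HID, HID))   # forward cell weights (illustrative)
W_b = rng.normal(0, 0.1, (DIM + HID, HID))   # backward cell weights

def run(tokens, W, reverse=False):
    """One-directional pass; returns one hidden state per token."""
    seq = tokens[::-1] if reverse else tokens
    h, states = np.zeros(HID), []
    for x in seq:
        h = np.tanh(np.concatenate([x, h]) @ W)
        states.append(h)
    return states[::-1] if reverse else states

def contextual_reps(tokens):
    fwd = run(tokens, W_f)                   # left-to-right states
    bwd = run(tokens, W_b, reverse=True)     # right-to-left states
    # Each token's representation is the concatenation of both.
    return [np.concatenate([f, b]) for f, b in zip(fwd, bwd)]

toks = [rng.normal(size=DIM) for _ in range(6)]   # 6 toy token embeddings
print(contextual_reps(toks)[0].shape)             # (32,) = 2 * HID
```

Because each direction is trained and run independently, every token sees its left context in one half of the vector and its right context in the other; no single representation conditions on both sides jointly, which is why such models are feature-based rather than deeply bidirectional.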