pre-trained language or machine translation model as auxiliary features while training a supervised
model on the target task. This involves a substantial number of new parameters for each separate
target task, whereas we require minimal changes to our model architecture during transfer.
Auxiliary training objectives
Adding auxiliary unsupervised training objectives is an alternative form of semi-supervised learning. Early work by Collobert and Weston [10] used a wide variety of auxiliary NLP tasks such as POS tagging, chunking, named entity recognition, and language modeling to improve semantic role labeling. More recently, Rei [50] added an auxiliary language modeling objective to their target task objective and demonstrated performance gains on sequence labeling tasks. Our experiments also use an auxiliary objective, but as we show, unsupervised pre-training already learns several linguistic aspects relevant to target tasks.
3 Framework
Our training procedure consists of two stages. The first stage is learning a high-capacity language
model on a large corpus of text. This is followed by a fine-tuning stage, where we adapt the model to
a discriminative task with labeled data.
3.1 Unsupervised pre-training
Given an unsupervised corpus of tokens $\mathcal{U} = \{u_1, \ldots, u_n\}$, we use a standard language modeling objective to maximize the following likelihood:

$$L_1(\mathcal{U}) = \sum_i \log P(u_i \mid u_{i-k}, \ldots, u_{i-1}; \Theta) \qquad (1)$$

where $k$ is the size of the context window, and the conditional probability $P$ is modeled using a neural network with parameters $\Theta$. These parameters are trained using stochastic gradient descent [51].
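As an illustration, Eq. 1 can be written as a loss for minimization. The following is a minimal PyTorch sketch, not the paper's implementation: `model` is a hypothetical stand-in that maps a window of $k$ token ids to next-token logits, and in practice all positions are scored in parallel rather than in a Python loop.

```python
import torch
import torch.nn.functional as F

def lm_loss(model, tokens, k):
    """Negated L1 from Eq. 1: sum_i log P(u_i | u_{i-k}, ..., u_{i-1}; Theta),
    returned as a loss so that minimizing it with SGD maximizes the likelihood.
    `tokens` is a 1-D LongTensor of token ids."""
    total_log_prob = torch.tensor(0.0)
    for i in range(k, tokens.size(0)):
        context = tokens[i - k:i].unsqueeze(0)   # (1, k) context window
        logits = model(context)                  # (1, vocab) next-token scores (assumed API)
        log_probs = F.log_softmax(logits, dim=-1)
        total_log_prob = total_log_prob + log_probs[0, tokens[i]]
    return -total_log_prob
```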
In our experiments, we use a multi-layer Transformer decoder [34] for the language model, which is a variant of the transformer [62]. This model applies a multi-headed self-attention operation over the input context tokens, followed by position-wise feedforward layers, to produce an output distribution over target tokens:

$$h_0 = U W_e + W_p$$
$$h_l = \text{transformer\_block}(h_{l-1}) \quad \forall l \in [1, n]$$
$$P(u) = \text{softmax}(h_n W_e^T) \qquad (2)$$

where $U = (u_{-k}, \ldots, u_{-1})$ is the context vector of tokens, $n$ is the number of layers, $W_e$ is the token embedding matrix, and $W_p$ is the position embedding matrix.
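A compact PyTorch sketch of Eq. 2 follows. It is illustrative rather than the exact architecture: details such as activation functions, dropout, and layer sizes are omitted, and a causally masked `nn.TransformerEncoderLayer` stands in for the decoder's transformer_block. Note that the output projection is tied to $W_e$, as in Eq. 2.

```python
import torch
import torch.nn as nn

class TransformerLM(nn.Module):
    def __init__(self, vocab_size, d_model=768, n_layers=12, n_heads=12, k=512):
        super().__init__()
        self.W_e = nn.Embedding(vocab_size, d_model)  # token embedding matrix W_e
        self.W_p = nn.Embedding(k, d_model)           # position embedding matrix W_p
        # Causally masked self-attention blocks stand in for transformer_block.
        self.blocks = nn.ModuleList([
            nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
            for _ in range(n_layers)
        ])

    def forward(self, U):                             # U: (batch, seq) token ids
        seq_len = U.size(1)
        pos = torch.arange(seq_len, device=U.device)
        h = self.W_e(U) + self.W_p(pos)               # h_0 = U W_e + W_p
        causal = torch.triu(                          # forbid attention to future tokens
            torch.full((seq_len, seq_len), float('-inf'), device=U.device),
            diagonal=1)
        for block in self.blocks:                     # h_l = transformer_block(h_{l-1})
            h = block(h, src_mask=causal)
        return torch.softmax(h @ self.W_e.weight.T, dim=-1)  # P(u) = softmax(h_n W_e^T)
```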
3.2 Supervised fine-tuning
After training the model with the objective in Eq. 1, we adapt the parameters to the supervised target task. We assume a labeled dataset $\mathcal{C}$, where each instance consists of a sequence of input tokens, $x^1, \ldots, x^m$, along with a label $y$. The inputs are passed through our pre-trained model to obtain the final transformer block's activation $h_l^m$, which is then fed into an added linear output layer with parameters $W_y$ to predict $y$:

$$P(y \mid x^1, \ldots, x^m) = \text{softmax}(h_l^m W_y). \qquad (3)$$
This gives us the following objective to maximize:

$$L_2(\mathcal{C}) = \sum_{(x,y)} \log P(y \mid x^1, \ldots, x^m). \qquad (4)$$
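A sketch of Eqs. 3 and 4 in the same vein: the wrapper below (hypothetical names; the pre-trained model is assumed here to expose its final block's activations with shape (batch, seq, d_model), which the `TransformerLM` sketch above would need a hook for) adds the single linear layer $W_y$ and trains with cross-entropy, which combines the softmax of Eq. 3 with the negated log-likelihood of Eq. 4.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FineTuneClassifier(nn.Module):
    def __init__(self, pretrained, d_model, n_classes):
        super().__init__()
        self.pretrained = pretrained   # all parameters are fine-tuned end to end
        self.W_y = nn.Linear(d_model, n_classes, bias=False)  # only new weight matrix

    def forward(self, x):              # x: (batch, m) token ids
        h = self.pretrained(x)         # (batch, m, d_model) final-block activations
        return self.W_y(h[:, -1, :])   # logits from h_l^m, the last token's state

def supervised_loss(clf, x, y):
    # Negated L2(C) for one batch: cross-entropy = softmax (Eq. 3) + NLL (Eq. 4).
    return F.cross_entropy(clf(x), y)
```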
We additionally found that including language modeling as an auxiliary objective during fine-tuning helped learning by (a) improving generalization of the supervised model, and (b) accelerating convergence. This is in line with prior work [50, 43], which also observed improved performance with such an auxiliary objective. Specifically, we optimize the following objective (with weight $\lambda$):

$$L_3(\mathcal{C}) = L_2(\mathcal{C}) + \lambda \cdot L_1(\mathcal{C}) \qquad (5)$$
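Put together, a fine-tuning step on Eq. 5 is a one-liner over the two losses. The sketch below reuses `supervised_loss` from the previous fragment and assumes a hypothetical `lm_loss_fn` that computes the negated $L_1$ objective on the same labeled inputs; the default value of $\lambda$ is illustrative, not prescribed here.

```python
def fine_tune_step(clf, lm_loss_fn, optimizer, x, y, lam=0.5):
    """One optimizer step on the loss form of Eq. 5:
    minimizing -(L2 + lam * L1) maximizes L3(C)."""
    loss = supervised_loss(clf, x, y) + lam * lm_loss_fn(x)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```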
Overall, the only extra parameters we require during fine-tuning are $W_y$, and embeddings for delimiter tokens (described below in Section 3.3).