A New Approach to NLP Classification: Universal Language Model Fine-tuning


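"This document presents Universal Language Model Fine-tuning (ULMFiT) for text classification, by Jeremy Howard and Sebastian Ruder. The paper is written in English and shows how a pretrained language model can be fine-tuned for a specific task to improve performance on natural language processing (NLP) tasks."

Transfer learning has greatly changed computer vision, but existing NLP approaches still require task-specific modifications and training from scratch. ULMFiT is an effective transfer learning method that can be applied to any NLP task, and it introduces techniques that are key to fine-tuning a language model. It significantly outperforms the previous state of the art on six text classification tasks, reducing the error by 18-24% on most datasets. Moreover, with only 100 labeled examples it matches the performance of training from scratch on 100 times more data.

The paper argues that NLP models, like computer vision models, should not have to be trained from scratch but can instead be fine-tuned from a pretrained model. ULMFiT pretrains a language model on a large general-domain corpus (in the paper, an AWD-LSTM pretrained on Wikitext-103) and then fine-tunes it on the target task. This reduces the dependence on large amounts of labeled data and makes training more efficient.

The key fine-tuning techniques of ULMFiT are:

1. Discriminative fine-tuning ('Discr'): because different layers capture different types of information, each layer is fine-tuned with its own learning rate.
2. Slanted triangular learning rates (STLR): a schedule that first increases the learning rate linearly and then decays it linearly, so that the model first moves quickly to a suitable region of parameter space and then refines its parameters.
3. Gradual unfreezing: when fine-tuning the classifier, only the last layer is unfrozen and trained at first, and lower layers are then unfrozen one at a time. This preserves the general low-level representations while adapting the high-level ones, and avoids catastrophically forgetting what was learned during pretraining.

The authors release their source code and pretrained models, making it easier for researchers and developers to apply, extend, and improve these methods.

ULMFiT shows how a pretrained language model can be applied effectively to text classification, substantially improving performance, especially when labeled data is limited. The work is important for understanding and practicing transfer learning in NLP and offers guidance for building more efficient NLP models that generalize better.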
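Gradual unfreezing can be pictured as a schedule of which layer groups are trainable in each epoch. The sketch below assumes the procedure described in the ULMFiT paper (unfreeze the last layer group, train for one epoch, unfreeze the next lower group, and repeat until every group is trainable). The split into five groups and the function name `gradual_unfreezing_schedule` are hypothetical; a real implementation would switch gradient computation on and off for the model's parameters instead of returning indices.

```python
from typing import List


def gradual_unfreezing_schedule(num_layer_groups: int, num_epochs: int) -> List[List[int]]:
    """For each epoch, return the indices of the trainable layer groups.

    Index num_layer_groups - 1 is the top group (closest to the output).
    One more group is unfrozen per epoch, starting from the top; once all
    groups are unfrozen they stay trainable for the remaining epochs.
    """
    schedule: List[List[int]] = []
    for epoch in range(num_epochs):
        n_trainable = min(epoch + 1, num_layer_groups)
        first_trainable = num_layer_groups - n_trainable
        schedule.append(list(range(first_trainable, num_layer_groups)))
    return schedule


if __name__ == "__main__":
    # Hypothetical grouping: embedding, three LSTM layers, classifier head.
    groups = ["embedding", "layer1", "layer2", "layer3", "classifier"]
    for epoch, trainable in enumerate(gradual_unfreezing_schedule(len(groups), 6)):
        print(f"epoch {epoch}: train {[groups[i] for i in trainable]}")
```

Returning a plain schedule keeps the unfreezing policy separate from any particular framework, so the same list can drive whichever mechanism the training code uses to freeze parameters.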

[Figure 1 panels: (a) LM pre-training; (b) LM fine-tuning; (c) classifier fine-tuning. Each panel depicts an embedding layer, three stacked layers (Layer 1 to Layer 3), and a softmax layer.]

Figure 1: ULMFiT consists of three stages: a) The LM is trained on a general-domain corpus to capture general features of the language in different layers. b) The full LM is fine-tuned on target task data using discriminative fine-tuning ('Discr') and slanted triangular learning rates (STLR) to learn task-specific features. c) The classifier is fine-tuned on the target task using gradual unfreezing, 'Discr', and STLR to preserve low-level representations and adapt high-level ones (shaded: unfreezing stages; black: frozen).
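The caption names slanted triangular learning rates (STLR) without defining them. In the full ULMFiT paper, STLR first increases the learning rate linearly for a short fraction of training and then decays it linearly. A minimal sketch of that schedule, assuming the paper's suggested defaults cut_frac = 0.1, ratio = 32, and η_max = 0.01; the function name `stlr` and the guard for very short runs are illustrative additions.

```python
import math


def stlr(t: int, T: int, eta_max: float = 0.01,
         cut_frac: float = 0.1, ratio: float = 32.0) -> float:
    """Slanted triangular learning rate at iteration t of T total iterations.

    cut = floor(T * cut_frac)                               # iteration where the rate peaks
    p   = t / cut                                           # while t < cut (linear increase)
    p   = 1 - (t - cut) / (cut * (1 / cut_frac - 1))        # afterwards (linear decay)
    eta = eta_max * (1 + p * (ratio - 1)) / ratio
    """
    # Illustrative guard: avoid division by zero when T * cut_frac < 1.
    cut = max(1, math.floor(T * cut_frac))
    if t < cut:
        p = t / cut
    else:
        p = 1 - (t - cut) / (cut * (1 / cut_frac - 1))
    # ratio sets how much smaller the lowest rate is than eta_max.
    return eta_max * (1 + p * (ratio - 1)) / ratio


if __name__ == "__main__":
    T = 1000  # hypothetical number of training iterations
    for t in range(0, T + 1, 100):
        print(t, round(stlr(t, T), 6))
```

With T = 1000, the rate rises from η_max / 32 = 0.0003125 at t = 0 to 0.01 at t = 100, then falls linearly back to 0.0003125 at t = 1000.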

task, which we show significantly improves performance (see Section 5). Moreover, language modeling already is a key component of existing tasks such as MT and dialogue modeling. Formally, language modeling induces a hypothesis space H that should be useful for many other NLP tasks (Vapnik and Kotz, 1982; Baxter, 2000).
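A language model is trained to predict each word from the words before it, i.e. to minimize the average negative log-likelihood −(1/T) Σ_t log p(w_t | w_<t); perplexity is the exponential of that average. The toy bigram model below, with invented probabilities, makes the objective concrete. It is an illustrative assumption and conditions only on the previous word, whereas an LSTM language model such as AWD-LSTM conditions on the whole prefix.

```python
import math
from typing import Dict, List

# Hypothetical bigram probabilities p(next | previous), invented for illustration.
BIGRAM_PROBS: Dict[str, Dict[str, float]] = {
    "<s>": {"the": 0.5, "a": 0.5},
    "the": {"film": 0.5, "plot": 0.5},
    "film": {"was": 0.5, "is": 0.5},
    "was": {"great": 0.5, "dull": 0.5},
}


def average_nll(tokens: List[str], probs: Dict[str, Dict[str, float]]) -> float:
    """Mean of -log p(w_t | w_{t-1}) over every predicted token (natural log)."""
    losses = [-math.log(probs[prev][nxt]) for prev, nxt in zip(tokens, tokens[1:])]
    return sum(losses) / len(losses)


def perplexity(tokens: List[str], probs: Dict[str, Dict[str, float]]) -> float:
    """Perplexity is exp(average negative log-likelihood)."""
    return math.exp(average_nll(tokens, probs))


if __name__ == "__main__":
    sentence = ["<s>", "the", "film", "was", "great"]
    print(average_nll(sentence, BIGRAM_PROBS), perplexity(sentence, BIGRAM_PROBS))
```

Because every transition in this toy sentence has probability 0.5, the average loss is log 2 and the perplexity is 2: the model is as uncertain as a fair coin at each step. Lower values mean the model predicts the text better.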

We propose Universal Language Model Fine-tuning (ULMFiT), which pretrains a language model (LM) on a large general-domain corpus and fine-tunes it on the target task using novel techniques. The method is universal in the sense that it meets these practical criteria: 1) it works across tasks varying in document size, number, and label type; 2) it uses a single architecture and training process; 3) it requires no custom feature engineering or preprocessing; and 4) it does not require additional in-domain documents or labels.

In our experiments, we use the state-of-the-art language model AWD-LSTM (Merity et al., 2017a), a regular LSTM (with no attention, short-cut connections, or other sophisticated additions) with various tuned dropout hyperparameters. Analogous to CV, we expect that downstream performance can be improved by using higher-performance language models in the future.

ULMFiT consists of the following steps, which we show in Figure 1: a) General-domain LM pretraining (§3.1); b) target task LM fine-tuning (§3.2); and c) target task classifier fine-tuning (§3.3). We discuss these in the following sections.
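The three steps can be summarized as an ordered plan in which each stage starts from the weights produced by the previous one. A minimal sketch: the data and techniques per stage follow the Figure 1 caption, while `Stage`, `ULMFIT_STAGES`, `run_pipeline`, and the `train_stage` callback are hypothetical names standing in for real training code.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Stage:
    name: str
    trains: str                   # which model is updated in this stage
    data: str                     # what it is trained on
    techniques: Tuple[str, ...]   # fine-tuning techniques applied


# Contents follow the Figure 1 caption; the structure itself is hypothetical.
ULMFIT_STAGES: Tuple[Stage, ...] = (
    Stage("a) general-domain LM pretraining", "LM", "Wikitext-103", ()),
    Stage("b) target task LM fine-tuning", "full LM", "target task text", ("Discr", "STLR")),
    Stage("c) target task classifier fine-tuning", "classifier", "labeled target task data",
          ("gradual unfreezing", "Discr", "STLR")),
)


def run_pipeline(stages: Tuple[Stage, ...], train_stage: Callable[[Stage], None]) -> List[str]:
    """Run the stages strictly in order and return the names of completed stages.

    Passing weights from one stage to the next is left to train_stage.
    """
    completed: List[str] = []
    for stage in stages:
        train_stage(stage)  # hypothetical callback wrapping an actual training loop
        completed.append(stage.name)
    return completed


if __name__ == "__main__":
    def report(stage: Stage) -> None:
        used = ", ".join(stage.techniques) or "none"
        print(f"{stage.name}: train {stage.trains} on {stage.data} (techniques: {used})")

    run_pipeline(ULMFIT_STAGES, report)
```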

3.1 General-domain LM pretraining

An ImageNet-like corpus for language should be large and capture general properties of language. We pretrain the language model on Wikitext-103 (Merity et al., 2017b) consisting of 28,595 preprocessed Wikipedia articles and 103 million words. Pretraining is most beneficial for tasks with small datasets and enables generalization even with 100 labeled examples. We leave the exploration of more diverse pretraining corpora to future work, but expect that they would boost performance. While this stage is the most expensive, it only needs to be performed once and improves performance and convergence of downstream models.

3.2 Target task LM fine-tuning

No matter how diverse the general-domain data used for pretraining is, the data of the target task will likely come from a different distribution. We thus fine-tune the LM on data of the target task. Given a pretrained general-domain LM, this stage converges faster as it only needs to adapt to the idiosyncrasies of the target data, and it allows us to train a robust LM even for small datasets. We propose discriminative fine-tuning and slanted triangular learning rates for fine-tuning the LM, which we introduce in the following.
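Discriminative fine-tuning gives each layer its own learning rate. A minimal sketch, assuming the rule the ULMFiT paper reports working well (choose the learning rate η^L of the top layer, then use η^(l-1) = η^l / 2.6 for each lower layer) and a plain per-layer SGD update θ^l ← θ^l − η^l · ∇_θ^l J(θ). The three-layer setup, the base rate of 0.01, and both function names are illustrative assumptions, and parameters are plain Python lists to keep the example self-contained.

```python
from typing import List


def discriminative_learning_rates(top_lr: float, num_layers: int,
                                  decay: float = 2.6) -> List[float]:
    """Learning rates for layers 1..L, returned lowest layer first.

    The top layer L gets top_lr; each lower layer gets eta^(l-1) = eta^l / decay.
    """
    return [top_lr / decay ** (num_layers - 1 - i) for i in range(num_layers)]


def discriminative_sgd_step(params: List[List[float]], grads: List[List[float]],
                            lrs: List[float]) -> List[List[float]]:
    """One SGD step where layer l uses its own rate: theta^l <- theta^l - eta^l * grad^l."""
    return [[p - lr * g for p, g in zip(layer_p, layer_g)]
            for layer_p, layer_g, lr in zip(params, grads, lrs)]


if __name__ == "__main__":
    lrs = discriminative_learning_rates(0.01, 3)  # hypothetical 3-layer model
    for layer, lr in enumerate(lrs, start=1):
        print(f"layer {layer}: lr = {lr:.6f}")
```

Lower layers, which hold more general features, therefore change more slowly than the top layer. In a framework such as PyTorch, these rates would typically be passed to the optimizer as separate parameter groups, each with its own `lr`.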

Discriminative fine-tuning As different layers capture different types of information (Yosinski et al., 2014), they should be fine-tuned to different extents. To this end, we propose a novel fine-
