All these models can be considered as Artificial Intelligence (AI) systems. AI is a broad research field aimed at creating intelligent machines that act similarly to humans and animals endowed with natural intelligence. It captures the field’s long-term goal of building machines that mimic and then surpass the full spectrum of
human cognition. Machine Learning (ML) is a subfield of artificial intelligence
that employs statistical techniques to give machines the capability to ‘learn’ from
data without being given explicit instructions on what to do. This process is also
called ‘training’, whereby a ‘learning algorithm’ gradually improves the model’s
performance on a given task. Deep Learning is an area of ML in which the input is transformed layer by layer so that complex patterns in the data can be recognized. The adjective ‘deep’ refers to the large number of layers in modern ML models, which helps them learn expressive representations of the data and achieve better performance.
In contrast to computer vision, annotated training datasets for NLP applications used to be rather small, comprising only a few thousand sentences (except for machine translation). The main reason for this was the high cost of manual annotation. To avoid overfitting, i.e. over-adapting models to random fluctuations in the data, only relatively small models could be trained, which did not yield high performance.
In the last 5 years, new NLP methods have been developed based on the Transformer
introduced by Vaswani et al. [67]. They represent the meaning of each word by a vector of real numbers called an embedding. Between these embeddings various kinds
of “attentions” can be computed, which can be considered as a sort of “correlation”
between different words. In higher layers of the network, attention computations are
used to generate new embeddings that can capture subtle nuances in the meaning
of words. In particular, they can grasp different meanings of the same word that
arise from context. A key advantage of these models is that they can be trained on unannotated text, which is available in almost unlimited quantities, so that overfitting is not a problem.
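To make this more concrete, the following sketch computes a simplified form of scaled dot-product self-attention directly on a matrix of word embeddings. It is a minimal illustration under assumed names and sizes; actual Transformer layers additionally apply learned query, key, and value projections and use several attention heads in parallel.

```python
# Minimal sketch of (simplified) scaled dot-product self-attention over word
# embeddings. All names and sizes are illustrative; real Transformer layers
# additionally use learned query/key/value projections and multiple heads.
import numpy as np

def self_attention(X):
    """X: (seq_len, embed_dim) array, one embedding vector per word."""
    d = X.shape[-1]
    # "Correlation"-like attention scores between every pair of words.
    scores = X @ X.T / np.sqrt(d)                        # (seq_len, seq_len)
    # Row-wise softmax turns the scores into attention weights.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # New contextual embeddings: weighted mixtures of all input embeddings.
    return weights @ X                                   # (seq_len, embed_dim)

# Example: a "sentence" of 4 words, each with an 8-dimensional embedding.
rng = np.random.default_rng(0)
contextual = self_attention(rng.normal(size=(4, 8)))
print(contextual.shape)  # (4, 8)
```

Each row of the result is a new contextual embedding: a weighted mixture of all input embeddings, with the weights playing the role of the “correlations” mentioned above.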
The research field is currently developing rapidly, and many approaches from earlier years are becoming obsolete. These models are
usually trained in two steps: In a first pre-training step, they are trained on a large
text corpus containing billions of words without any annotations. A typical pre-
training task is to predict single words in the text that have been masked in the
input. In this way, the model learns fine subtleties of natural language syntax and
semantics. Because enough data is available, the models can be extended to many
layers with millions or billions of parameters.
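The masked-word objective can be illustrated with a publicly available pre-trained model. The following sketch assumes the Hugging Face Transformers library and the model bert-base-uncased, neither of which is prescribed by the text; the model is asked to predict a word that has been masked in the input.

```python
# Illustration of the masked-word prediction task with the Hugging Face
# Transformers library (an assumed tooling choice; the model name
# "bert-base-uncased" is just one publicly available pre-trained model).
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

# The model has to infer the word hidden behind [MASK] from its context,
# which is exactly the pre-training objective described above.
for candidate in fill_mask("The capital of France is [MASK]."):
    print(f"{candidate['token_str']:>10}  score={candidate['score']:.3f}")
```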
In a second fine-tuning step, the model is trained on a small annotated training
set. In this way, the model can be adapted to specific new tasks. Since the fine-tuning data is very small compared to the pre-training data and the model has a high capacity with many millions of parameters, it can be adapted to the fine-tuning task without losing the stored information about language structure.
It has been demonstrated that this idea can be applied to most NLP tasks, leading to unprecedented performance gains in semantic understanding. Through this transfer learning, knowledge acquired during pre-training is carried over to the fine-tuned model. Such models are referred to as Pre-trained Language Models (PLMs).
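A fine-tuning step of this kind might, for example, look like the following sketch, which adapts a pre-trained model to a small annotated sentiment classification set. The libraries (Hugging Face Transformers and Datasets), the model name, and the SST-2 data are illustrative assumptions; any PLM and labelled task could be substituted.

```python
# Sketch of the fine-tuning step: adapting a pre-trained model to a small
# annotated classification set. Libraries (Hugging Face Transformers/Datasets),
# model name and dataset are assumptions for illustration only.
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

model_name = "bert-base-uncased"                  # the pre-trained model (PLM)
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name,
                                                           num_labels=2)

# A small annotated training set: a 2000-sentence slice of SST-2 sentiment data.
train_data = load_dataset("glue", "sst2", split="train[:2000]")
train_data = train_data.map(lambda ex: tokenizer(ex["sentence"], truncation=True),
                            batched=True)

args = TrainingArguments(output_dir="sst2-finetuned",
                         num_train_epochs=2,
                         per_device_train_batch_size=16)

# All pre-trained weights are updated on the small labelled set, so knowledge
# acquired during pre-training is transferred to the new task.
Trainer(model=model, args=args, train_dataset=train_data,
        tokenizer=tokenizer).train()
```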