深度学习：Ian Goodfellow, Yoshua Bengio, Aaron Courville合著

5星 · 超过95%的资源需积分: 11 92 浏览量更新于2024-07-20 1 收藏 25.27MB PDF 举报

"《Deep Learning》是由谷歌大脑团队的科学家Ian Goodfellow、Yoshua Bengio和Aaron Courville合著的一本深度学习领域的经典著作，由MIT出版社出版。这本书全面介绍了深度学习的基础理论、技术和应用，是深度学习研究者和从业者的重要参考书。" 在深度学习这个领域，《Deep Learning》提供了丰富的知识内容，涵盖了从基础知识到最新进展的多个方面。书中的内容主要分为几个部分，首先，它以介绍性章节为开端，引导读者理解深度学习的背景和重要性，明确适合阅读本书的读者群体。 1. 引言（Introduction）：这部分旨在让读者对深度学习有一个初步的了解，包括深度学习的历史发展趋势，以及为什么深度学习会在当前的计算环境中变得如此重要。作者强调了深度学习在图像识别、自然语言处理、语音识别等领域的广泛应用。 2. 应用数学与机器学习基础（Applied Math and Machine Learning Basics）：这部分深入浅出地讲解了深度学习所需的基本数学概念，如线性代数。书中详细阐述了： - 数量（Scalars）、向量（Vectors）、矩阵（Matrices）和张量（Tensors）的概念。 - 矩阵与向量的乘法，以及其运算规则。 - 单位矩阵（Identity matrices）和逆矩阵（Inverse matrices）的作用。 - 线性相关性和向量空间（Linear Dependence and Span），这对于理解神经网络的线性变换至关重要。 - 范数（Norms）的概念，它们在优化算法中用于度量向量或矩阵的大小。 - 特殊类型的矩阵和向量，如正交矩阵、对角矩阵等。 - 主成分分析（Eigen-decomposition），这是许多机器学习算法，包括深度学习中的正则化技术的基础。这些基础知识对于理解和实现深度学习模型至关重要，为后续的章节打下了坚实的理论基础。除了这些基础理论，书中还详细讨论了神经网络的构建、训练和优化方法，包括反向传播算法、损失函数、激活函数、优化算法（如梯度下降及其变种）以及正则化策略。此外，书中也涵盖了深度学习在计算机视觉、自然语言处理、强化学习等领域的应用，以及最新的进展，如生成对抗网络（GANs）、卷积神经网络（CNNs）、递归神经网络（RNNs）和自注意力机制等。《Deep Learning》这本书是深度学习领域的权威之作，无论是初学者还是经验丰富的研究者，都能从中受益匪浅。通过深入阅读和理解书中的内容，读者将能够更好地掌握深度学习的核心原理，并能运用这些知识解决实际问题。

Chapter

ChapterChapter

InIn

tro

trotro

duction

ductionduction

Inventors have long dreamed of creating machines that think. This desire dates

back to at least the time of ancient Greece. The mythical ﬁgures Pygmalion,

Daedalus, and Hephaestus may all be interpreted as legendary inventors, and

Galatea, Talos, and Pandora may all be regarded as artiﬁcial life ( ,Ovid and Martin

2004 Sparkes 1996 Tandy 1997; , ; , ).

When programmable computers were ﬁrst conceived, people wondered whether

such machines might become intelligent, over a hundred years before one was

built (Lovelace 1842, ). Today,

artiﬁcial intelligence

(AI) is a thriving ﬁeld with

many practical applications and active research topics. We look to intelligent

software to automate routine labor, understand speech or images, make diagnoses

in medicine and support basic scientiﬁc research.

In the early days of artiﬁcial intelligence, the ﬁeld rapidly tackled and solved

problems that are intellectually diﬃcult for human beings but relatively straight-

forward for computers—problems that can be described by a list of formal, math-

ematical rules. The true challenge to artiﬁcial intelligence proved to be solving

the tasks that are easy for people to perform but hard for people to describe

formally—problems that we solve intuitively, that feel automatic, like recognizing

spoken words or faces in images.

This book is about a solution to these more intuitive problems. This solution is

to allow computers to learn from experience and understand the world in terms of a

hierarchy of concepts, with each concept deﬁned in terms of its relation to simpler

concepts. By gathering knowledge from experience, this approach avoids the need

for human operators to formally specify all of the knowledge that the computer

needs. The hierarchy of concepts allows the computer to learn complicated concepts

by building them out of simpler ones. If we draw a graph showing how these

CHAPTER 1. INTRODUCTION

concepts are built on top of each other, the graph is deep, with many layers. For

this reason, we call this approach to AI .deep learning

Many of the early successes of AI took place in relatively sterile and formal

environments and did not require computers to have much knowledge about

the world. For example, IBM’s Deep Blue chess-playing system defeated world

champion Garry Kasparov in 1997 ( , ). Chess is of course a very simpleHsu 2002

world, containing only sixty-four locations and thirty-two pieces that can move

in only rigidly circumscribed ways. Devising a successful chess strategy is a

tremendous accomplishment, but the challenge is not due to the diﬃculty of

describing the set of chess pieces and allowable moves to the computer. Chess

can be completely described by a very brief list of completely formal rules, easily

provided ahead of time by the programmer.

Ironically, abstract and formal tasks that are among the most diﬃcult mental

undertakings for a human being are among the easiest for a computer. Computers

have long been able to defeat even the best human chess player, but are only

recently matching some of the abilities of average human beings to recognize objects

or speech. A person’s everyday life requires an immense amount of knowledge

about the world. Much of this knowledge is subjective and intuitive, and therefore

diﬃcult to articulate in a formal way. Computers need to capture this same

knowledge in order to behave in an intelligent way. One of the key challenges in

artiﬁcial intelligence is how to get this informal knowledge into a computer.

Several artiﬁcial intelligence projects have sought to hard-code knowledge about

the world in formal languages. A computer can reason about statements in these

formal languages automatically using logical inference rules. This is known as the

knowledge base

approach to artiﬁcial intelligence. None of these projects has led

to a major success. One of the most famous such projects is Cyc ( ,Lenat and Guha

1989). Cyc is an inference engine and a database of statements in a language

called CycL. These statements are entered by a staﬀ of human supervisors. It is an

unwieldy process. People struggle to devise formal rules with enough complexity

to accurately describe the world. For example, Cyc failed to understand a story

about a person named Fred shaving in the morning ( , ). Its inferenceLinde 1992

engine detected an inconsistency in the story: it knew that people do not have

electrical parts, but because Fred was holding an electric razor, it believed the

entity “FredWhileShaving” contained electrical parts. It therefore asked whether

Fred was still a person while he was shaving.

The diﬃculties faced by systems relying on hard-coded knowledge suggest

that AI systems need the ability to acquire their own knowledge, by extracting

patterns from raw data. This capability is known as

machine learning

. The

CHAPTER 1. INTRODUCTION

introduction of machine learning allowed computers to tackle problems involving

knowledge of the real world and make decisions that appear subjective. A simple

machine learning algorithm called

logistic regression

can determine whether to

recommend cesarean delivery (Mor-Yosef 1990et al., ). A simple machine learning

algorithm called naive Bayes can separate legitimate e-mail from spam e-mail.

The performance of these simple machine learning algorithms depends heavily

on the

representation

of the data they are given. For example, when logistic

regression is used to recommend cesarean delivery, the AI system does not examine

the patient directly. Instead, the doctor tells the system several pieces of relevant

information, such as the presence or absence of a uterine scar. Each piece of

information included in the representation of the patient is known as a

feature

Logistic regression learns how each of these features of the patient correlates with

various outcomes. However, it cannot inﬂuence the way that the features are

deﬁned in any way. If logistic regression was given an MRI scan of the patient,

rather than the doctor’s formalized report, it would not be able to make useful

predictions. Individual pixels in an MRI scan have negligible correlation with any

complications that might occur during delivery.

This dependence on representations is a general phenomenon that appears

throughout computer science and even daily life. In computer science, opera-

tions such as searching a collection of data can proceed exponentially faster if

the collection is structured and indexed intelligently. People can easily perform

arithmetic on Arabic numerals, but ﬁnd arithmetic on Roman numerals much

more time-consuming. It is not surprising that the choice of representation has an

enormous eﬀect on the performance of machine learning algorithms. For a simple

visual example, see ﬁgure .1.1

Many artiﬁcial intelligence tasks can be solved by designing the right set of

features to extract for that task, then providing these features to a simple machine

learning algorithm. For example, a useful feature for speaker identiﬁcation from

sound is an estimate of the size of speaker’s vocal tract. It therefore gives a strong

clue as to whether the speaker is a man, woman, or child.

However, for many tasks, it is diﬃcult to know what features should be extracted.

For example, suppose that we would like to write a program to detect cars in

photographs. We know that cars have wheels, so we might like to use the presence

of a wheel as a feature. Unfortunately, it is diﬃcult to describe exactly what a

wheel looks like in terms of pixel values. A wheel has a simple geometric shape but

its image may be complicated by shadows falling on the wheel, the sun glaring oﬀ

the metal parts of the wheel, the fender of the car or an object in the foreground

obscuring part of the wheel, and so on.

CHAPTER 1. INTRODUCTION













Figure 1.1: Example of diﬀerent representations: suppose we want to separate two

categories of data by drawing a line between them in a scatterplot. In the plot on the left,

we represent some data using Cartesian coordinates, and the task is impossible. In the plot

on the right, we represent the data with polar coordinates and the task becomes simple to

solve with a vertical line. Figure produced in collaboration with David Warde-Farley.

One solution to this problem is to use machine learning to discover not only

the mapping from representation to output but also the representation itself.

This approach is known as

representation learning

. Learned representations

often result in much better performance than can be obtained with hand-designed

representations. They also allow AI systems to rapidly adapt to new tasks, with

minimal human intervention. A representation learning algorithm can discover a

good set of features for a simple task in minutes, or a complex task in hours to

months. Manually designing features for a complex task requires a great deal of

human time and eﬀort; it can take decades for an entire community of researchers.

The quintessential example of a representation learning algorithm is the

au-

toencoder

. An autoencoder is the combination of an

encoder

function that

converts the input data into a diﬀerent representation, and a

decoder

function

that converts the new representation back into the original format. Autoencoders

are trained to preserve as much information as possible when an input is run

through the encoder and then the decoder, but are also trained to make the new

representation have various nice properties. Diﬀerent kinds of autoencoders aim to

achieve diﬀerent kinds of properties.

When designing features or algorithms for learning features, our goal is usually

to separate the

factors of variation

that explain the observed data. In this

context, we use the word “factors” simply to refer to separate sources of inﬂuence;

the factors are usually not combined by multiplication. Such factors are often not

CHAPTER 1. INTRODUCTION

quantities that are directly observed. Instead, they may exist either as unobserved

objects or unobserved forces in the physical world that aﬀect observable quantities.

They may also exist as constructs in the human mind that provide useful simplifying

explanations or inferred causes of the observed data. They can be thought of as

concepts or abstractions that help us make sense of the rich variability in the data.

When analyzing a speech recording, the factors of variation include the speaker’s

age, their sex, their accent and the words that they are speaking. When analyzing

an image of a car, the factors of variation include the position of the car, its color,

and the angle and brightness of the sun.

A major source of diﬃculty in many real-world artiﬁcial intelligence applications

is that many of the factors of variation inﬂuence every single piece of data we are

able to observe. The individual pixels in an image of a red car might be very close

to black at night. The shape of the car’s silhouette depends on the viewing angle.

Most applications require us to the factors of variation and discard thedisentangle

ones that we do not care about.

Of course, it can be very diﬃcult to extract such high-level, abstract features

from raw data. Many of these factors of variation, such as a speaker’s accent,

can be identiﬁed only using sophisticated, nearly human-level understanding of

the data. When it is nearly as diﬃcult to obtain a representation as to solve the

original problem, representation learning does not, at ﬁrst glance, seem to help us.

Deep learning

solves this central problem in representation learning by intro-

ducing representations that are expressed in terms of other, simpler representations.

Deep learning allows the computer to build complex concepts out of simpler con-

cepts. Figure shows how a deep learning system can represent the concept of1.2

an image of a person by combining simpler concepts, such as corners and contours,

which are in turn deﬁned in terms of edges.

The quintessential example of a deep learning model is the feedforward deep

network or

multilayer perceptron

(MLP). A multilayer perceptron is just a

mathematical function mapping some set of input values to output values. The

function is formed by composing many simpler functions. We can think of each

application of a diﬀerent mathematical function as providing a new representation

of the input.

The idea of learning the right representation for the data provides one perspec-

tive on deep learning. Another perspective on deep learning is that depth allows the

computer to learn a multi-step computer program. Each layer of the representation

can be thought of as the state of the computer’s memory after executing another

set of instructions in parallel. Networks with greater depth can execute more

instructions in sequence. Sequential instructions oﬀer great power because later

剩余799页未读，继续阅读

FreeA7

粉丝: 1
资源: 4

深度学习：Ian Goodfellow, Yoshua Bengio, Aaron Courville合著

Deep Learning.pdf Ian Goodfellow Yoshua Bengio Aaron Courville

Deep Learning ( Ian Goodfellow Yoshua Bengio Aaron Courville)

Deep Learning (Ian Goodfellow, Yoshua Bengio and Aaron Courville)

Deep Learning Learning Ian Goodfellow Yoshua Bengio Aaron Courville

Deep Learning Ian Goodfellow Yoshua Bengio Aaron Courville.zip

deep learning book(Ian Goodfellow Yoshua Bengio Aaron Courville)

《Deep Learning》 Ian Goodfellow Yoshua Bengio Aaron Courville.zip

Deep Learning.Ian Goodfellow and Yoshua Bengio and Aaron Courville

Deep Learning by Ian Goodfellow, Yoshua Bengio, Aaron Courville

Bengio写的MIT Press《Deep learning》PDF整理版

最新资源