Deep Learning Fundamentals and Linear Algebra Overview

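"深入学习.pdf"

Deep Learning, co-authored by Ian Goodfellow, Yoshua Bengio, and Aaron Courville, is a classic textbook in the field of deep learning. The book covers a broad range of material and aims to help readers understand and master the basic theory and techniques of deep learning.

1. Introduction
The book describes its intended audience, including researchers, engineers, and students interested in machine learning. It also reviews the historical trends of deep learning, showing how the field evolved and its major milestones.

I. Applied Math and Machine Learning Basics
This part provides the mathematical foundation of deep learning, which is essential for understanding and implementing deep learning models.

2. Linear Algebra
- Scalars, vectors, matrices, and tensors are the basic objects of linear algebra; in deep learning they represent data and model parameters.
- Matrix multiplication is the key operation in computing a neural network; it defines the relationship between successive layers.
- Identity and inverse matrices are used to solve systems of linear equations; the identity matrix also appears in regularization.
- Linear dependence and span describe the structure of data in high-dimensional spaces.
- Norms measure the size of a vector or matrix and are used in gradient-based optimization and weight regularization.
- Special matrices and vectors, such as diagonal and orthogonal matrices, play important roles in particular neural network architectures.
- Eigendecomposition and singular value decomposition are two important matrix factorizations, often used for dimensionality reduction and feature extraction.
- The Moore-Penrose pseudoinverse handles linear systems whose matrix is non-square or not of full rank.
- The trace operator and the determinant are useful for understanding matrix properties and in optimization.

3. Probability and Information Theory
- Probability theory is the foundation for building probabilistic models and provides the framework for modeling uncertainty in deep learning.
- Random variables are central to probability theory and describe the outcomes of uncertain events.
- Probability distributions describe how likely each value of a random variable is.
- Marginal and conditional probability are key to understanding how the parts of a probabilistic model relate to one another.
- The chain rule of conditional probabilities underlies the factorization of joint distributions used in structured models such as Bayesian networks.
- Independence and conditional independence are important concepts for designing and analyzing complex models.
- Expectation, variance, and covariance are key statistical tools for measuring the central tendency and variability of random variables.

Building on these foundations, the book gradually leads the reader into deep learning proper. As the material progresses, readers also encounter other core topics such as neural networks, backpropagation, optimization algorithms, loss functions, convolutional networks, recurrent networks, autoencoders, and generative adversarial networks.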

Chapter 1

Introduction

Inventors have long dreamed of creating machines that think. This desire dates

back to at least the time of ancient Greece. The mythical ﬁgures Pygmalion,

Daedalus, and Hephaestus may all be interpreted as legendary inventors, and

Galatea, Talos, and Pandora may all be regarded as artificial life (Ovid and Martin,

2004; Sparkes, 1996; Tandy, 1997).

When programmable computers were ﬁrst conceived, people wondered whether

they might become intelligent, over a hundred years before one was built (Lovelace,

1842). Today, artiﬁcial intelligence (AI) is a thriving ﬁeld with many practical

applications and active research topics. We look to intelligent software to automate

routine labor,understand speech or images,make diagnoses in medicine and

support basic scientiﬁc research.

In the early days of artiﬁcial intelligence, the ﬁeld rapidly tackled and solved

problems that are intellectually difficult for human beings but relatively

straightforward for computers: problems that can be described by a list of formal,

mathematical rules. The true challenge to artificial intelligence proved to be solving

the tasks that are easy for people to perform but hard for people to describe

formally—problems that we solve intuitively, that feel automatic, like recognizing

spoken words or faces in images.

This book is about a solution to these more intuitive problems. This solution is

to allow computers to learn from experience and understand the world in terms of a

hierarchy of concepts, with each concept deﬁned in terms of its relation to simpler

concepts. By gathering knowledge from experience, this approach avoids the need

for human operators to formally specify all of the knowledge that the computer

needs. The hierarchy of concepts allows the computer to learn complicated concepts

by building them out of simpler ones. If we draw a graph showing how these

concepts are built on top of each other, the graph is deep, with many layers. For

this reason, we call this approach to AI deep learning.

Many of the early successes of AI took place in relatively sterile and formal

environments and did not require computers to have much knowledge about

the world.For example, IBM’s Deep Blue chess-playing system defeated world

champion Garry Kasparov in 1997 ( , ). Chess is of course a very simpleHsu 2002

world, containing only sixty-four locations and thirty-two pieces that can move

in only rigidly circumscribed ways. Devising a successful chess strategy isa

tremendous accomplishment,but the challenge is not due to the diﬃculty of

describing the set of chess pieces and allowable moves to the computer. Chess

can be completely described by a very brief list of completely formal rules, easily

provided ahead of time by the programmer.

Ironically, abstract and formal tasks that are among the most diﬃcult mental

undertakings for a human being are among the easiest for a computer. Computers

have long been able to defeat even the best human chess player, but are only

recently matching some of the abilities of average human beings to recognize objects

or speech. A person’s everyday life requires an immense amount of knowledge

about the world. Much of this knowledge is subjective and intuitive, and therefore

diﬃcult to articulate in a formal way. Computers need to capture this same

knowledge in order to behave in an intelligent way. One of the key challenges in

artiﬁcial intelligence is how to get this informal knowledge into a computer.

Several artiﬁcial intelligence projects have sought to hard-code knowledge about

the world in formal languages. A computer can reason about statements in these

formal languages automatically using logical inference rules. This is known as the

knowledge base approach to artiﬁcial intelligence. None of these projects has led to

a major success. One of the most famous such projects is Cyc (Lenat and Guha,

1989). Cyc is an inference engine and a database of statements in a language

called CycL. These statements are entered by a staﬀ of human supervisors. It is an

unwieldy process. People struggle to devise formal rules with enough complexity

to accurately describe the world. For example, Cyc failed to understand a story

about a person named Fred shaving in the morning (Linde, 1992). Its inference

engine detected an inconsistency in the story: it knew that people do not have

electrical parts, but because Fred was holding an electric razor, it believed the

entity “FredWhileShaving” contained electrical parts. It therefore asked whether

Fred was still a person while he was shaving.

The diﬃculties faced by systems relying on hard-coded knowledge suggest that

AI systems need the ability to acquire their own knowledge, by extracting patterns

from raw data. This capability is known as machine learning. The introduction

of machine learning allowed computers to tackle problems involving knowledge

of the real world and make decisions that appear subjective. A simple machine

learning algorithm called logistic regression can determine whether to recommend

cesarean delivery (Mor-Yosef et al., 1990). A simple machine learning algorithm

called naive Bayes can separate legitimate e-mail from spam e-mail.

The performance of these simple machine learning algorithms depends heavily

on the representation of the data they are given. For example, when logistic

regression is used to recommend cesarean delivery, the AI system does not examine

the patient directly. Instead, the doctor tells the system several pieces of relevant

information, such as the presence or absence of a uterine scar. Each piece of

information included in the representation of the patient is known as a feature.

Logistic regression learns how each of these features of the patient correlates with

various outcomes. However, it cannot inﬂuence the way that the features are

deﬁned in any way.If logistic regression was given an MRI scan of the patient,

rather than the doctor’s formalized report, it would not be able to make useful

predictions. Individual pixels in an MRI scan have negligible correlation with any

complications that might occur during delivery.
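As a rough illustration of how logistic regression learns a weight for each supplied feature, the sketch below trains the model by plain gradient descent. Everything in it is an assumption made for illustration: the data are synthetic (not clinical), feature 0 fully determines the label, feature 1 is pure noise, and the names `train_logistic_regression`, `rows`, and `labels` are hypothetical.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_regression(rows, labels, steps=500, lr=0.5):
    """Fit weights and a bias by full-batch gradient descent on the log loss."""
    n_features = len(rows[0])
    weights, bias = [0.0] * n_features, 0.0
    n = len(rows)
    for _ in range(steps):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for x, y in zip(rows, labels):
            p = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            for j, xi in enumerate(x):
                grad_w[j] += (p - y) * xi / n
            grad_b += (p - y) / n
        weights = [w - lr * g for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b
    return weights, bias


# Synthetic toy data (an assumption, not real patient records):
# feature 0 decides the label, feature 1 is unrelated noise.
rng = random.Random(0)
rows = [(rng.randint(0, 1), rng.randint(0, 1)) for _ in range(200)]
labels = [x[0] for x in rows]

weights, bias = train_logistic_regression(rows, labels)
predictions = [
    int(sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) > 0.5)
    for x in rows
]
accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)
print("weights:", weights, "bias:", bias, "accuracy:", accuracy)
```

The model can only reweight the features it is handed; it has no way to invent a better feature, which is the limitation the paragraph above describes.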

This dependence on representations is a general phenomenon that appears

throughout computer science and even daily life. In computer science, opera-

tions such as searching a collection of data can proceed exponentially faster if

the collection is structured and indexed intelligently. People can easily perform

arithmetic on Arabic numerals, but ﬁnd arithmetic on Roman numerals much

more time-consuming. It is not surprising that the choice of representation has an

enormous eﬀect on the performance of machine learning algorithms. For a simple

visual example, see Fig. 1.1.
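The point about structured collections can be made concrete with a small sketch that counts how many elements each search strategy inspects. The function names are hypothetical, and a sorted list stands in for the "structured and indexed" collection.

```python
def linear_search_steps(items, target):
    """Scan left to right; return how many elements were inspected."""
    steps = 0
    for item in items:
        steps += 1
        if item == target:
            break
    return steps


def binary_search_steps(sorted_items, target):
    """Halve the candidate range each step; return how many elements were inspected."""
    lo, hi, steps = 0, len(sorted_items), 0
    while lo < hi:
        steps += 1
        mid = (lo + hi) // 2
        if sorted_items[mid] == target:
            break
        if sorted_items[mid] < target:
            lo = mid + 1
        else:
            hi = mid
    return steps


items = list(range(1_000_000))  # already sorted
target = items[-1]              # worst case for the linear scan
linear_steps = linear_search_steps(items, target)
binary_steps = binary_search_steps(items, target)
print(linear_steps, binary_steps)
```

The data are identical in both cases; only the assumption that they are sorted changes, and with it the number of inspections drops from the size of the collection to roughly its base-2 logarithm.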

Many artiﬁcial intelligence tasks can be solved by designing the right set of

features to extract for that task, then providing these features to a simple machine

learning algorithm. For example, a useful feature for speaker identiﬁcation from

sound is an estimate of the size of the speaker’s vocal tract. It therefore gives a strong

clue as to whether the speaker is a man, woman, or child.

However, for many tasks, it is diﬃcult to know what features should be extracted.

For example, suppose that we would like to write a program to detect cars in

photographs. We know that cars have wheels, so we might like to use the presence

of a wheel as a feature. Unfortunately, it is difficult to describe exactly what a

wheel looks like in terms of pixel values. A wheel has a simple geometric shape but

its image may be complicated by shadows falling on the wheel, the sun glaring oﬀ

the metal parts of the wheel, the fender of the car or an object in the foreground

obscuring part of the wheel, and so on.

Cartesiancoordinates

Polarcoordinates

Figure 1.1: Example ofdiﬀerent representations: suppose we want to separatetwo

categories of data by drawing a line between them in a scatterplot. In the plot on the left,

we represent some data using Cartesian coordinates, and the task is impossible. In the plot

on the right, we represent the data with polar coordinates and the task becomes simple to

solve with a vertical line. (Figure produced in collaboration with David Warde-Farley)
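The effect described in the caption can be reproduced with a minimal sketch. The data here are an assumption: two synthetic concentric rings, which may differ from the data used in the original figure. For simplicity, the Cartesian case tries only vertical lines (thresholds on x), while the polar case uses one fixed threshold on the radius. All names are hypothetical.

```python
import math
import random


def make_rings(n_per_class=100, seed=0):
    """Synthetic data: class 0 on an inner ring, class 1 on an outer ring."""
    rng = random.Random(seed)
    points = []
    for label, (r_lo, r_hi) in enumerate([(0.5, 1.0), (2.0, 2.5)]):
        for _ in range(n_per_class):
            r = rng.uniform(r_lo, r_hi)
            theta = rng.uniform(-math.pi, math.pi)
            points.append((r * math.cos(theta), r * math.sin(theta), label))
    return points


def threshold_accuracy(values, labels, threshold):
    """Accuracy of the rule 'predict class 1 when value > threshold'."""
    correct = sum((v > threshold) == bool(l) for v, l in zip(values, labels))
    return correct / len(labels)


def best_threshold_accuracy(values, labels):
    """Best accuracy over all thresholds on one coordinate, in either orientation."""
    best = 0.0
    for t in values:
        acc = threshold_accuracy(values, labels, t)
        best = max(best, acc, 1 - acc)
    return best


data = make_rings()
labels = [l for _, _, l in data]
xs = [x for x, _, _ in data]
radii = [math.hypot(x, y) for x, y, _ in data]  # polar radius of each point

cartesian_acc = best_threshold_accuracy(xs, labels)  # best vertical line on x
polar_acc = threshold_accuracy(radii, labels, 1.5)   # one vertical line on r
print(cartesian_acc, polar_acc)
```

No threshold on x separates the rings, whereas a single threshold on the radius classifies every point correctly: the data are unchanged and only the representation differs.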

One solution to this problem is to use machine learning to discover not only

the mapping from representation to output but also the representation itself.

This approach is known as representation learning. Learned representations often

result in much better performance than can be obtained with hand-designed

representations. They also allow AI systems to rapidly adapt to new tasks, with

minimal human intervention. A representation learning algorithm can discover a

good set of features for a simple task in minutes, or a complex task in hours to

months. Manually designing features for a complex task requires a great deal of

human time and eﬀort; it can take decades for an entire community of researchers.

The quintessential example of a representation learning algorithm is the au-

toencoder. An autoencoder is the combination of an encoder function that converts

the input data into a diﬀerent representation, and a decoder function that converts

the new representation back into the original format. Autoencoders are trained to

preserve as much information as possible when an input is run through the encoder

and then the decoder, but are also trained to make the new representation have

various nice properties. Diﬀerent kinds of autoencoders aim to achieve diﬀerent

kinds of properties.
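A minimal sketch of the encoder/decoder idea follows, assuming the simplest possible setting: a linear encoder that compresses 2-D points to a single number and a linear decoder that maps it back, both trained by gradient descent on the squared reconstruction error. The data (points on the line y = 2x), the initial values, and all names are illustrative assumptions; practical autoencoders are nonlinear and much larger.

```python
import random


def encode(point, enc):
    """Encoder: map a 2-D point to a 1-D code."""
    return enc[0] * point[0] + enc[1] * point[1]


def decode(code, dec):
    """Decoder: map the 1-D code back to a 2-D point."""
    return (dec[0] * code, dec[1] * code)


def reconstruction_loss(data, enc, dec):
    """Mean squared distance between each point and its reconstruction."""
    total = 0.0
    for p in data:
        r = decode(encode(p, enc), dec)
        total += (r[0] - p[0]) ** 2 + (r[1] - p[1]) ** 2
    return total / len(data)


def train(data, steps=2000, lr=0.1):
    enc, dec = [0.1, 0.1], [0.1, 0.1]  # small arbitrary starting values
    n = len(data)
    for _ in range(steps):
        g_enc, g_dec = [0.0, 0.0], [0.0, 0.0]
        for p in data:
            h = encode(p, enc)
            r = decode(h, dec)
            err = (r[0] - p[0], r[1] - p[1])
            # Gradient of the loss with respect to the code, through the decoder.
            g_h = 2 * (err[0] * dec[0] + err[1] * dec[1])
            for i in range(2):
                g_dec[i] += 2 * err[i] * h / n
                g_enc[i] += g_h * p[i] / n
        for i in range(2):
            enc[i] -= lr * g_enc[i]
            dec[i] -= lr * g_dec[i]
    return enc, dec


# Synthetic data lying on the line y = 2x, so one number per point suffices.
rng = random.Random(0)
data = [(t, 2 * t) for t in (rng.uniform(-1, 1) for _ in range(50))]

initial_loss = reconstruction_loss(data, [0.1, 0.1], [0.1, 0.1])
enc, dec = train(data)
final_loss = reconstruction_loss(data, enc, dec)
print(initial_loss, final_loss)
```

Because the points lie on a line, a one-dimensional code preserves essentially all the information, and the reconstruction error falls close to zero after training.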

When designing features or algorithms for learning features, our goal is usually

to separate the factors of variation that explain the observed data. In this context,

we use the word “factors” simply to refer to separate sources of inﬂuence; the factors

are usually not combined by multiplication. Such factors are often not quantities

that are directly observed. Instead, they may exist either as unobserved objects

or unobserved forces in the physical world that aﬀect observable quantities. They

may also exist as constructs in the human mind that provide useful simplifying

explanations or inferred causes of the observed data. They can be thought of as

concepts or abstractions that help us make sense of the rich variability in the data.

When analyzing a speech recording, the factors of variation include the speaker’s

age, their sex, their accent and the words that they are speaking. When analyzing

an image of a car, the factors of variation include the position of the car, its color,

and the angle and brightness of the sun.

A major source of diﬃculty in many real-world artiﬁcial intelligence applications

is that many of the factors of variation inﬂuence every single piece of data we are

able to observe. The individual pixels in an image of a red car might be very close

to black at night. The shape of the car’s silhouette depends on the viewing angle.

Most applications require us to disentangle the factors of variation and discard the

ones that we do not care about.

Of course, it can be very diﬃcult to extract such high-level, abstract features

from raw data. Many of these factors of variation, such as a speaker’s accent,

can be identiﬁed only using sophisticated, nearly human-level understanding of

the data. When it is nearly as diﬃcult to obtain a representation as to solve the

original problem, representation learning does not, at ﬁrst glance, seem to help us.

Deep learning solves this central problem in representation learning by introduc-

ing representations that are expressed in terms of other, simpler representations.

Deep learning allows the computer to build complex concepts out of simpler

concepts. Fig. 1.2 shows how a deep learning system can represent the concept of an

image of a person by combining simpler concepts, such as corners and contours,

which are in turn deﬁned in terms of edges.

The quintessential example of a deep learning model is the feedforward deep

network or multilayer perceptron (MLP). A multilayer perceptron is just a mathe-

matical function mapping some set of input values to output values. The function

is formed by composing many simpler functions. We can think of each application

of a diﬀerent mathematical function as providing a new representation of the input.
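The description of a multilayer perceptron as a composition of simpler functions can be sketched directly. The weights below are hand-picked for illustration rather than learned, chosen so that the composed function computes XOR; the helper names `affine`, `relu`, and `compose` are hypothetical.

```python
def affine(weights, bias):
    """Return a layer computing weights @ x + bias for a list-based vector x."""
    def layer(x):
        return [sum(w * xi for w, xi in zip(row, x)) + b
                for row, b in zip(weights, bias)]
    return layer


def relu(x):
    """Elementwise rectified linear unit."""
    return [max(0.0, v) for v in x]


def compose(*functions):
    """Chain functions left to right: compose(f, g)(x) == g(f(x))."""
    def composed(x):
        for f in functions:
            x = f(x)
        return x
    return composed


# Hand-picked weights (an assumption, not the result of training).
hidden = affine([[1.0, 1.0], [1.0, 1.0]], [0.0, -1.0])
output = affine([[1.0, -2.0]], [0.0])
mlp = compose(hidden, relu, output)

inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
results = {x: mlp(list(x))[0] for x in inputs}
hidden_repr = {x: relu(hidden(list(x))) for x in inputs}
print(results)
print(hidden_repr)
```

The intermediate `hidden_repr` values are the new representation produced by the first stage: the inputs (0, 1) and (1, 0) map to the same hidden vector, which is what lets the final linear stage compute XOR.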

The idea of learning the right representation for the data provides one perspec-

tive on deep learning. Another perspective on deep learning is that depth allows the

computer to learn a multi-step computer program. Each layer of the representation

can be thought of as the state of the computer’s memory after executing another

set of instructions in parallel. Networks with greater depth can execute more

instructions in sequence. Sequential instructions oﬀer great power because later

instructions can refer back to the results of earlier instructions. According to this
