An Introductory Deep Learning Classic: Mathematics and Basic Tools Explained


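"Deep Learning Cookbook is an authoritative guide written for professionals who want to understand and apply deep learning in depth. It covers the theoretical foundations of deep learning and also provides practical tools and examples, so it suits both beginners and experienced practitioners. The book is co-authored by the well-known deep learning researchers Ian Goodfellow, Yoshua Bengio, and Aaron Courville. This is the original English edition, and it is not a scanned copy, which keeps the text quality high. The main text begins with the first part, "Introduction," which guides readers through the historical trends of deep learning and its importance in today's information technology. The book explains who should read it: students who are new to deep learning and engineers who want to improve their existing skills can both find a suitable learning path. The authors review the development of deep learning since the 1950s and emphasize how deep neural networks grew from simple models into powerful tools that can solve complex problems. The second part, "Applied Math and Machine Learning Basics," gives an accessible introduction to linear algebra and other mathematical foundations of deep learning. Its topics include the basic concepts of vectors and matrices, such as operations on scalars, vectors, matrices, and tensors; matrix multiplication, identity matrices, and inverse matrices; linear dependence and vector spaces; and vector norms and special kinds of matrices. It also covers more advanced concepts such as eigendecomposition, singular value decomposition (SVD), and the Moore-Penrose pseudoinverse, which are core techniques for training and optimizing neural networks. The third part, "Probability and Information Theory," discusses how probability theory and information theory are applied in deep learning. This part stresses the central role of probability in reasoning about randomness and uncertainty. Readers learn about random variables, probability distributions and their properties, such as marginal probability, conditional probability, independence, and conditional independence, as well as statistical concepts such as expectation, variance, and covariance. This knowledge is essential for handling model uncertainty and for decision making. Through the carefully arranged exercises and practical cases in each chapter, readers can gradually master the basic principles and practical techniques of deep learning. Deep Learning Cookbook is both a theoretical textbook and a practical reference manual for deep learning engineers, and an indispensable reference for entering the field. Whether readers lean toward theory or practice, the book offers a solid and comprehensive path for learning deep learning."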

Chapter 1

Introduction

Inventors have long dreamed of creating machines that think. This desire dates back to at least the time of ancient Greece. The mythical figures Pygmalion, Daedalus, and Hephaestus may all be interpreted as legendary inventors, and Galatea, Talos, and Pandora may all be regarded as artificial life (Ovid and Martin, 2004; Sparkes, 1996; Tandy, 1997).

When programmable computers were first conceived, people wondered whether such machines might become intelligent, over a hundred years before one was built (Lovelace, 1842). Today, artificial intelligence (AI) is a thriving field with many practical applications and active research topics. We look to intelligent software to automate routine labor, understand speech or images, make diagnoses in medicine and support basic scientific research.

In the early days of artificial intelligence, the field rapidly tackled and solved problems that are intellectually difficult for human beings but relatively straightforward for computers: problems that can be described by a list of formal, mathematical rules. The true challenge to artificial intelligence proved to be solving the tasks that are easy for people to perform but hard for people to describe formally: problems that we solve intuitively, that feel automatic, like recognizing spoken words or faces in images.

This book is about a solution to these more intuitive problems. This solution is to allow computers to learn from experience and understand the world in terms of a hierarchy of concepts, with each concept defined in terms of its relation to simpler concepts. By gathering knowledge from experience, this approach avoids the need for human operators to formally specify all of the knowledge that the computer needs. The hierarchy of concepts allows the computer to learn complicated concepts by building them out of simpler ones. If we draw a graph showing how these concepts are built on top of each other, the graph is deep, with many layers. For this reason, we call this approach to AI deep learning.

Many of the early successes of AI took place in relatively sterile and formal environments and did not require computers to have much knowledge about the world. For example, IBM’s Deep Blue chess-playing system defeated world champion Garry Kasparov in 1997 (Hsu, 2002). Chess is of course a very simple world, containing only sixty-four locations and thirty-two pieces that can move in only rigidly circumscribed ways. Devising a successful chess strategy is a tremendous accomplishment, but the challenge is not due to the difficulty of describing the set of chess pieces and allowable moves to the computer. Chess can be completely described by a very brief list of completely formal rules, easily provided ahead of time by the programmer.

Ironically, abstract and formal tasks that are among the most difficult mental undertakings for a human being are among the easiest for a computer. Computers have long been able to defeat even the best human chess player, but are only recently matching some of the abilities of average human beings to recognize objects or speech. A person’s everyday life requires an immense amount of knowledge about the world. Much of this knowledge is subjective and intuitive, and therefore difficult to articulate in a formal way. Computers need to capture this same knowledge in order to behave in an intelligent way. One of the key challenges in artificial intelligence is how to get this informal knowledge into a computer.

Several artificial intelligence projects have sought to hard-code knowledge about the world in formal languages. A computer can reason about statements in these formal languages automatically using logical inference rules. This is known as the knowledge base approach to artificial intelligence. None of these projects has led to a major success. One of the most famous such projects is Cyc (Lenat and Guha, 1989). Cyc is an inference engine and a database of statements in a language called CycL. These statements are entered by a staff of human supervisors. It is an unwieldy process. People struggle to devise formal rules with enough complexity to accurately describe the world. For example, Cyc failed to understand a story about a person named Fred shaving in the morning (Linde, 1992). Its inference engine detected an inconsistency in the story: it knew that people do not have electrical parts, but because Fred was holding an electric razor, it believed the entity “FredWhileShaving” contained electrical parts. It therefore asked whether Fred was still a person while he was shaving.

The difficulties faced by systems relying on hard-coded knowledge suggest that AI systems need the ability to acquire their own knowledge, by extracting patterns from raw data. This capability is known as machine learning. The introduction of machine learning allowed computers to tackle problems involving knowledge of the real world and make decisions that appear subjective. A simple machine learning algorithm called logistic regression can determine whether to recommend cesarean delivery (Mor-Yosef et al., 1990). A simple machine learning algorithm called naive Bayes can separate legitimate e-mail from spam e-mail.

The performance of these simple machine learning algorithms depends heavily on the representation of the data they are given. For example, when logistic regression is used to recommend cesarean delivery, the AI system does not examine the patient directly. Instead, the doctor tells the system several pieces of relevant information, such as the presence or absence of a uterine scar. Each piece of information included in the representation of the patient is known as a feature. Logistic regression learns how each of these features of the patient correlates with various outcomes. However, it cannot influence the way that the features are defined in any way. If logistic regression was given an MRI scan of the patient, rather than the doctor’s formalized report, it would not be able to make useful predictions. Individual pixels in an MRI scan have negligible correlation with any complications that might occur during delivery.
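As a minimal sketch of how logistic regression learns a weight for each feature it is handed, the following trains a tiny model by stochastic gradient descent. Everything here is an assumption made for illustration: the two binary features, the six synthetic rows, the learning rate, and the function names are hypothetical and are not taken from the study cited above.

```python
import math


def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, x):
    """Probability of the positive outcome for one feature vector."""
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    return sigmoid(score)


def train_logistic_regression(features, labels, lr=0.5, epochs=2000):
    """Fit one weight per feature with plain stochastic gradient descent."""
    weights = [0.0] * len(features[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            error = predict(weights, bias, x) - y  # gradient of the log loss
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
            bias -= lr * error
    return weights, bias


# Synthetic toy data (hypothetical): feature 0 determines the label,
# feature 1 is unrelated noise.
features = [[1, 0], [1, 1], [0, 0], [0, 1], [1, 0], [0, 1]]
labels = [1, 1, 0, 0, 1, 0]

weights, bias = train_logistic_regression(features, labels)
print(predict(weights, bias, [1, 0]))  # above 0.5
print(predict(weights, bias, [0, 1]))  # below 0.5
```

The model can only reweight the features it receives. If the informative feature were replaced by raw measurements that individually carry little signal, the same training loop would have nothing useful to weight.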

This dependence on representations is a general phenomenon that appears throughout computer science and even daily life. In computer science, operations such as searching a collection of data can proceed exponentially faster if the collection is structured and indexed intelligently. People can easily perform arithmetic on Arabic numerals, but find arithmetic on Roman numerals much more time-consuming. It is not surprising that the choice of representation has an enormous effect on the performance of machine learning algorithms. For a simple visual example, see figure 1.1.
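One way to see the search claim concretely is to compare a scan over an unstructured collection with a binary search over the same values kept in sorted order. This is only an illustrative sketch: the step-counting helpers and the one-million-element list are assumptions chosen for the example.

```python
def linear_search_steps(items, target):
    """Count the elements inspected when scanning from the front."""
    steps = 0
    for value in items:
        steps += 1
        if value == target:
            break
    return steps


def binary_search_steps(sorted_items, target):
    """Count the halving steps needed to locate target in a sorted list."""
    lo, hi = 0, len(sorted_items)
    steps = 0
    while lo < hi:
        steps += 1
        mid = (lo + hi) // 2
        if sorted_items[mid] < target:
            lo = mid + 1  # target lies in the upper half
        else:
            hi = mid      # target lies in the lower half (or at mid)
    return steps


data = list(range(1_000_000))  # already sorted, so both searches apply
target = 999_999               # worst case for the linear scan

print(linear_search_steps(data, target))  # inspects every element
print(binary_search_steps(data, target))  # at most 20 halvings
```

The data are identical in both cases. Only the assumption that they are sorted, which is a property of the representation, lets the second function discard half of the remaining candidates at each step.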

Many artificial intelligence tasks can be solved by designing the right set of features to extract for that task, then providing these features to a simple machine learning algorithm. For example, a useful feature for speaker identification from sound is an estimate of the size of the speaker’s vocal tract. It therefore gives a strong clue as to whether the speaker is a man, woman, or child.

However, for many tasks, it is difficult to know what features should be extracted. For example, suppose that we would like to write a program to detect cars in photographs. We know that cars have wheels, so we might like to use the presence of a wheel as a feature. Unfortunately, it is difficult to describe exactly what a wheel looks like in terms of pixel values. A wheel has a simple geometric shape but its image may be complicated by shadows falling on the wheel, the sun glaring off the metal parts of the wheel, the fender of the car or an object in the foreground obscuring part of the wheel, and so on.

Figure 1.1: Example of different representations: suppose we want to separate two categories of data by drawing a line between them in a scatterplot. In the plot on the left, we represent some data using Cartesian coordinates, and the task is impossible. In the plot on the right, we represent the data with polar coordinates and the task becomes simple to solve with a vertical line. Figure produced in collaboration with David Warde-Farley.
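The effect described in the caption can be sketched in a few lines. The example below assumes two concentric rings of points, which may differ from the data actually plotted in the figure, and the names `make_ring`, `to_polar`, and `classify_by_radius` are hypothetical. No straight line in the (x, y) plane separates the rings, but a single threshold on the radius does.

```python
import math


def make_ring(radius, n_points):
    """Evenly spaced points on a circle, in Cartesian coordinates."""
    return [
        (radius * math.cos(2 * math.pi * k / n_points),
         radius * math.sin(2 * math.pi * k / n_points))
        for k in range(n_points)
    ]


def to_polar(x, y):
    """Convert a Cartesian point to (radius, angle)."""
    return math.hypot(x, y), math.atan2(y, x)


def classify_by_radius(point, threshold=2.0):
    """A 'vertical line' in polar coordinates: compare the radius only."""
    r, _theta = to_polar(*point)
    return "outer" if r > threshold else "inner"


inner = make_ring(1.0, 12)  # assumed category A
outer = make_ring(3.0, 12)  # assumed category B

print({classify_by_radius(p) for p in inner})  # {'inner'}
print({classify_by_radius(p) for p in outer})  # {'outer'}
```

The classifier itself is trivial. All of the work is done by the change of coordinates.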

One solution to this problem is to use machine learning to discover not only the mapping from representation to output but also the representation itself. This approach is known as representation learning. Learned representations often result in much better performance than can be obtained with hand-designed representations. They also allow AI systems to rapidly adapt to new tasks, with minimal human intervention. A representation learning algorithm can discover a good set of features for a simple task in minutes, or a complex task in hours to months. Manually designing features for a complex task requires a great deal of human time and effort; it can take decades for an entire community of researchers.

The quintessential example of a representation learning algorithm is the autoencoder. An autoencoder is the combination of an encoder function that converts the input data into a different representation, and a decoder function that converts the new representation back into the original format. Autoencoders are trained to preserve as much information as possible when an input is run through the encoder and then the decoder, but are also trained to make the new representation have various nice properties. Different kinds of autoencoders aim to achieve different kinds of properties.
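The encoder and decoder structure can be sketched without any training. In the example below the weights are fixed by hand, whereas a real autoencoder would learn them from data. The assumptions are that the inputs are 2-D points lying near a line through the origin and that the new representation is a single number; `DIRECTION`, `encoder`, `decoder`, and `reconstruction_error` are hypothetical names.

```python
# Assumed direction of the line the data lie on (a unit vector).
DIRECTION = (0.6, 0.8)


def encoder(point):
    """Compress a 2-D point to one number: its coordinate along DIRECTION."""
    x, y = point
    return DIRECTION[0] * x + DIRECTION[1] * y


def decoder(code):
    """Map the 1-D code back to a 2-D point on the line."""
    return (DIRECTION[0] * code, DIRECTION[1] * code)


def reconstruction_error(point):
    """Squared distance between a point and its decoded reconstruction."""
    rx, ry = decoder(encoder(point))
    return (point[0] - rx) ** 2 + (point[1] - ry) ** 2


on_line = (3.0, 4.0)    # lies on the assumed line
off_line = (4.0, -3.0)  # perpendicular to it

print(reconstruction_error(on_line))   # close to 0: information preserved
print(reconstruction_error(off_line))  # close to 25: information lost
```

Training an autoencoder amounts to searching for encoder and decoder parameters that keep this reconstruction error small on the observed data, subject to whatever additional properties the representation is meant to have.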

When designing features or algorithms for learning features, our goal is usually to separate the factors of variation that explain the observed data. In this context, we use the word “factors” simply to refer to separate sources of influence; the factors are usually not combined by multiplication. Such factors are often not quantities that are directly observed. Instead, they may exist either as unobserved objects or unobserved forces in the physical world that affect observable quantities. They may also exist as constructs in the human mind that provide useful simplifying explanations or inferred causes of the observed data. They can be thought of as concepts or abstractions that help us make sense of the rich variability in the data. When analyzing a speech recording, the factors of variation include the speaker’s age, their sex, their accent and the words that they are speaking. When analyzing an image of a car, the factors of variation include the position of the car, its color, and the angle and brightness of the sun.

A major source of difficulty in many real-world artificial intelligence applications is that many of the factors of variation influence every single piece of data we are able to observe. The individual pixels in an image of a red car might be very close to black at night. The shape of the car’s silhouette depends on the viewing angle. Most applications require us to disentangle the factors of variation and discard the ones that we do not care about.

Of course, it can be very difficult to extract such high-level, abstract features from raw data. Many of these factors of variation, such as a speaker’s accent, can be identified only using sophisticated, nearly human-level understanding of the data. When it is nearly as difficult to obtain a representation as to solve the original problem, representation learning does not, at first glance, seem to help us.

Deep learning solves this central problem in representation learning by introducing representations that are expressed in terms of other, simpler representations. Deep learning allows the computer to build complex concepts out of simpler concepts. Figure 1.2 shows how a deep learning system can represent the concept of an image of a person by combining simpler concepts, such as corners and contours, which are in turn defined in terms of edges.

The quintessential example of a deep learning model is the feedforward deep network or multilayer perceptron (MLP). A multilayer perceptron is just a mathematical function mapping some set of input values to output values. The function is formed by composing many simpler functions. We can think of each application of a different mathematical function as providing a new representation of the input.
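To make the composition of functions concrete, here is a minimal sketch of a two-layer perceptron with hand-picked weights. The choice of XOR as the target function and the specific weights are assumptions made for illustration; a real network would learn its weights from data. The hidden layer produces a new representation of the input in which a single linear output unit suffices.

```python
def affine(weights, bias, inputs):
    """One linear map plus offset: returns W @ inputs + bias as a list."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, bias)
    ]


def relu(values):
    """Elementwise rectifier: negative entries become zero."""
    return [max(0, v) for v in values]


def hidden_layer(x):
    """First function: maps the raw input to a new 2-D representation."""
    return relu(affine([[1, 1], [1, 1]], [0, -1], x))


def output_layer(h):
    """Second function: a linear readout of the hidden representation."""
    return affine([[1, -2]], [0], h)[0]


def mlp(x):
    """The whole network is just the composition of the two functions."""
    return output_layer(hidden_layer(x))


for x in ([0, 0], [0, 1], [1, 0], [1, 1]):
    print(x, hidden_layer(x), mlp(x))
# [0, 0] [0, 0] 0
# [0, 1] [1, 0] 1
# [1, 0] [1, 0] 1
# [1, 1] [2, 1] 0
```

The inputs [0, 1] and [1, 0] are mapped to the same hidden representation, which is what makes the final linear step possible.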

The idea of learning the right representation for the data provides one perspective on deep learning. Another perspective on deep learning is that depth allows the computer to learn a multi-step computer program. Each layer of the representation can be thought of as the state of the computer’s memory after executing another set of instructions in parallel. Networks with greater depth can execute more instructions in sequence. Sequential instructions offer great power because later
