Introduction to Deep Learning: Bengio's Perspective and Fundamentals

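PDF, 68.71 MB, updated 2024-07-19

"Deep Learning, co-authored by Ian Goodfellow, Yoshua Bengio, and Aaron Courville, is an authoritative work that examines the core concepts and techniques of deep learning in depth. The book is clearly structured and aims to give readers from different backgrounds a comprehensive introduction to the field. The title "Deep Learning by Bengio" highlights one of its authors, Bengio, a pioneer of deep learning whose expertise and experience the book draws on. The book covers the historical trends of deep learning, from its early development to modern applications, so that readers can follow how the technology has evolved. In the "Applied Math and Machine Learning Basics" part, the authors first introduce the fundamentals of linear algebra: vectors, matrices, and tensors and the operations on them, such as matrix multiplication, along with the properties of identity and inverse matrices. This material is essential for understanding weight updates and model structure in neural networks. The book then discusses linear dependence and span, as well as vector norms, which measure the size of a vector and are important tools for evaluating data representations. It also explains matrix decompositions in detail, namely eigendecomposition and singular value decomposition, which play key roles in dimensionality reduction and feature extraction. Probability and information theory are cornerstones of deep learning theory; in Chapter 3 the authors examine the basic principles of probability, such as random variables, probability distributions, marginal probability, and conditional probability. The chain rule of conditional probabilities and the concept of independence are essential for understanding model training and inference in deep learning. In addition, statistical concepts such as expectation, variance, and covariance are central to computing loss functions and to optimization algorithms. Through worked examples such as principal component analysis (PCA), the book shows how these mathematical tools are applied to real problems. The book is suitable for beginners who want to study deep learning systematically, and it also serves as a practical reference that helps professional researchers deepen their understanding and practice of deep learning techniques."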
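The summary mentions that the book illustrates these linear algebra tools with principal component analysis. As a rough illustration only, here is a minimal NumPy sketch of PCA computed through the singular value decomposition. The function name `pca` and the toy data are hypothetical, and the book's own derivation may proceed differently; for centered data, the SVD route yields the same principal directions as an eigendecomposition of XᵀX.

```python
import numpy as np

def pca(X, k):
    """Project the rows of X onto the top-k principal components.

    X has shape (n_samples, n_features). Returns (codes, components, mean)
    so that X is approximately codes @ components + mean.
    """
    mean = X.mean(axis=0)
    Xc = X - mean  # PCA operates on centered data
    # Rows of Vt are the right singular vectors of Xc, i.e. eigenvectors of
    # Xc.T @ Xc, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:k]            # shape (k, n_features), orthonormal rows
    codes = Xc @ components.T      # shape (n_samples, k)
    return codes, components, mean

# Hypothetical toy data: 3-D points lying close to a 2-D plane.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
mixing = np.array([[1.0, 0.5, 0.0], [0.0, 1.0, 2.0]])
X = latent @ mixing + 0.01 * rng.normal(size=(200, 3))

codes, components, mean = pca(X, k=2)
X_hat = codes @ components + mean
print("max reconstruction error:", np.abs(X - X_hat).max())
```

Keeping k smaller than the number of features is what turns this into dimensionality reduction: here two components suffice because the toy data were generated near a plane.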

Chapter 1

Introduction

Inventors have long dreamed of creating machines that think. This desire dates back to at least the time of ancient Greece. The mythical figures Pygmalion, Daedalus, and Hephaestus may all be interpreted as legendary inventors, and Galatea, Talos, and Pandora may all be regarded as artificial life (Ovid and Martin, 2004; Sparkes, 1996; Tandy, 1997).

When programmable computers were first conceived, people wondered whether they might become intelligent, over a hundred years before one was built (Lovelace, 1842). Today, artificial intelligence (AI) is a thriving field with many practical applications and active research topics. We look to intelligent software to automate routine labor, understand speech or images, make diagnoses in medicine, and support basic scientific research.

In the early days of artificial intelligence, the field rapidly tackled and solved problems that are intellectually difficult for human beings but relatively straightforward for computers: problems that can be described by a list of formal, mathematical rules. The true challenge to artificial intelligence proved to be solving the tasks that are easy for people to perform but hard for people to describe formally: problems that we solve intuitively, that feel automatic, like recognizing spoken words or faces in images.

This book is about a solution to these more intuitive problems. This solution is to allow computers to learn from experience and understand the world in terms of a hierarchy of concepts, with each concept defined in terms of its relation to simpler concepts. By gathering knowledge from experience, this approach avoids the need for human operators to formally specify all of the knowledge that the computer needs. The hierarchy of concepts allows the computer to learn complicated concepts by building them out of simpler ones. If we draw a graph showing how these concepts are built on top of each other, the graph is deep, with many layers. For this reason, we call this approach to AI deep learning.

Many of the early successes of AI took place in relatively sterile and formal environments and did not require computers to have much knowledge about the world. For example, IBM’s Deep Blue chess-playing system defeated world champion Garry Kasparov in 1997 (Hsu, 2002). Chess is of course a very simple world, containing only sixty-four locations and thirty-two pieces that can move in only rigidly circumscribed ways. Devising a successful chess strategy is a tremendous accomplishment, but the challenge is not due to the difficulty of describing the set of chess pieces and allowable moves to the computer. Chess can be completely described by a very brief list of completely formal rules, easily provided ahead of time by the programmer.

Ironically, abstract and formal tasks that are among the most difficult mental undertakings for a human being are among the easiest for a computer. Computers have long been able to defeat even the best human chess player, but are only recently matching some of the abilities of average human beings to recognize objects or speech. A person’s everyday life requires an immense amount of knowledge about the world. Much of this knowledge is subjective and intuitive, and therefore difficult to articulate in a formal way. Computers need to capture this same knowledge in order to behave in an intelligent way. One of the key challenges in artificial intelligence is how to get this informal knowledge into a computer.

Several artificial intelligence projects have sought to hard-code knowledge about the world in formal languages. A computer can reason about statements in these formal languages automatically using logical inference rules. This is known as the knowledge base approach to artificial intelligence. None of these projects has led to a major success. One of the most famous such projects is Cyc (Lenat and Guha, 1989). Cyc is an inference engine and a database of statements in a language called CycL. These statements are entered by a staff of human supervisors. It is an unwieldy process. People struggle to devise formal rules with enough complexity to accurately describe the world. For example, Cyc failed to understand a story about a person named Fred shaving in the morning (Linde, 1992). Its inference engine detected an inconsistency in the story: it knew that people do not have electrical parts, but because Fred was holding an electric razor, it believed the entity “FredWhileShaving” contained electrical parts. It therefore asked whether Fred was still a person while he was shaving.
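To make the knowledge base approach concrete, here is a toy forward-chaining sketch in Python. It assumes a hand-written store of (subject, predicate, object) triples and two hand-coded rules; every name is hypothetical, and this is not CycL or Cyc's inference engine. An over-general rule ("whatever holds an electric device has electrical parts") is enough to reproduce the kind of inconsistency described above.

```python
# Hypothetical triples (subject, predicate, object); this is not CycL syntax.
FACTS = {
    ("FredWhileShaving", "is_a", "Person"),
    ("FredWhileShaving", "holds", "ElectricRazor"),
}

def people_lack_electrical_parts(facts):
    # Hand-coded rule: anything that is a person has no electrical parts.
    return {(s, "has_electrical_parts", False)
            for (s, p, o) in facts if p == "is_a" and o == "Person"}

def holders_of_devices_have_electrical_parts(facts):
    # Over-general hand-coded rule: whatever holds an electric razor
    # is treated as containing electrical parts.
    return {(s, "has_electrical_parts", True)
            for (s, p, o) in facts if p == "holds" and o == "ElectricRazor"}

RULES = [people_lack_electrical_parts, holders_of_devices_have_electrical_parts]

def forward_chain(facts, rules):
    """Apply every rule repeatedly until no new facts appear (a fixed point)."""
    known = set(facts)
    while True:
        new = set().union(*(rule(known) for rule in rules)) - known
        if not new:
            return known
        known |= new

def contradictions(facts):
    """Subjects for which both a statement and its negation were derived."""
    return {s for (s, p, o) in facts
            if p == "has_electrical_parts" and (s, p, not o) in facts}

derived = forward_chain(FACTS, RULES)
print(contradictions(derived))  # {'FredWhileShaving'}
```

The inference itself is mechanical and correct; the failure comes from the hand-written rules, which is the brittleness the text attributes to hard-coded knowledge.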

The difficulties faced by systems relying on hard-coded knowledge suggest that AI systems need the ability to acquire their own knowledge, by extracting patterns from raw data. This capability is known as machine learning. The introduction of machine learning allowed computers to tackle problems involving knowledge of the real world and make decisions that appear subjective. A simple machine learning algorithm called logistic regression can determine whether to recommend cesarean delivery (Mor-Yosef et al., 1990). A simple machine learning algorithm called naive Bayes can separate legitimate e-mail from spam e-mail.
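A minimal sketch of a naive Bayes spam filter, assuming the multinomial variant with add-one (Laplace) smoothing over whitespace-separated words; this is one common form of the algorithm, not necessarily the one any cited system used. The corpus and the function names are hypothetical and far too small for real use.

```python
import math
from collections import Counter

def train_naive_bayes(docs, labels, alpha=1.0):
    """Fit a multinomial naive Bayes classifier with add-alpha smoothing."""
    vocab = {w for d in docs for w in d.split()}
    model = {"vocab": vocab, "alpha": alpha, "classes": {}}
    for c in set(labels):
        class_docs = [d for d, lab in zip(docs, labels) if lab == c]
        counts = Counter(w for d in class_docs for w in d.split())
        model["classes"][c] = {
            "log_prior": math.log(len(class_docs) / len(docs)),
            "counts": counts,
            "total": sum(counts.values()),
        }
    return model

def predict(model, doc):
    """Return the class with the highest log posterior (up to a constant)."""
    vocab, alpha = model["vocab"], model["alpha"]

    def score(c):
        stats = model["classes"][c]
        s = stats["log_prior"]
        # "Naive" assumption: words are independent given the class,
        # so their log-probabilities simply add up.
        for w in doc.split():
            if w in vocab:  # words never seen in training are skipped
                s += math.log((stats["counts"][w] + alpha)
                              / (stats["total"] + alpha * len(vocab)))
        return s

    return max(model["classes"], key=score)

# Hypothetical toy corpus.
docs = ["win money now", "cheap money offer", "meeting schedule today", "project meeting notes"]
labels = ["spam", "spam", "ham", "ham"]
model = train_naive_bayes(docs, labels)
print(predict(model, "cheap money"), predict(model, "meeting today"))  # spam ham
```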

The performance of these simple machine learning algorithms depends heavily on the representation of the data they are given. For example, when logistic regression is used to recommend cesarean delivery, the AI system does not examine the patient directly. Instead, the doctor tells the system several pieces of relevant information, such as the presence or absence of a uterine scar. Each piece of information included in the representation of the patient is known as a feature. Logistic regression learns how each of these features of the patient correlates with various outcomes. However, it cannot influence the way that the features are defined in any way. If logistic regression were given an MRI scan of the patient, rather than the doctor’s formalized report, it would not be able to make useful predictions. Individual pixels in an MRI scan have negligible correlation with any complications that might occur during delivery.
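The division of labor described above, where a person supplies the features and logistic regression only learns how much each one matters, can be sketched in plain Python. This is a minimal illustration assuming batch gradient descent on the mean cross-entropy loss; the data are hypothetical binary features, not clinical records, constructed so that feature 0 triples the odds of a positive outcome and feature 1 divides them by three.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic_regression(X, y, lr=1.0, epochs=3000):
    """Fit weights w and bias b by batch gradient descent on the mean cross-entropy."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # The derivative of the cross-entropy w.r.t. the logit is (prediction - label).
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j] / n
            grad_b += err / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

# Hypothetical pre-extracted binary features for 12 records.
X = [[0, 0]] * 2 + [[1, 0]] * 4 + [[0, 1]] * 4 + [[1, 1]] * 2
y = [1, 0] + [1, 1, 1, 0] + [1, 0, 0, 0] + [1, 0]
w, b = train_logistic_regression(X, y)
print([round(wj, 3) for wj in w], round(b, 3))  # weights near ln 3 and -ln 3, bias near 0
```

The learned weights are the log-odds contributions of the given features; nothing in the procedure can change what `X` contains, which is why raw pixels in place of these features would leave the model with little to work with.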

This dependence on representations is a general phenomenon that appears throughout computer science and even daily life. In computer science, operations such as searching a collection of data can proceed exponentially faster if the collection is structured and indexed intelligently. People can easily perform arithmetic on Arabic numerals, but find arithmetic on Roman numerals much more time-consuming. It is not surprising that the choice of representation has an enormous effect on the performance of machine learning algorithms. For a simple visual example, see Fig. 1.1.
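The claim about searching can be checked with a short Python sketch; the function names are hypothetical. The same million integers are searched once by scanning them as an unstructured list and once by binary search, where sorted order plays the role of the "index". A linear scan may inspect all n items, while binary search needs at most about log2 n probes (20 for n = 1,000,000), which is the exponential gap the text refers to.

```python
def linear_search(items, target):
    """Inspect items one by one; return (index or -1, number of probes)."""
    for i, item in enumerate(items):
        if item == target:
            return i, i + 1
    return -1, len(items)

def binary_search(sorted_items, target):
    """Halve the candidate range each step; return (index or -1, number of probes)."""
    lo, hi, probes = 0, len(sorted_items), 0
    while lo < hi:
        probes += 1
        mid = (lo + hi) // 2
        if sorted_items[mid] < target:
            lo = mid + 1
        else:
            hi = mid
    found = lo < len(sorted_items) and sorted_items[lo] == target
    return (lo if found else -1), probes

data = list(range(1_000_000))        # the same data, kept in sorted order
print(linear_search(data, 999_999))  # (999999, 1000000)
print(binary_search(data, 999_999))  # index 999999, at most 20 probes
```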

Many artificial intelligence tasks can be solved by designing the right set of features to extract for that task, then providing these features to a simple machine learning algorithm. For example, a useful feature for speaker identification from sound is an estimate of the size of the speaker’s vocal tract, which gives a strong clue as to whether the speaker is a man, woman, or child.

However, for many tasks, it is difficult to know what features should be extracted. For example, suppose that we would like to write a program to detect cars in photographs. We know that cars have wheels, so we might like to use the presence of a wheel as a feature. Unfortunately, it is difficult to describe exactly what a wheel looks like in terms of pixel values. A wheel has a simple geometric shape, but its image may be complicated by shadows falling on the wheel, the sun glaring off the metal parts of the wheel, the fender of the car or an object in the foreground obscuring part of the wheel, and so on.


Cartesiancoordinates

Polarcoordinates

Figure 1.1: Example of different representations: suppose we want to separate two categories of data by drawing a line between them in a scatterplot. In the plot on the left, we represent some data using Cartesian coordinates, and the task is impossible. In the plot on the right, we represent the data with polar coordinates and the task becomes simple to solve with a vertical line. (Figure produced in collaboration with David Warde-Farley)
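The effect shown in Fig. 1.1 can be reproduced numerically. The figure's actual data are not given in the text, so this minimal NumPy sketch assumes two concentric rings of points as the two categories. In Cartesian coordinates one ring encloses the other, so no straight line separates them; after converting to polar coordinates (r, φ), the vertical line r = 2 does.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200  # points per category

# Hypothetical data: category 0 near a circle of radius 1, category 1 near radius 3.
theta = rng.uniform(0.0, 2.0 * np.pi, size=2 * n)
radius = np.concatenate([np.full(n, 1.0), np.full(n, 3.0)]) + 0.1 * rng.normal(size=2 * n)
labels = np.concatenate([np.zeros(n, dtype=int), np.ones(n, dtype=int)])

# Cartesian representation (x, y): one ring surrounds the other.
cartesian = np.column_stack([radius * np.cos(theta), radius * np.sin(theta)])

# Polar representation (r, phi), computed from the Cartesian coordinates.
r = np.hypot(cartesian[:, 0], cartesian[:, 1])
phi = np.arctan2(cartesian[:, 1], cartesian[:, 0])
polar = np.column_stack([r, phi])

# In polar coordinates the vertical line r = 2 separates the categories.
predicted = (polar[:, 0] > 2.0).astype(int)
print("accuracy:", (predicted == labels).mean())
```

The classifier is the same trivial threshold in both cases; only the representation changed.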

One solution to this problem is to use machine learning to discover not only the mapping from representation to output but also the representation itself. This approach is known as representation learning. Learned representations often result in much better performance than can be obtained with hand-designed representations. They also allow AI systems to rapidly adapt to new tasks, with minimal human intervention. A representation learning algorithm can discover a good set of features for a simple task in minutes, or for a complex task in hours to months. Manually designing features for a complex task requires a great deal of human time and effort; it can take decades for an entire community of researchers.

The quintessential example of a representation learning algorithm is the autoencoder. An autoencoder is the combination of an encoder function that converts the input data into a different representation, and a decoder function that converts the new representation back into the original format. Autoencoders are trained to preserve as much information as possible when an input is run through the encoder and then the decoder, but are also trained to make the new representation have various nice properties. Different kinds of autoencoders aim to achieve different kinds of properties.
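A minimal sketch of the encoder/decoder idea, assuming NumPy and the simplest possible case: a linear encoder and a linear decoder without biases, trained by plain gradient descent on the squared reconstruction error. The only "nice property" imposed here is a bottleneck (k smaller than the input dimension); the properties that other kinds of autoencoders aim for, such as sparsity, are omitted. The function names and data are hypothetical. A linear autoencoder like this can at best recover the same subspace as PCA.

```python
import numpy as np

def train_linear_autoencoder(X, k, lr=0.01, steps=2000, seed=0):
    """Fit encoder W_enc (d -> k) and decoder W_dec (k -> d) by gradient descent
    on the mean squared reconstruction error (1/n) * sum_i ||x_i - x_hat_i||^2."""
    n, d = X.shape
    rng = np.random.default_rng(seed)
    W_enc = 0.1 * rng.normal(size=(d, k))
    W_dec = 0.1 * rng.normal(size=(k, d))
    for _ in range(steps):
        H = X @ W_enc                   # encoder: new k-dimensional representation
        X_hat = H @ W_dec               # decoder: back to the original d-dimensional format
        G = 2.0 * (X_hat - X) / n       # gradient of the loss with respect to X_hat
        grad_dec = H.T @ G              # chain rule through X_hat = H @ W_dec
        grad_enc = X.T @ (G @ W_dec.T)  # chain rule through H = X @ W_enc
        W_dec -= lr * grad_dec
        W_enc -= lr * grad_enc
    return W_enc, W_dec

def reconstruction_mse(X, W_enc, W_dec):
    return np.mean(np.sum((X @ W_enc @ W_dec - X) ** 2, axis=1))

# Hypothetical data: 3-D points lying close to a 2-D plane.
rng = np.random.default_rng(1)
latent = rng.normal(size=(500, 2))
X = latent @ np.array([[1.0, 0.5, 0.0], [0.0, 1.0, 2.0]]) + 0.01 * rng.normal(size=(500, 3))

W_enc, W_dec = train_linear_autoencoder(X, k=2)
codes = X @ W_enc  # the learned 2-D representation of each input
print(codes.shape, reconstruction_mse(X, W_enc, W_dec))
```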

When designing features or algorithms for learning features, our goal is usually to separate the factors of variation that explain the observed data. In this context, we use the word “factors” simply to refer to separate sources of influence; the factors are usually not combined by multiplication. Such factors are often not quantities that are directly observed. Instead, they may exist either as unobserved objects or unobserved forces in the physical world that affect observable quantities. They may also exist as constructs in the human mind that provide useful simplifying explanations or inferred causes of the observed data. They can be thought of as concepts or abstractions that help us make sense of the rich variability in the data.

When analyzing a speech recording, the factors of variation include the speaker’s age, their sex, their accent and the words that they are speaking. When analyzing an image of a car, the factors of variation include the position of the car, its color, and the angle and brightness of the sun.

A major source of difficulty in many real-world artificial intelligence applications is that many of the factors of variation influence every single piece of data we are able to observe. The individual pixels in an image of a red car might be very close to black at night. The shape of the car’s silhouette depends on the viewing angle. Most applications require us to disentangle the factors of variation and discard the ones that we do not care about.

Of course, it can be very difficult to extract such high-level, abstract features from raw data. Many of these factors of variation, such as a speaker’s accent, can be identified only using sophisticated, nearly human-level understanding of the data. When it is nearly as difficult to obtain a representation as to solve the original problem, representation learning does not, at first glance, seem to help us.

Deep learning solves this central problem in representation learning by introducing representations that are expressed in terms of other, simpler representations. Deep learning allows the computer to build complex concepts out of simpler concepts. Fig. 1.2 shows how a deep learning system can represent the concept of an image of a person by combining simpler concepts, such as corners and contours, which are in turn defined in terms of edges.

The quintessential example of a deep learning model is the feedforward deep network or multilayer perceptron (MLP). A multilayer perceptron is just a mathematical function mapping some set of input values to output values. The function is formed by composing many simpler functions. We can think of each application of a different mathematical function as providing a new representation of the input.
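A minimal NumPy sketch of this view of an MLP, with hypothetical layer sizes and random, untrained weights. It assumes ReLU activations for the hidden layers and a linear output layer, which is one common choice rather than the only one. Each layer is a simple function x ↦ relu(xW + b), the network is their composition, and every intermediate output is a new representation of the input. Training is omitted.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def identity(z):
    return z

def make_layer(W, b, activation=relu):
    """One simple function: x -> activation(x @ W + b)."""
    return lambda x: activation(x @ W + b)

def compose(*functions):
    """compose(f1, f2, f3)(x) computes f3(f2(f1(x)))."""
    def composed(x):
        for f in functions:
            x = f(x)
        return x
    return composed

rng = np.random.default_rng(0)
sizes = [4, 8, 8, 3]  # hypothetical widths: input, hidden 1, hidden 2, output
params = [(rng.normal(size=(m, n)), np.zeros(n)) for m, n in zip(sizes[:-1], sizes[1:])]
layer1 = make_layer(*params[0])
layer2 = make_layer(*params[1])
output_layer = make_layer(*params[2], activation=identity)  # linear output layer
mlp = compose(layer1, layer2, output_layer)

x = rng.normal(size=(5, 4))  # a batch of 5 input vectors
h1 = layer1(x)               # first representation of the input
h2 = layer2(h1)              # second representation, built from the first
out = output_layer(h2)
print(h1.shape, h2.shape, out.shape)  # (5, 8) (5, 8) (5, 3)
```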

The idea of learning the right representation for the data provides one perspective on deep learning. Another perspective on deep learning is that depth allows the computer to learn a multi-step computer program. Each layer of the representation can be thought of as the state of the computer’s memory after executing another set of instructions in parallel. Networks with greater depth can execute more instructions in sequence. Sequential instructions offer great power because later instructions can refer back to the results of earlier instructions. According to this
