Deep Learning: Foundations and Applications

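"Deep Learning, co-authored by Ian Goodfellow, Yoshua Bengio, and Aaron Courville, is a specialist text that explores the concepts and techniques of deep learning in depth. The book introduces the foundations and applications of deep learning and is suited to readers interested in machine learning, particularly researchers and practitioners who want a deeper understanding of its principles. It covers foundational material from several key areas, including linear algebra, probability and information theory, statistical inference, optimization, computer vision, and natural language processing."

Deep learning is a core branch of modern artificial intelligence. It grew out of research on artificial neural networks, in particular deep neural networks built on the multilayer perceptron. Its defining characteristic is a multi-layer structure whose layers progressively learn and build complex representations of the data, from simple features of the raw input up to high-level abstract concepts. This layered learning is what makes deep learning perform so well on tasks such as image recognition, speech recognition, and natural language processing.

In 2006, Hinton et al. proposed an unsupervised, layer-wise training method based on deep belief networks (DBNs), which opened a new path for training deep models. They subsequently introduced multilayer autoencoders, further advancing the field. Convolutional neural networks (CNNs), proposed by LeCun et al., achieved breakthroughs in image processing; their convolutional and pooling layers significantly reduce the number of parameters, improving training efficiency and generalization.

The core ideas of deep learning are feature learning and hierarchical representation. By automating feature extraction, deep learning models derive useful features directly from raw data, without careful manual design. This lets deep learning outperform traditional machine learning methods on many tasks, especially on high-dimensional and complex datasets.

The book first introduces the basics of linear algebra: scalars, vectors, matrices, and tensors, and the operations on them. It then covers probability and information theory, explaining random variables, probability distributions, expectation, variance, and covariance. These mathematical tools are the foundation for understanding and building deep learning models. In the probability part, the authors discuss why probability is needed, along with random variables, probability distributions, conditional probability, and independence. Information theory concerns how to quantify information and uncertainty, which is essential for understanding and optimizing a model's representations and learning process.

Beyond theory, the book also covers practical matters such as statistical inference, the choice and implementation of optimization algorithms, and the application of these ideas to fields such as computer vision and natural language processing. By working through this material, readers can both master the principles of deep learning and learn how to apply it in real projects.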

Chapter 1

Introduction

Inventors have long dreamed of creating machines that think. This desire dates back to at least the time of ancient Greece. The mythical figures Pygmalion, Daedalus, and Hephaestus may all be interpreted as legendary inventors, and Galatea, Talos, and Pandora may all be regarded as artificial life (Ovid and Martin, 2004; Sparkes, 1996; Tandy, 1997).

When programmable computers were first conceived, people wondered whether they might become intelligent, over a hundred years before one was built (Lovelace, 1842). Today, artificial intelligence (AI) is a thriving field with many practical applications and active research topics. We look to intelligent software to automate routine labor, understand speech or images, make diagnoses in medicine and support basic scientific research.

In the early days of artificial intelligence, the field rapidly tackled and solved problems that are intellectually difficult for human beings but relatively straightforward for computers: problems that can be described by a list of formal, mathematical rules. The true challenge to artificial intelligence proved to be solving the tasks that are easy for people to perform but hard for people to describe formally: problems that we solve intuitively, that feel automatic, like recognizing spoken words or faces in images.

This book is about a solution to these more intuitive problems. This solution is to allow computers to learn from experience and understand the world in terms of a hierarchy of concepts, with each concept defined in terms of its relation to simpler concepts. By gathering knowledge from experience, this approach avoids the need for human operators to formally specify all of the knowledge that the computer needs. The hierarchy of concepts allows the computer to learn complicated concepts by building them out of simpler ones. If we draw a graph showing how these concepts are built on top of each other, the graph is deep, with many layers. For this reason, we call this approach to AI deep learning.

Many of the early successes of AI took place in relatively sterile and formal environments and did not require computers to have much knowledge about the world. For example, IBM’s Deep Blue chess-playing system defeated world champion Garry Kasparov in 1997 (Hsu, 2002). Chess is of course a very simple world, containing only sixty-four locations and thirty-two pieces that can move in only rigidly circumscribed ways. Devising a successful chess strategy is a tremendous accomplishment, but the challenge is not due to the difficulty of describing the set of chess pieces and allowable moves to the computer. Chess can be completely described by a very brief list of completely formal rules, easily provided ahead of time by the programmer.

Ironically, abstract and formal tasks that are among the most difficult mental undertakings for a human being are among the easiest for a computer. Computers have long been able to defeat even the best human chess player, but are only recently matching some of the abilities of average human beings to recognize objects or speech. A person’s everyday life requires an immense amount of knowledge about the world. Much of this knowledge is subjective and intuitive, and therefore difficult to articulate in a formal way. Computers need to capture this same knowledge in order to behave in an intelligent way. One of the key challenges in artificial intelligence is how to get this informal knowledge into a computer.

Several artificial intelligence projects have sought to hard-code knowledge about the world in formal languages. A computer can reason about statements in these formal languages automatically using logical inference rules. This is known as the knowledge base approach to artificial intelligence. None of these projects has led to a major success. One of the most famous such projects is Cyc (Lenat and Guha, 1989). Cyc is an inference engine and a database of statements in a language called CycL. These statements are entered by a staff of human supervisors. It is an unwieldy process. People struggle to devise formal rules with enough complexity to accurately describe the world. For example, Cyc failed to understand a story about a person named Fred shaving in the morning (Linde, 1992). Its inference engine detected an inconsistency in the story: it knew that people do not have electrical parts, but because Fred was holding an electric razor, it believed the entity “FredWhileShaving” contained electrical parts. It therefore asked whether Fred was still a person while he was shaving.

The difficulties faced by systems relying on hard-coded knowledge suggest that AI systems need the ability to acquire their own knowledge, by extracting patterns from raw data. This capability is known as machine learning. The introduction of machine learning allowed computers to tackle problems involving knowledge of the real world and make decisions that appear subjective. A simple machine learning algorithm called logistic regression can determine whether to recommend cesarean delivery (Mor-Yosef et al., 1990). A simple machine learning algorithm called naive Bayes can separate legitimate e-mail from spam e-mail.

The performance of these simple machine learning algorithms depends heavily on the representation of the data they are given. For example, when logistic regression is used to recommend cesarean delivery, the AI system does not examine the patient directly. Instead, the doctor tells the system several pieces of relevant information, such as the presence or absence of a uterine scar. Each piece of information included in the representation of the patient is known as a feature. Logistic regression learns how each of these features of the patient correlates with various outcomes. However, it cannot influence the way that the features are defined in any way. If logistic regression were given an MRI scan of the patient, rather than the doctor’s formalized report, it would not be able to make useful predictions. Individual pixels in an MRI scan have negligible correlation with any complications that might occur during delivery.
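
To make the role of features concrete, here is a minimal sketch of logistic regression trained by gradient descent, using only the Python standard library. The two binary features and the labels are invented toy values (they are not medical data and do not come from the cited study), and the function names, learning rate, and epoch count are assumptions chosen for illustration. The point is only that the model learns one weight per supplied feature and has no way to redefine the features themselves.

```python
import math

def sigmoid(z):
    """Logistic function mapping a real score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(x, w, b):
    """Probability of the positive outcome for one feature vector x."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

def train_logistic_regression(X, y, lr=0.5, epochs=2000):
    """Fit weights w and bias b by full-batch gradient descent on the log loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(xi, w, b) - yi   # derivative of log loss w.r.t. the score
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

# Hypothetical toy data: feature 0 determines the label, feature 1 is noise.
X = [[1, 0], [1, 1], [0, 0], [0, 1], [1, 0], [0, 1]]
y = [1, 1, 0, 0, 1, 0]

w, b = train_logistic_regression(X, y)
predictions = [1 if predict_proba(xi, w, b) > 0.5 else 0 for xi in X]
print("weights:", w, "bias:", b)
print("predictions:", predictions)
```

The learned weight on feature 0 ends up positive because that feature tracks the label in this toy set. Had the inputs been raw pixel values instead of hand-chosen features, the same procedure would still fit one weight per pixel, which is the limitation the paragraph above describes.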

This dependence on representations is a general phenomenon that appears throughout computer science and even daily life. In computer science, operations such as searching a collection of data can proceed exponentially faster if the collection is structured and indexed intelligently. People can easily perform arithmetic on Arabic numerals, but find arithmetic on Roman numerals much more time-consuming. It is not surprising that the choice of representation has an enormous effect on the performance of machine learning algorithms. For a simple visual example, see Fig. 1.1.
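
As a small illustration of the remark about searching, the sketch below counts comparisons for a linear scan versus a binary search over the same values. Keeping the list sorted is the assumed "structure" here; the function names and the list size are hypothetical choices for this example, not something specified in the text.

```python
def linear_search_steps(items, target):
    """Count comparisons made by scanning an unstructured list front to back."""
    steps = 0
    for value in items:
        steps += 1
        if value == target:
            break
    return steps

def binary_search_steps(sorted_items, target):
    """Count loop iterations of a binary search over a sorted list."""
    lo, hi, steps = 0, len(sorted_items), 0
    while lo < hi:
        steps += 1
        mid = (lo + hi) // 2
        if sorted_items[mid] < target:
            lo = mid + 1      # target can only be in the upper half
        else:
            hi = mid          # target is at mid or in the lower half
    return steps

items = list(range(1024))     # already sorted
target = items[-1]            # worst case for the linear scan

linear_steps = linear_search_steps(items, target)
binary_steps = binary_search_steps(items, target)
print(linear_steps, binary_steps)
```

The linear scan needs a number of steps proportional to the collection size, while the binary search needs a number proportional to its logarithm, which is the sense in which a good representation can make the same task far cheaper.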

Many artificial intelligence tasks can be solved by designing the right set of features to extract for that task, then providing these features to a simple machine learning algorithm. For example, a useful feature for speaker identification from sound is an estimate of the size of the speaker’s vocal tract. This feature gives a strong clue as to whether the speaker is a man, woman, or child.

However, for many tasks, it is difficult to know what features should be extracted. For example, suppose that we would like to write a program to detect cars in photographs. We know that cars have wheels, so we might like to use the presence of a wheel as a feature. Unfortunately, it is difficult to describe exactly what a wheel looks like in terms of pixel values. A wheel has a simple geometric shape but its image may be complicated by shadows falling on the wheel, the sun glaring off the metal parts of the wheel, the fender of the car or an object in the foreground obscuring part of the wheel, and so on.

[Figure 1.1: two scatterplot panels. Left: Cartesian coordinates. Right: Polar coordinates.]

Figure 1.1: Example of different representations: suppose we want to separate two categories of data by drawing a line between them in a scatterplot. In the plot on the left, we represent some data using Cartesian coordinates, and the task is impossible. In the plot on the right, we represent the data with polar coordinates and the task becomes simple to solve with a vertical line. (Figure produced in collaboration with David Warde-Farley)
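
The idea in Figure 1.1 can be sketched in a few lines of Python. This example assumes the two categories lie on concentric rings around the origin (an assumption about the data, chosen because it reproduces the situation in the caption); the ring radii, point counts, threshold, and helper names are all hypothetical. Because one ring surrounds the other, no straight line in the (x, y) plane separates them, but after converting to polar coordinates a single threshold on the radius does.

```python
import math

def to_polar(x, y):
    """Convert Cartesian (x, y) to polar (r, theta)."""
    return math.hypot(x, y), math.atan2(y, x)

def make_ring(radius, n):
    """Return n points evenly spaced on a circle of the given radius."""
    return [
        (radius * math.cos(2 * math.pi * k / n),
         radius * math.sin(2 * math.pi * k / n))
        for k in range(n)
    ]

inner = make_ring(1.0, 12)   # category A: small ring
outer = make_ring(3.0, 12)   # category B: large ring surrounding category A

# In an (r, theta) plot, the rule "r < threshold" is a vertical line.
threshold = 2.0
inner_separated = all(to_polar(x, y)[0] < threshold for x, y in inner)
outer_separated = all(to_polar(x, y)[0] > threshold for x, y in outer)
print(inner_separated, outer_separated)
```

Nothing about the data changed between the two views; only the representation did, and that alone turned an impossible linear separation into a trivial one.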

One solution to this problem is to use machine learning to discover not only the mapping from representation to output but also the representation itself. This approach is known as representation learning. Learned representations often result in much better performance than can be obtained with hand-designed representations. They also allow AI systems to rapidly adapt to new tasks, with minimal human intervention. A representation learning algorithm can discover a good set of features for a simple task in minutes, or a complex task in hours to months. Manually designing features for a complex task requires a great deal of human time and effort; it can take decades for an entire community of researchers.

The quintessential example of a representation learning algorithm is the autoencoder. An autoencoder is the combination of an encoder function that converts the input data into a different representation, and a decoder function that converts the new representation back into the original format. Autoencoders are trained to preserve as much information as possible when an input is run through the encoder and then the decoder, but are also trained to make the new representation have various nice properties. Different kinds of autoencoders aim to achieve different kinds of properties.
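
The encoder/decoder structure can be sketched with a deliberately tiny model. The example below assumes a linear autoencoder that compresses 2-D inputs into a 1-D code, trained by plain gradient descent on squared reconstruction error; the toy data (points on the line y = 2x), the initial weights, the learning rate, and all function names are illustrative assumptions. It shows only the "preserve information" objective and does not impose any of the additional properties on the code that specific autoencoder variants aim for.

```python
def encode(x, w_enc):
    """Encoder: map a 2-D input to a 1-D code."""
    return w_enc[0] * x[0] + w_enc[1] * x[1]

def decode(h, w_dec):
    """Decoder: map the 1-D code back to a 2-D reconstruction."""
    return [w_dec[0] * h, w_dec[1] * h]

def reconstruction_loss(data, w_enc, w_dec):
    """Mean squared reconstruction error over the data set."""
    total = 0.0
    for x in data:
        x_hat = decode(encode(x, w_enc), w_dec)
        total += (x_hat[0] - x[0]) ** 2 + (x_hat[1] - x[1]) ** 2
    return total / len(data)

def train(data, lr=0.05, steps=2000):
    """Full-batch gradient descent on the reconstruction loss."""
    w_enc = [0.1, 0.1]
    w_dec = [0.1, 0.1]
    n = len(data)
    for _ in range(steps):
        g_enc = [0.0, 0.0]
        g_dec = [0.0, 0.0]
        for x in data:
            h = encode(x, w_enc)
            x_hat = decode(h, w_dec)
            err = [x_hat[0] - x[0], x_hat[1] - x[1]]
            # Gradient of the squared error w.r.t. each decoder weight.
            g_dec[0] += 2 * err[0] * h
            g_dec[1] += 2 * err[1] * h
            # Gradient w.r.t. the code, then chained back to the encoder weights.
            dh = 2 * (err[0] * w_dec[0] + err[1] * w_dec[1])
            g_enc[0] += dh * x[0]
            g_enc[1] += dh * x[1]
        w_enc = [w - lr * g / n for w, g in zip(w_enc, g_enc)]
        w_dec = [w - lr * g / n for w, g in zip(w_dec, g_dec)]
    return w_enc, w_dec

# Toy data with one underlying degree of freedom: points on the line y = 2x.
data = [[t, 2.0 * t] for t in (-1.0, -0.5, 0.0, 0.5, 1.0)]

initial_loss = reconstruction_loss(data, [0.1, 0.1], [0.1, 0.1])
w_enc, w_dec = train(data)
final_loss = reconstruction_loss(data, w_enc, w_dec)
print(initial_loss, final_loss)
```

Because the toy data really has only one degree of freedom, a single number per point is enough to reconstruct it, so the 1-D code loses essentially nothing. On data with more structure, the bottleneck would force the model to choose what to keep.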

When designing features or algorithms for learning features, our goal is usually to separate the factors of variation that explain the observed data. In this context, we use the word “factors” simply to refer to separate sources of influence; the factors are usually not combined by multiplication. Such factors are often not quantities that are directly observed. Instead, they may exist either as unobserved objects or unobserved forces in the physical world that affect observable quantities. They may also exist as constructs in the human mind that provide useful simplifying explanations or inferred causes of the observed data. They can be thought of as concepts or abstractions that help us make sense of the rich variability in the data. When analyzing a speech recording, the factors of variation include the speaker’s age, their sex, their accent and the words that they are speaking. When analyzing an image of a car, the factors of variation include the position of the car, its color, and the angle and brightness of the sun.

A major source of difficulty in many real-world artificial intelligence applications is that many of the factors of variation influence every single piece of data we are able to observe. The individual pixels in an image of a red car might be very close to black at night. The shape of the car’s silhouette depends on the viewing angle. Most applications require us to disentangle the factors of variation and discard the ones that we do not care about.

Of course, it can be very difficult to extract such high-level, abstract features from raw data. Many of these factors of variation, such as a speaker’s accent, can be identified only using sophisticated, nearly human-level understanding of the data. When it is nearly as difficult to obtain a representation as to solve the original problem, representation learning does not, at first glance, seem to help us.

Deep learning solves this central problem in representation learning by introducing representations that are expressed in terms of other, simpler representations. Deep learning allows the computer to build complex concepts out of simpler concepts. Fig. 1.2 shows how a deep learning system can represent the concept of an image of a person by combining simpler concepts, such as corners and contours, which are in turn defined in terms of edges.

The quintessential example of a deep learning model is the feedforward deep network or multilayer perceptron (MLP). A multilayer perceptron is just a mathematical function mapping some set of input values to output values. The function is formed by composing many simpler functions. We can think of each application of a different mathematical function as providing a new representation of the input.
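
The "composition of simpler functions" view can be written down directly. The sketch below builds a two-layer perceptron out of three small functions and records the representation produced after each one. The weights are hand-picked rather than learned, the rectified linear activation is an assumption not stated in this passage, and the helper names are hypothetical. With these particular weights the composed function computes XOR of two binary inputs.

```python
def affine(weights, biases):
    """Return a layer computing W x + b for the given weight rows and biases."""
    def layer(x):
        return [sum(w * xi for w, xi in zip(row, x)) + b
                for row, b in zip(weights, biases)]
    return layer

def relu(x):
    """Elementwise rectified linear activation (assumed choice of nonlinearity)."""
    return [max(0.0, v) for v in x]

def compose(*functions):
    """Chain functions left to right, keeping every intermediate representation."""
    def composed(x):
        representations = [x]
        for f in functions:
            x = f(x)
            representations.append(x)
        return x, representations
    return composed

# Hand-picked (not learned) weights for a 2-2-1 network.
mlp = compose(
    affine([[1, 1], [1, 1]], [0, -1]),   # hidden pre-activations
    relu,                                # hidden representation
    affine([[1, -2]], [0]),              # output
)

for x in ([0, 0], [0, 1], [1, 0], [1, 1]):
    output, representations = mlp(x)
    print(x, "->", output, "via", representations)
```

Each entry in `representations` is the input as seen after one more function has been applied, which is the sense in which every layer provides a new representation of the same input.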

The idea of learning the right representation for the data provides one perspective on deep learning. Another perspective on deep learning is that depth allows the computer to learn a multi-step computer program. Each layer of the representation can be thought of as the state of the computer’s memory after executing another set of instructions in parallel. Networks with greater depth can execute more instructions in sequence. Sequential instructions offer great power because later instructions can refer back to the results of earlier instructions. According to this
