optimization problem in such a way that a tradeoff is created between two amounts of information: the information that the bottleneck vector contains about the input, and the information that it contains about the output. The chapter then goes on to find an optimal manifold for data representation, using the information bottleneck method.
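In symbols (notation supplied here for illustration: input X, output Y, bottleneck vector T, mutual information I(·;·), and Lagrange multiplier β ≥ 0), this tradeoff is commonly written as the minimization

\[
\min_{p(t \mid x)} \; I(X;T) \;-\; \beta\, I(T;Y),
\]

where a small β favors compressing the input and a large β favors retaining information about the output.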
The final approach to unsupervised learning is described in Chapter 11, using
stochastic methods that are rooted in statistical mechanics; the study of statistical
mechanics is closely related to information theory. The chapter begins by reviewing the fundamental concepts of Helmholtz free energy and entropy (in a statistical-mechanics sense), followed by a description of Markov chains. The stage is then set for describing the Metropolis algorithm for generating a Markov chain whose state distribution converges to a unique and stable (stationary) distribution.
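As a concrete illustration, a minimal sketch of the Metropolis rule is given below (Python, written for this preface; the energy function, proposal width, and names are illustrative assumptions, not the book's code). A candidate move that lowers the energy is always accepted; one that raises it is accepted with probability exp(-ΔE/T), which is what drives the chain toward the stationary distribution.

import math, random

def metropolis(energy, x0, temperature=1.0, steps=10000, step_size=0.5):
    """Generate samples whose stationary distribution is proportional to exp(-E(x)/T)."""
    x = x0
    samples = []
    for _ in range(steps):
        candidate = x + random.uniform(-step_size, step_size)  # symmetric proposal
        delta_e = energy(candidate) - energy(x)
        # Accept downhill moves always; uphill moves with probability exp(-dE/T)
        if delta_e <= 0 or random.random() < math.exp(-delta_e / temperature):
            x = candidate
        samples.append(x)
    return samples

# Example: sample from a standard Gaussian, whose "energy" is x**2 / 2
draws = metropolis(lambda x: 0.5 * x * x, x0=0.0)

Simulated annealing, described next, reuses this same loop while gradually lowering the temperature T.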
The discussion of stochastic methods is completed by describing simulated annealing for global optimization, followed by Gibbs sampling, which can be viewed as a special case of the Metropolis algorithm. With all this background on statistical mechanics at hand, the stage is set for describing the Boltzmann machine, which, in a historical context, was the first multilayer learning machine discussed in the literature; its stochastic updates are Gibbs-sampling steps, as sketched after the list below. Unfortunately, the learning process in the Boltzmann machine is very slow, particularly when the number of hidden neurons is large; hence the lack of interest in its practical use. Various methods have been proposed in the literature to overcome the limitations of the Boltzmann machine. The most successful innovation to date is the deep belief net, which distinguishes itself by the clever way in which the following two functions are combined into a powerful machine:
• generative modeling, resulting from bottom-up learning on a layer-by-layer
basis and without supervision;
• inference, resulting from top-down learning.
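A minimal sketch of the Gibbs-sampling step mentioned above is given below (Python, written for this preface; the symmetric weight matrix, temperature, and function names are illustrative assumptions, not the book's code). Each unit is visited in turn and switched on with a probability given by the logistic function of its net input, which is a single Gibbs-sampling step under the Boltzmann machine's energy function.

import math, random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def gibbs_sweep(states, weights, temperature=1.0):
    """One Gibbs-sampling sweep over the units of a Boltzmann machine.

    states  : list of 0/1 unit states
    weights : symmetric matrix with zero diagonal (weights[i][j] == weights[j][i])
    """
    n = len(states)
    for i in range(n):
        net_input = sum(weights[i][j] * states[j] for j in range(n) if j != i)
        # Unit i fires with probability sigmoid(net / T): a Gibbs update
        states[i] = 1 if random.random() < sigmoid(net_input / temperature) else 0
    return states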
Finally, Chapter 11 describes deterministic annealing to overcome the excessive
computational requirements of simulated annealing; the only problem with
deterministic annealing is that it could get trapped in a local minimum.
5. Up to this point, the focus of attention in the book has been the formulation of al-
gorithms for supervised learning, semisupervised learning, and unsupervised learn-
ing. Chapter 12, constituting the next part of the book all by itself, addresses
reinforcement learning, in which learning takes place in an on-line manner as the
result of an agent (e.g., a robot) interacting with its surrounding environment. In re-
ality, however, dynamic programming lies at the core of reinforcement learning.
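For reference, Bellman's optimality equation, which dynamic programming solves, can be written as follows (standard notation supplied here for illustration: state s, action a, transition probability p(s'|s,a), one-step reward r, discount factor γ, and optimal cost-to-go J*):

\[
J^{*}(s) \;=\; \max_{a} \sum_{s'} p(s' \mid s, a)\,\bigl[\, r(s,a,s') + \gamma\, J^{*}(s') \,\bigr].
\]

Dynamic programming solves this equation when the transition probabilities are known; the reinforcement-learning methods discussed next approximate its solution from sampled experience.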
Accordingly, the early part of Chapter 12 is devoted to an introductory treatment
of Bellman's dynamic programming, which is then followed by showing that the two widely used methods of reinforcement learning, temporal-difference (TD) learning and Q-learning, can be derived as special cases of dynamic programming. Both TD-learning and Q-learning are relatively simple, on-line reinforcement-learning algorithms that do not require knowledge of the transition probabilities.
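As an illustration of how simple the Q-learning update is, a minimal tabular sketch follows (Python; the environment interface, learning rate alpha, and discount factor gamma are illustrative assumptions, not the book's code). The update nudges Q(s, a) toward the sampled Bellman target r + γ max_a' Q(s', a'), using observed transitions rather than known transition probabilities.

from collections import defaultdict
import random

def q_learning(env, episodes=500, alpha=0.1, gamma=0.99, epsilon=0.1):
    """Tabular Q-learning; env is assumed to expose reset(), step(a), and actions."""
    q = defaultdict(float)  # maps (state, action) to an estimated return
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            # Epsilon-greedy choice over the available actions
            if random.random() < epsilon:
                action = random.choice(env.actions)
            else:
                action = max(env.actions, key=lambda a: q[(state, a)])
            next_state, reward, done = env.step(action)
            # Sampled Bellman target: no transition probabilities needed
            best_next = max(q[(next_state, a)] for a in env.actions)
            target = reward if done else reward + gamma * best_next
            q[(state, action)] += alpha * (target - q[(state, action)])
            state = next_state
    return q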
However, their practical applications are limited to situations in which the dimensionality of the state space is of moderate size. In large-scale dynamic systems, the curse of dimensionality becomes a serious issue, making not only dynamic programming,