Deep Learning: Principles and Applications

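"Deep learning is a form of machine learning that enables computers to learn from experience and to understand the world in terms of a hierarchy of concepts. Written by three experts in the field, this book is a comprehensive guide to deep learning that covers the relevant background in linear algebra, probability theory, information theory, numerical computation, and machine learning. It describes deep learning techniques in detail, including deep feedforward networks, regularization, optimization algorithms, convolutional networks, sequence modeling, and practical methodology, and it discusses applications such as natural language processing, speech recognition, computer vision, online recommendation systems, bioinformatics, and video games. It also explores theoretical topics such as linear factor models, autoencoders, representation learning, structured probabilistic models, Monte Carlo methods, the partition function, approximate inference, and deep generative models. The book is suitable for undergraduate or graduate students who plan a career in deep learning in industry or research, and for software engineers who want to apply deep learning in their products or platforms."

Deep learning is an important branch of current artificial intelligence that builds multi-layer neural networks to mimic the learning process of the human brain. The book first introduces the basic definition and background of deep learning, then provides the relevant mathematical and conceptual foundations, such as matrix operations in linear algebra, probability distributions in probability theory, entropy and mutual information in information theory, and optimization algorithms such as gradient descent in numerical computation. This background is essential for understanding and implementing deep learning models. Next, the book describes three classes of deep networks in detail: deep networks for unsupervised or generative learning, deep networks for supervised learning, and hybrid networks. Unsupervised learning typically involves deep autoencoders, such as deep autoencoders for speech feature extraction and denoising autoencoders, as well as transforming autoencoders, which can be used for dimensionality reduction and feature learning. Supervised learning covers deep feedforward networks and convolutional networks, which are commonly applied to image classification and recognition tasks. Hybrid networks combine supervised and unsupervised learning and have broader applicability. Pre-training deep neural networks is an important strategy in deep learning: large amounts of unlabeled data are used to initialize the network weights, which are then fine-tuned with labeled data to improve model performance. The book also discusses best practices for deep learning in real applications, such as language models in natural language processing, sequence modeling in speech recognition, and collaborative filtering in recommendation systems. The book offers practical experience with deep learning and also goes into the theory, helping readers understand the principles and mechanisms behind it. Both beginners interested in deep learning and professionals already working in the industry can benefit from it.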

Some Historical Context of Deep Learning

The optimization difficulty associated with deep models was empirically alleviated when a reasonably efficient unsupervised learning algorithm was introduced in two seminal papers [163, 164]. These papers introduced a class of deep generative models called the deep belief network (DBN). A DBN is composed of a stack of restricted Boltzmann machines (RBMs). A core component of the DBN is a greedy, layer-by-layer learning algorithm that optimizes the DBN weights with time complexity linear in the size and depth of the network. Separately, and with some surprise, initializing the weights of an MLP with a correspondingly configured DBN often produces much better results than initializing with random weights. As such, MLPs with many hidden layers, or deep neural networks (DNNs), that are learned with unsupervised DBN pre-training followed by back-propagation fine-tuning are sometimes also called DBNs in the literature [67, 260, 258]. More recently, researchers have been more careful in distinguishing DNNs from DBNs [68, 161], and when a DBN is used to initialize the training of a DNN, the resulting network is sometimes called the DBN–DNN [161].
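The greedy, layer-by-layer idea can be illustrated with a minimal NumPy sketch. It is not the algorithm of [163, 164] as published: it assumes binary units, full-batch updates, and one-step contrastive divergence (CD-1) as the per-layer learning rule, and the names `train_rbm` and `greedy_pretrain`, the toy data, and all hyperparameters are hypothetical choices made for illustration. The point it shows is structural: each RBM is trained once on the output of the layer below, so the cost grows with the number of layers rather than with their combinations.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_rbm(data, n_hidden, epochs=5, lr=0.1, rng=None):
    """Train one binary RBM with one-step contrastive divergence (CD-1)."""
    rng = np.random.default_rng(0) if rng is None else rng
    n_samples, n_visible = data.shape
    W = 0.01 * rng.standard_normal((n_visible, n_hidden))
    b_vis = np.zeros(n_visible)
    b_hid = np.zeros(n_hidden)
    for _ in range(epochs):
        # Positive phase: hidden probabilities given the data.
        h_prob = sigmoid(data @ W + b_hid)
        h_sample = (rng.random(h_prob.shape) < h_prob).astype(float)
        # Negative phase: one Gibbs step down to the visible layer and up again.
        v_recon = sigmoid(h_sample @ W.T + b_vis)
        h_recon = sigmoid(v_recon @ W + b_hid)
        # CD-1 update: data statistics minus reconstruction statistics.
        W += lr * (data.T @ h_prob - v_recon.T @ h_recon) / n_samples
        b_vis += lr * (data - v_recon).mean(axis=0)
        b_hid += lr * (h_prob - h_recon).mean(axis=0)
    return W, b_hid

def greedy_pretrain(data, layer_sizes, rng=None):
    """Stack RBMs: each layer is trained on the activations of the layer below."""
    rng = np.random.default_rng(0) if rng is None else rng
    params, layer_input = [], data
    for n_hidden in layer_sizes:
        W, b_hid = train_rbm(layer_input, n_hidden, rng=rng)
        params.append((W, b_hid))
        # Freeze this layer and pass its hidden probabilities upward.
        layer_input = sigmoid(layer_input @ W + b_hid)
    return params

# Toy binary data (an assumption; any matrix with entries in [0, 1] would do).
rng = np.random.default_rng(0)
toy_data = (rng.random((64, 20)) < 0.5).astype(float)
stack = greedy_pretrain(toy_data, layer_sizes=[16, 8], rng=rng)
```

The returned list of `(W, b_hid)` pairs could then serve as the initial weights of an MLP with the same layer sizes before back-propagation fine-tuning, which is the DBN–DNN recipe described above.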

Independently of the RBM development, two alternative non-probabilistic, non-generative, unsupervised deep models were published in 2006. One is an autoencoder variant with greedy layer-wise training much like the DBN training [28]. The other is an energy-based model with unsupervised learning of sparse over-complete representations [297]. Both can be used effectively to pre-train a deep neural network, much like the DBN.

In addition to supplying good initialization points, the DBN has other attractive properties. First, the learning algorithm makes effective use of unlabeled data. Second, it can be interpreted as a probabilistic generative model. Third, the over-fitting problem, which is often observed in models with millions of parameters such as DBNs, and the under-fitting problem, which occurs often in deep networks, can both be effectively alleviated by the generative pre-training step. An insightful analysis of what kinds of speech information DBNs can capture is provided in [259].

Using hidden layers with many neurons in a DNN significantly improves the modeling power of the DNN and creates many nearly optimal configurations. Even if parameter learning is trapped in a local optimum, the resulting DNN can still perform quite well, since the chance of landing in a poor local optimum is lower than when a small number of neurons are used in the network. Using deep and wide neural networks, however, places a great demand on computational power during training, and this is one of the reasons why it was not until recent years that researchers started exploring both deep and wide neural networks in a serious manner.

Better learning algorithms and different nonlinearities also contributed to the success of DNNs. Stochastic gradient descent (SGD) algorithms are the most efficient algorithms when the training set is large and redundant, as is the case for most applications [39]. Recently, SGD has been shown to be effective for parallelizing over many machines in an asynchronous mode [69] or over multiple GPUs through pipelined BP [49]. Further, SGD can often allow the training to jump out of local optima because of the noisy gradients estimated from a single sample or a small batch of samples. Other learning algorithms such as Hessian-free [195, 238] or Krylov subspace methods [378] have shown a similar ability.
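As a rough illustration of where the gradient noise comes from, the sketch below runs mini-batch SGD on a small least-squares problem. The problem, the function name `sgd_least_squares`, and the hyperparameters are assumptions chosen for brevity. The objective here is convex, so the example only shows that each update uses a gradient estimated from a small batch; it does not demonstrate escaping local optima.

```python
import numpy as np

def sgd_least_squares(X, y, batch_size=8, lr=0.05, epochs=50, seed=0):
    """Fit w in y ~ X @ w with mini-batch stochastic gradient descent."""
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    for _ in range(epochs):
        # Reshuffle so every epoch visits the samples in a new order.
        order = rng.permutation(n_samples)
        for start in range(0, n_samples, batch_size):
            idx = order[start:start + batch_size]
            # Gradient of the mean squared error on this mini-batch only.
            # It is a noisy estimate of the full-data gradient.
            residual = X[idx] @ w - y[idx]
            grad = 2.0 * X[idx].T @ residual / len(idx)
            w -= lr * grad
    return w

# Synthetic data with known weights (hypothetical values).
rng = np.random.default_rng(1)
X = rng.standard_normal((256, 3))
true_w = np.array([1.5, -2.0, 0.5])
y = X @ true_w + 0.01 * rng.standard_normal(256)
w_hat = sgd_least_squares(X, y)
```

Setting `batch_size` to the full data size would recover ordinary gradient descent, while `batch_size=1` gives the single-sample variant mentioned in the text.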

For the highly non-convex optimization problem of DNN learning, it is obvious that better parameter initialization techniques will lead to better models, since optimization starts from these initial models. What was not obvious until more recently, however, is how to efficiently and effectively initialize DNN parameters and how the use of large amounts of training data can alleviate the learning problem [28, 20, 100, 64, 68, 163, 164, 161, 323, 376, 414]. The DNN parameter initialization technique that attracted the most attention is the unsupervised pretraining technique proposed in [163, 164], discussed earlier.

The DBN pretraining procedure is not the only one that allows effective initialization of DNNs. An alternative unsupervised approach that performs equally well is to pretrain DNNs layer by layer by treating each pair of layers as a de-noising autoencoder regularized by setting a random subset of the input nodes to zero [20, 376]. Another alternative is to use contractive autoencoders for the same purpose by favoring representations that are more robust to input variations, i.e., penalizing the gradient of the activities of the hidden units with respect to the inputs [303]. Further, Ranzato et al. [294] developed the sparse encoding symmetric machine (SESM), which has an architecture very similar to the RBMs used as building blocks of a DBN. The SESM may also be used to effectively initialize the DNN training. In addition to unsupervised pretraining using greedy layer-wise procedures [28, 164, 295], supervised pretraining, sometimes called discriminative pretraining, has also been shown to be effective [28, 161, 324, 432] and, in cases where labeled training data are abundant, performs better than the unsupervised pretraining techniques. The idea of discriminative pretraining is to start from a one-hidden-layer MLP trained with the BP algorithm. Each time we want to add a new hidden layer, we replace the output layer with a randomly initialized new hidden layer and output layer and train the whole new MLP (or DNN) using the BP algorithm. Unlike the unsupervised pretraining techniques, the discriminative pretraining technique requires labels.
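The two autoencoder regularizers mentioned above can be made concrete with a short NumPy sketch. It assumes sigmoid hidden units, tied encoder and decoder weights, and a squared-error reconstruction loss; these choices, along with the function names `mask_inputs`, `denoising_loss`, and `contractive_penalty`, are illustrative assumptions rather than details taken from [20, 376] or [303]. The first function zeroes a random subset of input nodes, and the last computes the squared Frobenius norm of the Jacobian of the hidden activities with respect to the inputs.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def mask_inputs(x, drop_prob, rng):
    """Denoising corruption: set a random subset of input nodes to zero."""
    keep = rng.random(x.shape) >= drop_prob
    return x * keep

def denoising_loss(x, W, b_hid, b_out, drop_prob, rng):
    """Reconstruct the clean input from its corrupted copy (tied weights)."""
    x_noisy = mask_inputs(x, drop_prob, rng)
    hidden = sigmoid(x_noisy @ W + b_hid)
    recon = sigmoid(hidden @ W.T + b_out)
    # The target is the clean x, not the corrupted x_noisy.
    return np.mean(np.sum((recon - x) ** 2, axis=1))

def contractive_penalty(x, W, b_hid):
    """Squared Frobenius norm of d(hidden)/d(input), averaged over the batch."""
    hidden = sigmoid(x @ W + b_hid)
    # For sigmoid units, d h_j / d x_i = h_j * (1 - h_j) * W[i, j].
    slope_sq = (hidden * (1.0 - hidden)) ** 2   # shape (batch, n_hidden)
    col_norm_sq = np.sum(W ** 2, axis=0)        # shape (n_hidden,)
    return np.mean(slope_sq @ col_norm_sq)

# Toy inputs and randomly initialized parameters (hypothetical sizes).
rng = np.random.default_rng(0)
x = rng.random((32, 10))
W = 0.1 * rng.standard_normal((10, 6))
b_hid, b_out = np.zeros(6), np.zeros(10)
x_noisy = mask_inputs(x, drop_prob=0.3, rng=rng)
loss = denoising_loss(x, W, b_hid, b_out, drop_prob=0.3, rng=rng)
penalty = contractive_penalty(x, W, b_hid)
```

In a layer-wise pretraining loop, either `denoising_loss` alone or a reconstruction loss plus a weighted `contractive_penalty` would be minimized for one layer at a time, with the trained layer's hidden activities feeding the next layer.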

Researchers who apply deep learning to speech and vision have analyzed what DNNs capture in speech and images. For example, [259] applied a dimensionality reduction method to visualize the relationship among the feature vectors learned by the DNN. They found that the DNN’s hidden activity vectors preserve the similarity structure of the feature vectors at multiple scales, and that this is especially true for the filterbank features. A more elaborate visualization method, based on a top-down generative process in the reverse direction of the classification network, was recently developed by Zeiler and Fergus [436] for examining what features deep convolutional networks capture from image data. The power of deep networks is shown to be their ability to extract appropriate features and perform discrimination jointly [210].

As another way to concisely introduce the DNN, we can review the history of artificial neural networks using a “hype cycle,” which is a graphic representation of the maturity, adoption, and social application of specific technologies. The 2012 version of the hype cycle graph compiled by Gartner is shown in Figure 2.1. It is intended to show how a technology or application will evolve over time (according to five phases: technology trigger, peak of inflated expectations, trough of disillusionment, slope of enlightenment, and plateau of productivity), and to provide a source of insight for managing its deployment.
