Deep Learning: Building Deep Architectures for AI

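"Learning Deep Architectures for AI" is a survey of deep learning by Yoshua Bengio, published in 2009 in Foundations and Trends in Machine Learning. It covers the theoretical advantages of deep architectures, local versus non-local generalization, neural networks for deep architectures, and energy-based models and Boltzmann machines.

The article opens with the core ideas of deep learning: how deep architectures are trained, the role of intermediate representations (features and abstractions shared across tasks), the desired goals for building AI, and the structure of the article. The key to deep learning is building multi-layer structures that progressively learn and extract hierarchical features of the data.

In the part on theoretical advantages, Bengio discusses the computational-complexity benefits of deep architectures and gives informal arguments for them. He points out that deep networks can handle high-dimensional inputs effectively and reduce complexity by learning layer by layer. He also discusses the difference between local and non-local generalization, showing the potential of deep learning for pattern recognition and generalization, in particular by learning distributed representations that go beyond the limits of simple template matching.

In the part on neural networks, Bengio lays out the basics of multi-layer neural networks and stresses the challenges of training deep ones. He also examines the role of unsupervised learning in building deep architectures, and the concept of deep generative models. Convolutional neural networks (CNNs) are mentioned as an important tool for visual data, with their efficiency in image recognition and processing acknowledged. Auto-encoders are described in detail as an effective tool for dimensionality reduction and feature learning.

The article also covers energy-based models and Boltzmann machines. Energy-based models, such as products of experts, provide a way to describe complex probability distributions. The Boltzmann machine is a general probabilistic graphical model, while the restricted Boltzmann machine (RBM) simplifies training and is often used for feature learning and pre-training. Finally, contrastive divergence is discussed as an approximate algorithm for training RBMs.

The survey gives a comprehensive guide to the basic principles, methods, and applications of deep learning, and is a valuable resource for readers who want to study the field in depth.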

Theoretical Advantages of Deep Architectures

In this section, we present a motivating argument for the study of learning algorithms for deep architectures, by way of theoretical results revealing potential limitations of architectures with insufficient depth. This part of the monograph (this section and the next) motivates the algorithms described in the later sections, and can be skipped without making the remainder difficult to follow.

The main point of this section is that some functions cannot be efficiently represented (in terms of number of tunable elements) by architectures that are too shallow. These results suggest that it would be worthwhile to explore learning algorithms for deep architectures, which might be able to represent some functions otherwise not efficiently representable. Where simpler and shallower architectures fail to efficiently represent (and hence to learn) a task of interest, we can hope for learning algorithms that could set the parameters of a deep architecture for this task.

We say that the expression of a function is compact when it has few computational elements, i.e., few degrees of freedom that need to be tuned by learning. So for a fixed number of training examples, and short of other sources of knowledge injected in the learning algorithm, we would expect that compact representations of the target function would yield better generalization.

More precisely, functions that can be compactly represented by a depth k architecture might require an exponential number of computational elements to be represented by a depth k − 1 architecture. Since the number of computational elements one can afford depends on the number of training examples available to tune or select them, the consequences are not only computational but also statistical: poor generalization may be expected when using an insufficiently deep architecture for representing some functions.

We consider the case of fixed-dimension inputs, where the computation performed by the machine can be represented by a directed acyclic graph where each node performs a computation that is the application of a function on its inputs, each of which is the output of another node in the graph or one of the external inputs to the graph. The whole graph can be viewed as a circuit that computes a function applied to the external inputs. When the set of functions allowed for the computation nodes is limited to logic gates, such as {AND, OR, NOT}, this is a Boolean circuit, or logic circuit.
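The circuit view described above can be sketched in a few lines of Python. This is a minimal illustration, not code from the monograph: the dictionary layout, the `evaluate` helper, and the example XOR circuit are all assumptions made for the sketch. Each node names a gate from {AND, OR, NOT} and the nodes or external inputs it reads from.

```python
from typing import Callable, Dict, List, Tuple

# Allowed computational elements: each gate maps a list of input bits to one bit.
GATES: Dict[str, Callable[[List[int]], int]] = {
    "AND": lambda bits: int(all(bits)),
    "OR": lambda bits: int(any(bits)),
    "NOT": lambda bits: 1 - bits[0],
}

# Hypothetical example circuit: XOR(x1, x2) = (x1 OR x2) AND NOT (x1 AND x2).
# Format: node name -> (gate name, names of its inputs).
# Assumption: nodes are listed in topological order (inputs before consumers).
XOR_CIRCUIT: Dict[str, Tuple[str, List[str]]] = {
    "or1": ("OR", ["x1", "x2"]),
    "and1": ("AND", ["x1", "x2"]),
    "not1": ("NOT", ["and1"]),
    "out": ("AND", ["or1", "not1"]),
}


def evaluate(circuit: Dict[str, Tuple[str, List[str]]],
             inputs: Dict[str, int],
             output: str = "out") -> int:
    """Evaluate the DAG node by node, starting from the external inputs."""
    values = dict(inputs)
    for name, (gate, args) in circuit.items():
        values[name] = GATES[gate]([values[a] for a in args])
    return values[output]


for a in (0, 1):
    for b in (0, 1):
        print(a, b, evaluate(XOR_CIRCUIT, {"x1": a, "x2": b}))
```

Swapping the entries of `GATES` for another set of elements (for example, threshold units) changes the kind of circuit without changing the evaluation loop.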

To formalize the notion of depth of architecture, one must introduce the notion of a set of computational elements. An example of such a set is the set of computations that can be performed by logic gates. Another is the set of computations that can be performed by an artificial neuron (depending on the values of its synaptic weights). A function can be expressed by the composition of computational elements from a given set. It is defined by a graph which formalizes this composition, with one node per computational element. Depth of architecture refers to the depth of that graph, i.e., the longest path from an input node to an output node. When the set of computational elements is the set of computations an artificial neuron can perform, depth corresponds to the number of layers in a neural network. Let us explore the notion of depth with examples of architectures of different depths. Consider the function f(x) = x ∗ sin(a ∗ x + b). It can be expressed as the composition of simple operations such as addition, subtraction, multiplication,

The target function is the function that we would like the learner to discover.
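The definition of depth as the longest path from an input node to an output node can be checked on the function f(x) = x ∗ sin(a ∗ x + b). The sketch below is an illustration under stated assumptions: the element set is taken to be {multiplication, addition, sin}, and the node names and the `depth` helper are hypothetical, not taken from the monograph.

```python
from typing import Dict, List

# Computation graph for f(x) = x * sin(a * x + b).
# Format: node name -> names of its inputs. External inputs (x, a, b)
# have no entry because they perform no computation.
GRAPH: Dict[str, List[str]] = {
    "mul1": ["a", "x"],      # a * x
    "add1": ["mul1", "b"],   # a * x + b
    "sin1": ["add1"],        # sin(a * x + b)
    "out": ["x", "sin1"],    # x * sin(a * x + b)
}


def depth(graph: Dict[str, List[str]], node: str) -> int:
    """Length of the longest path from any external input to `node`."""
    if node not in graph:        # external input: depth 0
        return 0
    return 1 + max(depth(graph, parent) for parent in graph[node])


print(depth(GRAPH, "out"))  # 4: mul1 -> add1 -> sin1 -> out
```

The result depends on the chosen element set: if a single element were allowed to compute sin(a ∗ x + b) directly, the same function would have a shallower graph.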

2.1 Computational Complexity

A two-layer circuit of logic gates can represent any Boolean function [127]. Any Boolean function can be written as a sum of products (disjunctive normal form: AND gates on the first layer with optional negation of inputs, and an OR gate on the second layer) or a product of sums (conjunctive normal form: OR gates on the first layer with optional negation of inputs, and an AND gate on the second layer). To understand the limitations of shallow architectures, the first result to consider is that with depth-two logical circuits, most Boolean functions require an exponential (with respect to input size) number of logic gates [198] to be represented.
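The sum-of-products construction above can be sketched as follows. This is a minimal illustration that builds the canonical disjunctive normal form from a truth table; the helper names and the 3-input majority example are assumptions made for the sketch. The canonical form uses one AND gate per input pattern on which the function is 1, so it is not necessarily the smallest depth-two circuit for a given function.

```python
from itertools import product
from typing import Callable, List, Sequence, Tuple

Bits = Tuple[int, ...]


def dnf_terms(f: Callable[[Bits], int], d: int) -> List[Bits]:
    """Canonical DNF: one AND term per d-bit input pattern where f is 1."""
    return [bits for bits in product((0, 1), repeat=d) if f(bits)]


def eval_dnf(terms: Sequence[Bits], x: Bits) -> int:
    # First layer: each AND gate fires only when x matches its pattern;
    # a 0 in the pattern means that input is negated before the AND.
    and_outputs = [all(xi == ti for xi, ti in zip(x, term)) for term in terms]
    # Second layer: a single OR gate over all AND outputs.
    return int(any(and_outputs))


def majority(bits: Bits) -> int:
    """Hypothetical example function: 1 if more than half the bits are 1."""
    return int(sum(bits) > len(bits) / 2)


terms = dnf_terms(majority, 3)
print(len(terms), "AND gates on the first layer")
print(all(eval_dnf(terms, x) == majority(x) for x in product((0, 1), repeat=3)))
```

Because a d-bit function has 2^d input patterns, the number of first-layer gates in this construction can grow exponentially with d.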

More interestingly, there are functions computable with a polynomial-size logic-gate circuit of depth k that require exponential size when restricted to depth k − 1 [62]. The proof of this theorem relies on earlier results [208] showing that d-bit parity circuits of depth 2 have exponential size. The d-bit parity function is defined as usual:

$$
\text{parity} : (b_1, \ldots, b_d) \in \{0,1\}^d \mapsto
\begin{cases}
1, & \text{if } \sum_{i=1}^{d} b_i \text{ is even} \\
0, & \text{otherwise.}
\end{cases}
$$
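To make the exponential size of depth-2 parity circuits concrete, the sketch below counts the AND terms in the canonical sum-of-products form of parity, using the convention stated above (output 1 when the sum of the bits is even). The helper names are assumptions made for the sketch. For parity, no two accepted inputs differ in exactly one bit, so none of these terms can be merged into a shorter one.

```python
from itertools import product
from typing import Callable, Tuple

Bits = Tuple[int, ...]


def parity(bits: Bits) -> int:
    """Parity as defined in the text: 1 if the sum of the bits is even, else 0."""
    return int(sum(bits) % 2 == 0)


def count_minterms(f: Callable[[Bits], int], d: int) -> int:
    """Number of d-bit inputs on which f is 1, i.e. AND terms in the canonical DNF."""
    return sum(f(bits) for bits in product((0, 1), repeat=d))


for d in range(1, 9):
    # Half of the 2**d inputs have an even sum, so the count doubles with each bit.
    print(d, count_minterms(parity, d), 2 ** (d - 1))
```

The two printed columns agree for every d, which matches the 2^(d−1) growth of the sum-of-products form.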

One might wonder whether these computational complexity results for Boolean circuits are relevant to machine learning. See [140] for an early survey of theoretical results in computational complexity relevant to learning algorithms. Interestingly, many of the results for Boolean circuits can be generalized to architectures whose computational elements are linear threshold units (also known as artificial neurons [125]), which compute

$$
f(x) = 1_{w'x + b \geq 0} \tag{2.1}
$$

with parameters w and b. The fan-in of a circuit is the maximum number of inputs of a particular element. Circuits are often organized in layers, like multi-layer neural networks, where elements in a layer only take their input from elements in the previous layer(s), and the first layer is the neural network input. The size of a circuit is the number of its computational elements (excluding input elements, which do not perform any computation).
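A linear threshold unit and the notions of layers, size, and fan-in can be sketched as below. This is an illustration under stated assumptions: the list-of-layers layout, the helper names, and the weights of the small XOR circuit are hypothetical choices for the sketch, not taken from the monograph.

```python
from typing import List, Sequence, Tuple

Unit = Tuple[Sequence[float], float]   # (weights w, bias b)


def threshold_unit(w: Sequence[float], b: float, x: Sequence[float]) -> int:
    """Linear threshold unit: 1 if w'x + b >= 0, else 0 (Equation 2.1)."""
    return int(sum(wi * xi for wi, xi in zip(w, x)) + b >= 0)


# Hypothetical two-layer circuit computing XOR(x1, x2).
XOR_NET: List[List[Unit]] = [
    [((1, 1), -1), ((-1, -1), 1.5)],   # layer 1: OR(x1, x2) and NAND(x1, x2)
    [((1, 1), -2)],                    # layer 2: AND of the two layer-1 outputs
]


def forward(layers: List[List[Unit]], x: Sequence[float]) -> List[int]:
    """Each layer takes its input only from the previous layer's outputs."""
    values = list(x)
    for layer in layers:
        values = [threshold_unit(w, b, values) for w, b in layer]
    return values


# Size: number of computational elements; input elements are not counted.
size = sum(len(layer) for layer in XOR_NET)
# Fan-in: maximum number of inputs of any single element.
fan_in = max(len(w) for layer in XOR_NET for w, _ in layer)

for a in (0, 1):
    for b in (0, 1):
        print(a, b, forward(XOR_NET, (a, b)))
print("size:", size, "fan-in:", fan_in, "depth:", len(XOR_NET))
```

In this layered layout the depth is simply the number of layers, in line with the earlier remark that depth corresponds to the number of layers in a neural network.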
