Deep Learning Architectures: Yoshua Bengio's Exploration and Their Advantages


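*Learning Deep Architectures for AI* by Yoshua Bengio is a monograph published in Foundations and Trends in Machine Learning, Vol. 2, No. 1 (2009). It focuses on learning algorithms for deep architectures in artificial intelligence and on the theoretical advantages of such architectures. Its main content is outlined below:

1. **Introduction**:
   - The author first discusses how deep architectures can be trained, why they are needed, and what they promise for complex problems.
   - Intermediate representations matter: features and abstractions shared across tasks let a model transfer what it has learned from one task to another.
2. **Theoretical advantages**:
   - The advantage of depth is one of representational efficiency: some functions that a depth-k architecture represents compactly may require exponentially many computational elements in a depth k − 1 architecture.
   - Because the number of elements one can afford to tune depends on the number of training examples, an architecture that is too shallow may also generalize poorly on such functions.
3. **Local vs. non-local generalization**:
   - The author analyzes the limits of relying only on local pattern matching and argues that deep models can generalize more broadly by learning distributed representations.
   - Distributed representations let a model generalize to inputs far from the training examples, which improves its adaptability.
4. **Neural networks for deep architectures**:
   - Multi-layer neural networks as the basic deep architecture.
   - The difficulty of training deep neural networks with gradient-based optimization, and ways to obtain better generalization.
   - Unsupervised learning methods, such as auto-encoders, used for pre-training and feature learning.
   - Deep generative architectures, such as sigmoid belief networks and deep belief networks, and their role in modeling data.
   - Convolutional neural networks (CNNs), widely used in image processing, which exploit local structure in the input.
5. **Energy-based models and Boltzmann machines**:
   - The author introduces energy-based models, which define a probability distribution through an energy function, and relates them to products of experts.
   - Boltzmann machines are presented as one kind of energy-based model, together with the Restricted Boltzmann Machine (RBM) and its learning algorithm, Contrastive Divergence (CD).
   - These models matter for deep learning because they provide a way to learn latent representations, which helps in generating and modeling complex data distributions.

Across these topics, Bengio explains the motivation and principles behind deep learning algorithms and gives guidance on how to design and train deep architectures for AI. The monograph is valuable reading for understanding the theoretical foundations and technical challenges behind modern deep learning.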

Theoretical Advantages of Deep Architectures

In this section, we present a motivating argument for the study of learning algorithms for deep architectures, by way of theoretical results revealing potential limitations of architectures with insufficient depth. This part of the monograph (this section and the next) motivates the algorithms described in the later sections, and can be skipped without making the remainder difficult to follow.

The main point of this section is that some functions cannot be efficiently represented (in terms of number of tunable elements) by architectures that are too shallow. These results suggest that it would be worthwhile to explore learning algorithms for deep architectures, which might be able to represent some functions otherwise not efficiently representable. Where simpler and shallower architectures fail to efficiently represent (and hence to learn) a task of interest, we can hope for learning algorithms that could set the parameters of a deep architecture for this task.

We say that the expression of a function is compact when it has few computational elements, i.e., few degrees of freedom that need to be tuned by learning. So for a fixed number of training examples, and short of other sources of knowledge injected in the learning algorithm, we would expect that compact representations of the target function would yield better generalization.

More precisely, functions that can be compactly represented by a depth k architecture might require an exponential number of computational elements to be represented by a depth k − 1 architecture. Since the number of computational elements one can afford depends on the number of training examples available to tune or select them, the consequences are not only computational but also statistical: poor generalization may be expected when using an insufficiently deep architecture for representing some functions.
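As an illustration of this gap, consider n-bit parity, an example chosen here and not taken from the passage above. The sketch below counts the AND-terms in the canonical depth-2 form (an OR of ANDs over all n literals) and compares that with the number of two-input XOR gates in a simple chain. The function names are hypothetical, and the chain assumes XOR is available as a gate (or is built from a constant number of AND, OR, and NOT gates).

```python
from itertools import product


def parity(bits):
    """Return 1 if an odd number of bits are set, else 0."""
    return sum(bits) % 2


def canonical_dnf_terms(n):
    """Count AND-terms in the canonical OR-of-ANDs form of n-bit parity.

    Every input assignment with odd parity contributes one AND-term
    over all n literals, so the count equals the number of such inputs.
    """
    return sum(parity(bits) for bits in product((0, 1), repeat=n))


def xor_chain_gates(n):
    """Count two-input XOR gates in the chain x1 ^ x2 ^ ... ^ xn."""
    return n - 1


# Shallow term count grows as 2**(n - 1); the deep chain grows linearly.
for n in (2, 4, 8, 16):
    print(n, canonical_dnf_terms(n), xor_chain_gates(n))
```

The shallow count doubles with every added input bit, while the deeper chain adds a single gate, which is the kind of exponential-versus-compact contrast the paragraph describes.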

We consider the case of fixed-dimension inputs, where the computation performed by the machine can be represented by a directed acyclic graph where each node performs a computation that is the application of a function on its inputs, each of which is the output of another node in the graph or one of the external inputs to the graph. The whole graph can be viewed as a circuit that computes a function applied to the external inputs. When the set of functions allowed for the computation nodes is limited to logic gates, such as {AND, OR, NOT}, this is a Boolean circuit, or logic circuit.
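A Boolean circuit of this kind can be sketched as a small dictionary-based graph. The encoding below is an assumption made for illustration: `GATES`, `XOR_CIRCUIT`, and `evaluate` are hypothetical names, and the example circuit (exclusive-or built from AND, OR, and NOT) is not taken from the text.

```python
# Gate set from the text: {AND, OR, NOT}.
GATES = {
    "AND": lambda a, b: a and b,
    "OR": lambda a, b: a or b,
    "NOT": lambda a: not a,
}

# Each node maps to (gate name, names of its inputs). An input name is
# either another node or an external input ("x1", "x2").
XOR_CIRCUIT = {
    "n1": ("OR", ["x1", "x2"]),
    "n2": ("AND", ["x1", "x2"]),
    "n3": ("NOT", ["n2"]),
    "out": ("AND", ["n1", "n3"]),
}


def evaluate(circuit, inputs, node):
    """Evaluate `node` by recursively evaluating the nodes it depends on."""
    if node in inputs:  # external input to the graph
        return inputs[node]
    gate, args = circuit[node]
    return GATES[gate](*(evaluate(circuit, inputs, a) for a in args))


# Print the truth table of the circuit's output node.
for x1 in (False, True):
    for x2 in (False, True):
        print(x1, x2, evaluate(XOR_CIRCUIT, {"x1": x1, "x2": x2}, "out"))
```

Because the graph is acyclic, the recursion always terminates at the external inputs; a larger circuit would usually cache node values instead of recomputing shared subgraphs.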

To formalize the notion of depth of architecture, one must introduce the notion of a set of computational elements. An example of such a set is the set of computations that can be performed by logic gates. Another is the set of computations that can be performed by an artificial neuron (depending on the values of its synaptic weights). A function can be expressed by the composition of computational elements from a given set. It is defined by a graph which formalizes this composition, with one node per computational element. Depth of architecture refers to the depth of that graph, i.e., the longest path from an input node to an output node. When the set of computational elements is the set of computations an artificial neuron can perform, depth corresponds to the number of layers in a neural network. Let us explore the notion of depth with examples of architectures of different depths. Consider the function f(x) = x ∗ sin(a ∗ x + b). It can be expressed as the composition of simple operations such as addition, subtraction, multiplication,

Footnote: The target function is the function that we would like the learner to discover.
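The definition of depth as the longest input-to-output path can be checked on f(x) = x ∗ sin(a ∗ x + b) with a minimal sketch. It assumes a set of computational elements containing multiplication, addition, and sine, and it treats a and b as inputs to the graph alongside x. The names `OPS`, `GRAPH`, `depth`, and `value` are hypothetical.

```python
import math

# Assumed set of computational elements: multiplication, addition, sine.
OPS = {
    "mul": lambda u, v: u * v,
    "add": lambda u, v: u + v,
    "sin": math.sin,
}

# One node per computational element: node -> (operation, input names).
GRAPH = {
    "ax": ("mul", ["a", "x"]),      # a * x
    "ax_b": ("add", ["ax", "b"]),   # a * x + b
    "s": ("sin", ["ax_b"]),         # sin(a * x + b)
    "f": ("mul", ["x", "s"]),       # x * sin(a * x + b)
}


def depth(graph, node):
    """Longest path, counted in computational elements, from an input to `node`."""
    if node not in graph:  # external input: contributes no element
        return 0
    _, args = graph[node]
    return 1 + max(depth(graph, a) for a in args)


def value(graph, inputs, node):
    """Evaluate `node` given values for the external inputs."""
    if node in inputs:
        return inputs[node]
    op, args = graph[node]
    return OPS[op](*(value(graph, inputs, a) for a in args))


print(depth(GRAPH, "f"))  # 4
print(value(GRAPH, {"x": 2.0, "a": 3.0, "b": 1.0}, "f"))
```

Under these assumptions the graph has depth 4. A different set of computational elements would give a different depth for the same function, since depth is always defined relative to the chosen set.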
