Convolutional Deep Belief Networks
for Scalable Unsupervised Learning of Hierarchical Representations
Honglak Lee hllee@cs.stanford.edu
Roger Grosse rgrosse@cs.stanford.edu
Rajesh Ranganath rajeshr@cs.stanford.edu
Andrew Y. Ng ang@cs.stanford.edu
Computer Science Department, Stanford University, Stanford, CA 94305, USA
Abstract
There has been much interest in unsuper-
vised learning of hierarchical generative mod-
els such as deep belief networks. Scaling
such models to full-sized, high-dimensional
images remains a difficult problem. To ad-
dress this problem, we present the convolu-
tional deep belief network, a hierarchical gen-
erative model which scales to realistic image
sizes. This model is translation-invariant and
supports efficient bottom-up and top-down
probabilistic inference. Key to our approach
is probabilistic max-pooling, a novel technique
which shrinks the representations of higher
layers in a probabilistically sound way. Our
experiments show that the algorithm learns
useful high-level visual features, such as ob-
ject parts, from unlabeled images of objects
and natural scenes. We demonstrate excel-
lent performance on several visual recogni-
tion tasks and show that our model can per-
form hierarchical (bottom-up and top-down)
inference over full-sized images.
1. Introduction
The visual world can be described at many levels: pixel
intensities, edges, object parts, objects, and beyond.
The prospect of learning hierarchical models which
simultaneously represent multiple levels has recently
generated much interest. Ideally, such “deep” repre-
sentations would learn hierarchies of feature detectors,
and further be able to combine top-down and bottom-
up processing of an image. For instance, lower layers
could support object detection by spotting low-level
features indicative of object parts. Conversely, infor-
mation about objects in the higher layers could resolve
lower-level ambiguities in the image or infer the loca-
tions of hidden object parts.
Deep architectures consist of feature detector units ar-
ranged in layers. Lower layers detect simple features
and feed into higher layers, which in turn detect more
complex features. There have been several approaches
to learning deep networks (LeCun et al., 1989; Bengio
et al., 2006; Ranzato et al., 2006; Hinton et al., 2006).
In particular, the deep belief network (DBN) (Hinton
et al., 2006) is a multilayer generative model where
each layer encodes statistical dependencies among the
units in the layer below it; it is trained to (approxi-
mately) maximize the likelihood of its training data.
DBNs have been successfully used to learn high-level
structure in a wide variety of domains, including hand-
written digits (Hinton et al., 2006) and human motion
capture data (Taylor et al., 2007). We build upon the
DBN in this paper because we are interested in learn-
ing a generative model of images which can be trained
in a purely unsupervised manner.
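As a concrete illustration of the greedy layer-wise training that underlies a DBN, the following minimal sketch (not the authors' code; the function names and hyperparameters are ours, and it assumes binary units trained with one-step contrastive divergence) stacks restricted Boltzmann machines, feeding each layer the hidden activations of the layer below:

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def train_rbm(data, n_hidden, lr=0.01, epochs=10, rng=np.random):
        # One binary RBM trained with CD-1; data has shape (n_samples, n_visible).
        n_visible = data.shape[1]
        W = 0.01 * rng.randn(n_visible, n_hidden)
        b, c = np.zeros(n_visible), np.zeros(n_hidden)   # visible / hidden biases
        for _ in range(epochs):
            for v0 in data:
                h0 = sigmoid(v0 @ W + c)                              # bottom-up inference
                h_sample = (h0 > rng.rand(n_hidden)).astype(float)    # sample hidden states
                v1 = sigmoid(h_sample @ W.T + b)                      # top-down reconstruction
                h1 = sigmoid(v1 @ W + c)
                W += lr * (np.outer(v0, h0) - np.outer(v1, h1))       # CD-1 update
                b += lr * (v0 - v1)
                c += lr * (h0 - h1)
        return W, b, c

    def train_dbn(data, layer_sizes):
        # Greedy layer-wise training: each new RBM models the layer below.
        layers, x = [], data
        for n_hidden in layer_sizes:
            W, b, c = train_rbm(x, n_hidden)
            layers.append((W, b, c))
            x = sigmoid(x @ W + c)   # activations become the next layer's "data"
        return layers

Each layer is thus trained to model the statistical dependencies among the units below it, which is what allows the stack as a whole to (approximately) maximize the likelihood of the training data.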
While DBNs have been successful in controlled do-
mains, scaling them to realistic-sized (e.g., 200x200
pixel) images remains challenging for two reasons.
First, images are high-dimensional, so the algorithms
must scale gracefully and be computationally tractable
even when applied to large images. Second, objects
can appear at arbitrary locations in images; thus it
is desirable that representations be invariant at least
to local translations of the input. We address these
issues by incorporating translation invariance. Like
LeCun et al. (1989) and Grosse et al. (2007), we
learn feature detectors which are shared among all lo-
cations in an image, because features which capture
useful information in one part of an image can pick up
the same information elsewhere. Thus, our model can
represent large images using only a small number of
feature detectors.
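The following sketch (illustrative only; the image and filter sizes are arbitrary choices of ours, not taken from the paper) shows why shared detectors keep the parameter count small: one small filter is convolved with the image at every location, so a handful of filters produces feature maps covering an image of any size:

    import numpy as np

    def valid_convolve(image, flt):
        # 2D "valid" convolution: flip the filter and slide it over the image.
        H, W = image.shape
        h, w = flt.shape
        flipped = flt[::-1, ::-1]
        out = np.zeros((H - h + 1, W - w + 1))
        for i in range(out.shape[0]):
            for j in range(out.shape[1]):
                out[i, j] = np.sum(image[i:i+h, j:j+w] * flipped)
        return out

    def hidden_activations(image, filters, biases):
        # One sigmoid activation map per shared filter.
        return [1.0 / (1.0 + np.exp(-(valid_convolve(image, f) + b)))
                for f, b in zip(filters, biases)]

    # e.g., a 200x200 image represented with only 24 shared 10x10 filters:
    image = np.random.rand(200, 200)
    filters = [0.01 * np.random.randn(10, 10) for _ in range(24)]
    maps = hidden_activations(image, filters, np.zeros(24))

The parameters grow with the number and size of the filters, not with the size of the image, which is what makes full-sized images tractable.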
This paper presents the convolutional deep belief net-
work, a hierarchical generative model that scales to
full-sized images. Another key to our approach is