Efficient Learning of Sparse Representations
with an Energy-Based Model
Marc’Aurelio Ranzato, Christopher Poultney, Sumit Chopra, and Yann LeCun
Courant Institute of Mathematical Sciences
New York University, New York, NY 10003
{ranzato,crispy,sumit,yann}@cs.nyu.edu
Abstract
We describe a novel unsupervised method for learning sparse, overcomplete features.
The model uses a linear encoder, and a linear decoder preceded by a sparsifying
non-linearity that turns a code vector into a quasi-binary sparse code vector. Given
an input, the optimal code minimizes the distance between the output of the decoder
and the input patch while being as similar as possible to the encoder output.
Learning proceeds in a two-phase EM-like fashion: (1) compute the minimum-energy
code vector, (2) adjust the parameters of the encoder and decoder so as to decrease
the energy. The model produces “stroke detectors” when trained on handwritten
numerals, and Gabor-like filters when trained on natural image patches. Inference
and learning are very fast, requiring no preprocessing and no expensive sampling.
Using the proposed unsupervised method to initialize the first layer of a
convolutional network, we achieved an error rate slightly lower than the best
reported result on the MNIST dataset. Finally, we describe an extension of the
method that learns topographical filter maps.
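To make the two-phase procedure concrete, the following is a minimal sketch in
Python/NumPy. It is our illustration, not the authors' code: it assumes
squared-error terms for the reconstruction and code-prediction energies,
substitutes a plain logistic squashing function for the paper's sparsifying
non-linearity, and runs both the code search and the parameter updates by simple
gradient descent; all names, sizes, and step sizes are illustrative.

import numpy as np

# Illustrative sketch of the two-phase, EM-like procedure (our assumptions:
# squared-error energies, a plain logistic as a stand-in for the sparsifying
# non-linearity, gradient descent for both phases).
rng = np.random.default_rng(0)
n_input, n_code = 64, 128                # overcomplete: code larger than input
W_e = 0.1 * rng.standard_normal((n_code, n_input))    # linear encoder
W_d = 0.1 * rng.standard_normal((n_input, n_code))    # linear decoder

def sparsify(z):
    # Stand-in for the sparsifying non-linearity (quasi-binary squashing).
    return 1.0 / (1.0 + np.exp(-z))

def energy(x, z):
    # Reconstruction energy plus code-prediction energy.
    recon = W_d @ sparsify(z)
    return np.sum((x - recon) ** 2) + np.sum((z - W_e @ x) ** 2)

def train_step(x, code_steps=20, lr_code=0.05, lr_param=0.005):
    global W_e, W_d
    # Phase 1: gradient descent on the code to approximate the
    # minimum-energy code vector, starting from the encoder output.
    z = W_e @ x
    for _ in range(code_steps):
        s = sparsify(z)
        recon_err = W_d @ s - x
        grad_z = 2 * (W_d.T @ recon_err) * s * (1 - s) + 2 * (z - W_e @ x)
        z = z - lr_code * grad_z
    # Phase 2: adjust encoder and decoder parameters to decrease the
    # energy of the code found in phase 1.
    s = sparsify(z)
    recon_err = W_d @ s - x
    W_d = W_d - lr_param * 2 * np.outer(recon_err, s)
    W_e = W_e - lr_param * 2 * np.outer(W_e @ x - z, x)
    return energy(x, z)

x = rng.standard_normal(n_input)         # random stand-in for an image patch
print(train_step(x))                     # energy after one update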
1 Introduction
Unsupervised learning methods are often used to produce pre-processors and feature
extractors for image analysis systems. Popular methods such as Wavelet
decomposition, PCA, Kernel-PCA, Non-Negative Matrix Factorization [1], and ICA
produce compact representations with somewhat uncorrelated (or independent)
components. Most methods produce representations that either preserve or reduce
the dimensionality of the input. However, several recent works have advocated the
use of sparse-overcomplete representations for images, in which the dimension of
the feature vector is larger than the dimension of the input, but only a small
number of components are non-zero for any one image [2, 3]. Sparse-overcomplete
representations present several potential advantages. Using high-dimensional
representations increases the likelihood that image categories will be easily
(possibly linearly) separable. Sparse representations can provide a simple
interpretation of the input data in terms of a small number of “parts” by
extracting the structure hidden in the data. Furthermore, there is considerable
evidence that biological vision uses sparse representations in early visual
areas [4, 5].
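As a toy numerical illustration of these terms (the numbers here are ours, purely
for exposition, not drawn from the cited work): an input of dimension four might be
represented by a code of dimension ten, of which only two components are active.

import numpy as np

# Toy illustration (our numbers): a sparse-overcomplete code has more
# components than the input, but only a few of them are non-zero.
x = np.array([0.8, -0.1, 0.3, 0.5])      # input, dimension 4
z = np.zeros(10)                         # code, dimension 10 > 4: overcomplete
z[[2, 7]] = [1.0, 0.6]                   # only 2 of 10 components are active
print(np.count_nonzero(z), "of", z.size, "components non-zero")  # 2 of 10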
It seems reasonable to consider a representation “complete” if it is possible to
reconstruct the input from it, because the information contained in the input would
need to be preserved in the representation itself. Most unsupervised learning
methods for feature extraction are based on this principle, and can be understood
in terms of an encoder module followed by a decoder module. The encoder takes the
input and computes a code vector, for example a sparse and overcomplete
representation. The decoder takes the code vector given by the encoder and produces
a reconstruction of the input. Encoder and decoder are trained in such a way that
reconstructions provided by the decoder are as similar as possible to the actual
input data, when these input data have the same statistics as the training samples.
Methods such as Vector Quantization, PCA, auto-encoders [6], Restricted Boltzmann
Machines [7], and others [8] have exactly this architecture, but with different
constraints on the code, different learning algorithms, and different kinds of
encoder and decoder architectures. In