A New Training Principle for Stacked Denoising Autoencoders*
Qianhaozhe You
Department of Electronic Engineering
Tsinghua University
Beijing 100084, P.R.China
haozhe.yqhz@gmail.com
Yu-jin Zhang
Department of Electronic Engineering
Tsinghua University
Beijing 100084, P.R.China
zhang-yj@tsinghua.edu.cn
Abstract — In this work, a new training principle is introduced for unsupervised learning that makes the learned representations more efficient and useful. By training on partially corrupted inputs, the denoising Autoencoder obtains more robust and representative patterns of the inputs than traditional learning methods do. Moreover, denoising Autoencoders can be stacked to form a deep network. A complete framework for training stacked denoising Autoencoders, incorporating several supervised training methods, is given for image classification. Comparative experiments show that the model strongly resists noise in the training examples and achieves better image classification accuracy on the MNIST database.
Keywords-unsupervised learning; stacked denoising Autoencoders; image classification
I. INTRODUCTION
As one of the significant tasks in pattern recognition, image classification has been developed extensively in recent years. The most widely used framework was a kind of discriminative model [3]. Hard-assignment vector quantization [4] was the most popular method at that time, and it was extended with spatial pyramid matching [3] to compensate for the loss of spatial information. [5] adopted sparse coding algorithms for the dictionary learning step of the image classification framework. Following the emergence of sparse coding, many extensions, such as Laplacian sparse coding [6], kernel sparse representation [7], and sparse coding with manifold projections [8], were proposed and achieved better image classification performance.
All these existing recognition approaches extracted hand-designed features, which required time-consuming hand-tuning. With the development of Deep Learning methods [11], features can be learned by the machine itself during training instead of being computed by fixed, predefined rules. Moreover, the concept of a deep architecture of features was proposed to imitate the visual mechanisms of the human brain. The Stacked Denoising Autoencoders [10] studied in this paper, a type of deep neural network, aim at extracting hierarchical features from images while the network is trained and fine-tuned.
* This work was partially supported by the National Natural Science Foundation of China (NNSF: 61171118) and the Specialized Research Fund for the Doctoral Program of Higher Education (SRFDP-20110002110057).
For visual recognition and pattern classification tasks, it is difficult to learn either deep generative models or deep discriminative models directly. Previous work has overcome this difficulty with an advanced unsupervised learning step that transforms the input data into related intermediate representations in a different space. In this paper, an unsupervised pretraining step is introduced to initialize the model before the supervised optimization step. In addition, training the model with elastic distortions has shown surprisingly better performance for image classification.
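As a rough illustration of this two-phase scheme, the following numpy sketch greedily pretrains each layer as a plain Autoencoder with tied weights and squared reconstruction error, then leaves the stack ready for a supervised fine-tuning step. The `Layer` class, learning rate, epoch count, and toy data are illustrative assumptions for this sketch, not the paper's exact setup.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(a):
    """Element-wise logistic sigmoid."""
    return 1.0 / (1.0 + np.exp(-a))

class Layer:
    """One sigmoid layer; pretrain() greedily fits it as an Autoencoder."""

    def __init__(self, d_in, d_out):
        self.W = rng.normal(scale=0.1, size=(d_in, d_out))
        self.b = np.zeros(d_out)

    def encode(self, X):
        return sigmoid(X @ self.W + self.b)

    def pretrain(self, X, lr=0.5, epochs=50):
        """Minimize squared reconstruction error with tied weights W' = W^T."""
        b_prime = np.zeros(X.shape[1])
        for _ in range(epochs):
            Y = self.encode(X)                    # encode
            Z = sigmoid(Y @ self.W.T + b_prime)   # decode (tied weights)
            dZ = (Z - X) * Z * (1 - Z)            # delta at the reconstruction
            dY = (dZ @ self.W) * Y * (1 - Y)      # delta at the hidden layer
            self.W -= lr * (X.T @ dY + dZ.T @ Y) / len(X)
            self.b -= lr * dY.mean(axis=0)
            b_prime -= lr * dZ.mean(axis=0)

# Phase 1: greedy unsupervised pretraining, layer by layer.
X = rng.random((100, 64))          # toy unlabeled data in [0, 1]^64
layers = [Layer(64, 32), Layer(32, 16)]
h = X
for layer in layers:
    layer.pretrain(h)
    h = layer.encode(h)            # hidden codes feed the next layer

# Phase 2 (supervised fine-tuning with backpropagation from a classifier
# placed on top of the stack) would follow here, as described in the text.
```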
The rest of the paper is organized as follows. Section 2 introduces the related theory and models: the classical Autoencoder, the denoising Autoencoder, and the stacked denoising Autoencoders. The implementation of the models and the major algorithms are given in Section 3. Analysis and discussion of the experimental results follow in Section 4. Finally, conclusions are drawn in Section 5.
II. MODELS
In this section, the models used in this paper are defined and combined to form a deep network for image classification. First, the basic model, the classical Autoencoder, is introduced in detail. By training on corrupted inputs instead, the classical Autoencoder can be improved and extended to the denoising Autoencoder. Then, several denoising Autoencoders are stacked layer by layer to construct a deep network, called the stacked denoising Autoencoders.
A. The Classical Autoencoder
The classical Autoencoder applies a non-linear mapping from the visible input $x \in [0, 1]^d$ to a hidden representation $y \in [0, 1]^{d'}$. Commonly, the sigmoid function $s$ is used as the deterministic mapping:

$$y = f(x) = s(Wx + b) \quad (1)$$
The transformation above can be regarded as an encoding
step. Then, the hidden representation y is mapped back to
the input space through a similar transformation:
$$z = g(y) = s(W'y + b') \quad (2)$$
This reverse transformation can be regarded as a decoding step. Here, $W'$ can optionally be defined as the transpose of $W$ (tied weights), while $b'$ and $b$ are independent. The relationship among $x$, $y$, and $z$ and the model of the classical Autoencoder are shown in Figure 1.
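To make Eqs. (1) and (2) concrete, here is a minimal numpy sketch of the encoder/decoder pair, assuming the tied-weight option $W' = W^T$ mentioned above; the class name, dimensions, and initialization scale are illustrative choices, not prescribed by the paper.

```python
import numpy as np

def sigmoid(a):
    """Element-wise logistic sigmoid s(a) = 1 / (1 + exp(-a))."""
    return 1.0 / (1.0 + np.exp(-a))

class ClassicalAutoencoder:
    """Encoder/decoder pair of Eqs. (1)-(2), with W' tied to the transpose
    of W; the biases b and b' remain independent."""

    def __init__(self, d, d_hidden, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(scale=0.01, size=(d_hidden, d))
        self.b = np.zeros(d_hidden)     # encoder bias b
        self.b_prime = np.zeros(d)      # decoder bias b'

    def encode(self, x):
        # Eq. (1): y = f(x) = s(Wx + b)
        return sigmoid(self.W @ x + self.b)

    def decode(self, y):
        # Eq. (2): z = g(y) = s(W'y + b'), here with W' = W^T
        return sigmoid(self.W.T @ y + self.b_prime)

# Round-trip a toy input x in [0, 1]^d (784 matches a 28x28 MNIST image).
ae = ClassicalAutoencoder(d=784, d_hidden=200)
x = np.random.default_rng(1).random(784)
z = ae.decode(ae.encode(x))
print(x.shape, ae.encode(x).shape, z.shape)   # (784,) (200,) (784,)
```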