3) Output Stage (Hashing and Histograms): Each of the $L_1$ input images $I_i^{\ell}$ for the second stage has $L_2$ real-valued outputs $\{I_i^{\ell} * W_{\ell}^{2}\}_{\ell=1}^{L_2}$ from the second stage. We binarize these outputs and obtain $\{H(I_i^{\ell} * W_{\ell}^{2})\}_{\ell=1}^{L_2}$, where $H(\cdot)$ is a Heaviside step (like) function, whose value is one for positive entries and zero otherwise.
Around each pixel, we view the vector of $L_2$ binary bits as a decimal number. This converts the $L_2$ outputs in $O_i^{\ell}$ back into a single integer-valued “image”:
$$T_i^{\ell} \doteq \sum_{\ell=1}^{L_2} 2^{\ell-1} H(I_i^{\ell} * W_{\ell}^{2}), \qquad (8)$$
whose every pixel is an integer in the range $[0,\, 2^{L_2} - 1]$. The order and weights of the $L_2$ outputs are irrelevant because here we treat each integer as a distinct “word.”
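To make the hashing step concrete, below is a minimal NumPy sketch of Eq. (8), assuming the $L_2$ second-stage outputs for one first-stage map are stacked in an array of shape $(L_2, m, n)$; the function and variable names are ours and not part of the original implementation.

```python
import numpy as np

def binary_hash(stage2_outputs):
    """Eq. (8): binarize the L2 filter responses and pack them into one
    integer-valued map whose pixels lie in [0, 2^L2 - 1].

    stage2_outputs: array of shape (L2, m, n) with the real-valued
    second-stage convolution outputs for a single first-stage map.
    """
    L2 = stage2_outputs.shape[0]
    bits = (stage2_outputs > 0).astype(np.int64)      # Heaviside step H(.)
    weights = 2 ** np.arange(L2).reshape(L2, 1, 1)    # 2^(l-1), l = 1, ..., L2
    return (weights * bits).sum(axis=0)               # integer "image" T
```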
Each of the $L_1$ images $T_i^{\ell}$, $\ell = 1, \ldots, L_1$, is partitioned into $B$ blocks. We compute the histogram (with $2^{L_2}$ bins) of the decimal values in each block, concatenate all $B$ histograms into one vector, and denote this vector as $\mathrm{Bhist}(T_i^{\ell})$. After this encoding process, the “feature” of the input image $I_i$ is then defined to be the set of block-wise histograms, i.e.,
$$f_i \doteq \big[\mathrm{Bhist}(T_i^{1}), \ldots, \mathrm{Bhist}(T_i^{L_1})\big]^{T} \in \mathbb{R}^{(2^{L_2}) L_1 B}. \qquad (9)$$
The local blocks can be either overlapping or non-overlapping,
depending on the application. Our empirical experience sug-
gests that non-overlapping blocks are suitable for face images,
whereas overlapping blocks are appropriate for hand-written
digits, textures, and object images. Furthermore, the histogram
offers some degree of translation invariance in the extracted
features, as in hand-crafted features (e.g., scale-invariant fea-
ture transform (SIFT) [12] and histogram of oriented gradi-
ents (HOG) [13]), learned features (e.g., the bag-of-words (BoW)
model [14]), and the average and maximum pooling process
in ConvNet [3]–[5], [8], [9].
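As a rough illustration of the block-wise encoding in Eq. (9), the sketch below computes the concatenated histograms for one integer-valued map; the stride argument selects between overlapping (stride smaller than the block size) and non-overlapping (stride equal to the block size) blocks. The function names and the example block settings are ours, not those of the reference code.

```python
import numpy as np

def block_histograms(T, L2, block_size, stride):
    """Bhist(T) in Eq. (9): concatenate 2^L2-bin histograms computed over
    (possibly overlapping) blocks of the integer-valued map T."""
    m, n = T.shape
    bh, bw = block_size
    hists = []
    for r in range(0, m - bh + 1, stride):
        for c in range(0, n - bw + 1, stride):
            block = T[r:r + bh, c:c + bw]
            hists.append(np.bincount(block.ravel(), minlength=2 ** L2))
    return np.concatenate(hists)

# The feature f_i is the concatenation over the L1 maps T_i^l, e.g.
# f_i = np.concatenate([block_histograms(T, 8, (8, 8), 8) for T in maps])
```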
The hyper-parameters of the PCANet include the filter size $k_1, k_2$, the number of filters in each stage $L_1, L_2$, the number of stages, and the block size for local histograms in the output layer. PCA filter banks require that $k_1 k_2 \geq L_1, L_2$.
In our experiments in Section III and Section IV, excluding object recognition, we always set $L_1 = L_2 = 8$, which is inspired by the common setting of Gabor filters [15] with 8 orientations, although fine-tuning $L_1, L_2$ could lead to marginal performance improvements. The remaining hyper-parameters, such as the filter size $k_1, k_2$ and the block size for local histograms, are determined through a grid search with either cross-validation or a validation set. Moreover, we have empirically observed that a two-stage PCANet is in general sufficient to achieve good performance and that a deeper architecture does not necessarily lead to further improvements. In addition, a larger block size for local histograms provides greater translation invariance in the extracted feature $f_i$.
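For instance, the grid search over the filter size and histogram block size with a held-out validation set could be sketched as follows; the candidate grids and the train_pcanet_and_score helper are hypothetical placeholders, not settings reported in the paper.

```python
from itertools import product

def train_pcanet_and_score(train_set, val_set, k, block_size):
    # Hypothetical helper: train a PCANet (L1 = L2 = 8, filter size k x k,
    # the given histogram block size) plus a classifier, and return the
    # accuracy on the validation set.
    raise NotImplementedError

def grid_search(train_set, val_set,
                filter_sizes=(3, 5, 7),                   # candidate k1 = k2 (assumed)
                block_sizes=((7, 7), (8, 8), (15, 15))):  # assumed candidates
    best = None
    for k, blk in product(filter_sizes, block_sizes):
        acc = train_pcanet_and_score(train_set, val_set, k, blk)
        if best is None or acc > best[0]:
            best = (acc, k, blk)
    return best   # (best accuracy, filter size, block size)
```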
4) Comparison With ConvNet and ScatNet: Clearly,
PCANet shares various similarities with ConvNet [5]. The
patch-mean removal in PCANet is reminiscent of local contrast normalization in ConvNet.⁴ This operation centers all of the patches around the origin of the vector space so that the learned PCA filters can better capture major variations in the data. In addition, PCA can be viewed as the simplest class of auto-encoders, which minimizes reconstruction error.

⁴We have tested the PCANet without patch-mean removal and observed slightly degraded performance.
The PCANet contains no nonlinear processing within or between its stages, in contrast to the common wisdom of building deep learning networks, e.g., the absolute rectification layer in ConvNet [5] and the modulus layer in ScatNet [6], [10].
We have tested the PCANet with an absolute rectification layer
added immediately after the first stage, but we did not observe
any improvement in the final classification results. This could
be because the use of quantization plus a local histogram
(in the output layer) already introduces sufficient invariance
and robustness in the final feature.
The overall process prior to the output layer in the PCANet is completely linear. One may wonder what would happen if we merged the two stages into a single stage with an equivalent number of PCA filters and an equivalent receptive field size. Specifically, one may be interested in how a single-stage PCANet with $L_1 L_2$ filters of size $(2k_1 - 1) \times (2k_2 - 1)$ would perform compared to the two-stage PCANet described in Section II-A. We have experimented with such settings on faces and hand-written digits and observed that the two-stage PCANet outperforms this single-stage alternative in most cases; see the last several rows of Tables III, X, and XI.
In comparison to the filters learned by the single-stage alternative, the resulting two-stage PCA filters essentially have a low-rank factorization, possibly resulting in a lower chance of over-fitting the dataset. Regarding why the deep structure is needed, from a computational perspective, the single-stage alternative requires learning filters with $L_1 L_2 (2k_1 - 1)(2k_2 - 1)$ variables, whereas the two-stage PCANet learns filters with only $(L_1 + L_2) k_1 k_2$ variables in total. Another benefit of the two-stage PCANet is its larger receptive field: because it contains more holistic observations of the objects in images, learning invariance at this scale can capture more semantic information. Our comparative experiments verify that hierarchical architectures with large receptive fields and multiple stacked stages are more efficient in learning semantically related representations, which agrees with what has been observed in [7].
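As a quick numerical illustration of these counts (assuming, for example, $k_1 = k_2 = 7$ and $L_1 = L_2 = 8$; other filter sizes are used in some experiments, so the setting is only illustrative):

```python
# Filter variables to learn, single-stage alternative vs. two-stage PCANet,
# for the illustrative setting k1 = k2 = 7, L1 = L2 = 8.
k1 = k2 = 7
L1 = L2 = 8
single_stage = L1 * L2 * (2 * k1 - 1) * (2 * k2 - 1)   # 64 * 13 * 13 = 10816
two_stage = (L1 + L2) * k1 * k2                        # 16 * 49     = 784
print(single_stage, two_stage, single_stage / two_stage)  # roughly 13.8x more
```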
B. Computational Complexity
The components for constructing the PCANet are extremely basic and computationally efficient. To see how low the computational complexity of the PCANet is, let us take the two-stage PCANet as an example. In each stage of the PCANet, forming the patch-mean-removed matrix $X$ costs $k_1 k_2 + k_1 k_2 \tilde{m}\tilde{n}$ flops; the inner product $X X^{T}$ has a complexity of $2(k_1 k_2)^2 \tilde{m}\tilde{n}$ flops; and the complexity of the eigen-decomposition is $O((k_1 k_2)^3)$. The PCA filter convolution requires $L_i k_1 k_2 m n$ flops for stage $i$. In the output layer, the conversion of $L_2$ binary bits to a decimal number costs $2 L_2 \tilde{m}\tilde{n}$ flops, and the naive histogram operation is of complexity $O(m n B L_2 \log 2)$. With $\tilde{m} = m - k_1/2$, $\tilde{n} = n - k_2/2$ and assuming $mn \gg \max(k_1, k_2, L_1, L_2, B)$, the overall complexity of the PCANet is easily verified to be
$$O\big(m n k_1 k_2 (L_1 + L_2) + m n (k_1 k_2)^2\big).$$
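A minimal sketch that tallies these terms for a concrete setting is given below; the counts follow the approximate per-component expressions above rather than an exact flop audit, and the example sizes in the comment are hypothetical.

```python
import math

def pcanet_flops(m, n, k1, k2, L1, L2, B):
    """Approximate per-image cost of a two-stage PCANet, following the
    per-component counts given above (lower-order terms ignored)."""
    mt, nt = m - k1 // 2, n - k2 // 2            # \tilde{m}, \tilde{n}
    kk = k1 * k2
    pca_per_stage = kk + kk * mt * nt + 2 * kk**2 * mt * nt + kk**3
    conv = (L1 + L2) * kk * m * n                # PCA filter convolutions
    hashing = 2 * L2 * mt * nt                   # binary-to-decimal conversion
    hist = m * n * B * L2 * math.log(2)          # naive histogramming
    return 2 * pca_per_stage + conv + hashing + hist

# e.g. pcanet_flops(m=64, n=64, k1=7, k2=7, L1=8, L2=8, B=16)
```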