Matrix Variate Restricted Boltzmann Machine
Guanglei Qi, Yanfeng Sun, Junbin Gao, Yongli Hu and Jinghua Li
Abstract—The Restricted Boltzmann Machine (RBM) is an important generative model for vectorial data. When applying an RBM to images in practice, the data have to be vectorized. This results in high-dimensional data, and valuable spatial information is lost in the vectorization. In this paper, a Matrix-Variate Restricted Boltzmann Machine (MVRBM) model is proposed by generalizing the classic RBM to explicitly model matrix data. In the new RBM model, both the input and hidden variables are in matrix form and are connected by bilinear transforms. The MVRBM has far fewer model parameters while retaining performance comparable to the classic RBM. The advantages of the MVRBM are demonstrated on three real-world applications: handwritten digit denoising, reconstruction and recognition.
Index Terms—Machine Learning, Restricted Boltzmann Ma-
chine, Digit Recognition, Feature Extraction.
I. INTRODUCTION
A Boltzmann machine, as a type of stochastic recurrent neural network, was invented by Hinton and Sejnowski in 1985 [15]. However, generic Boltzmann machines are inefficient for machine learning or inference due to their unconstrained connectivity among units. To obtain a practical model, Hinton [11] proposed an architecture called the Restricted Boltzmann Machine (RBM), in which only units between the visible layer and the hidden layer are connected.
With the restricted connectivity between visible and hidden
units, an RBM can be regarded as a probabilistic graphical
model with bipartite graph structure. In recent years, RBMs
have attracted considerable research interest in pattern recog-
nition [5], [25] and machine learning [3], [14], [19], [22],
[30], due to their strong ability in feature extraction and
representation.
Units in the visible and hidden layers are connected through a restricted linear mapping whose weights are to be trained. Given some training data, the goal of training an RBM is to learn the weights between visible and hidden units such that the probability distribution represented by the RBM fits the training samples as well as possible. A well-trained RBM can provide an efficient representation for new input data drawn from the same distribution as the training data.
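To make the structure concrete, a minimal NumPy sketch of a binary RBM's energy function and factorized hidden conditional follows. The dimensions, initialization scale, and variable names are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 4 visible units, 3 hidden units.
n_vis, n_hid = 4, 3
W = rng.normal(scale=0.01, size=(n_vis, n_hid))  # visible-hidden weights
b = np.zeros(n_vis)                              # visible biases
c = np.zeros(n_hid)                              # hidden biases

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def energy(v, h):
    """Energy E(v, h) = -v'Wh - b'v - c'h of a binary RBM."""
    return -(v @ W @ h) - b @ v - c @ h

def p_h_given_v(v):
    """P(h_j = 1 | v) factorizes over j -- the 'restricted' bipartite structure."""
    return sigmoid(v @ W + c)

v = rng.integers(0, 2, size=n_vis).astype(float)  # one binary visible sample
h_prob = p_h_given_v(v)
```

Because visible-visible and hidden-hidden connections are absent, `p_h_given_v` evaluates all hidden units in one matrix-vector product, which is what makes block Gibbs sampling and Contrastive Divergence training tractable.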
The classic RBM model is mainly designed for vectorial
input data or variables. However, data emerging from modern
science and technology are in more general structures. For
example, digital images are collected as 2D matrices, which
reflect the spatial correlation or information among pixels.
Guanglei Qi, Yanfeng Sun, Yongli Hu and Jinghua Li are with the Beijing Key Laboratory of Multimedia and Intelligent Technology, College of Metropolitan Transportation, Beijing University of Technology, Beijing 100124, P. R. China, e-mail: qgl@emails.bjut.edu.cn, {yfsun, huyongli, lijinghua}@bjut.edu.cn.
Junbin Gao is with the University of Sydney Business School, The University of Sydney, NSW 2006, Australia, e-mail: junbin.gao@sydney.edu.au.
In order to apply the classic RBM to such 2D image data, a typical workaround is to vectorize the 2D data. Unfortunately, such a vectorization process not only breaks the inherent high-order image structure, losing important information about interactions across modes, but also increases the number of model parameters induced by the full connection between visible and hidden units.
To extend the classic RBM to 2D matrix data, in this paper we propose a Matrix-Variate Restricted Boltzmann Machine (MVRBM) model. Like the classic RBM, the MVRBM defines a probabilistic model for binary units arranged in a bipartite graph, but topologically the units on each layer (input or hidden) are organized in 2D arrays and connected through a bilinear mapping; see Section III. In fact, the proposed bilinear mapping imposes a particular structure on the model parameters, thereby reducing the number of parameters to be learned in the training process.
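The parameter saving can be illustrated with a back-of-the-envelope count. The sizes below are hypothetical (the exact form of the bilinear map is given in Section III); the point is that two small factor matrices replace one full weight matrix:

```python
# Hypothetical sizes: 28x28 visible images, 10x10 hidden matrix.
I, J = 28, 28   # visible matrix dimensions
K, L = 10, 10   # hidden matrix dimensions

# Classic RBM on vectorized data: one full (I*J) x (K*L) weight matrix.
classic_weights = (I * J) * (K * L)

# Bilinear map of the form H ~ sigmoid(U' X V + C): two small factor
# matrices U (I x K) and V (J x L) in place of the full weight matrix.
mvrbm_weights = I * K + J * L

print(classic_weights, mvrbm_weights)  # 78400 vs 560
```

For these sizes the weight count drops by more than two orders of magnitude, which is the source of the training and inference speed-ups claimed below.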
In summary, the new model has the following advantages
which make up our contributions in this paper:
1) The total number of parameters to be learned is significantly smaller than in traditional RBMs, so the computational complexity of training and inference is significantly reduced.
2) Both the visible layer and hidden layer are organized
in the matrix format, thus the spatial information in 2D
matrix data can be maintained in the training and infer-
ence processes and better performance in reconstruction
can be achieved.
3) The idea behind the MVRBM can easily be extended to tensorial data of any order, so the basic RBM can be applied to more complex data structures.
The rest of the paper is organized as follows. In Section II, we summarize related works to further highlight our contributions. In Section III, the MVRBM model is introduced and a stochastic learning algorithm based on Contrastive Divergence (CD) is proposed. In Section IV, the performance of the proposed method is evaluated on three computer-vision tasks: handwritten digit denoising, reconstruction and recognition. Finally, conclusions and suggestions for future work are provided in Section V.
II. RELATED WORKS
More and more multiway data are being acquired in modern scientific and engineering research, e.g., medical images [1], [21], multispectral images [4], [9], and video clips [10]. It is well known that vectorizing multiway data results in a loss of correlation information, which degrades the performance of learning algorithms designed for vectorial data, such as the classic RBM. In recent years, learning algorithms for multiway data modeling have therefore attracted great attention.
978-1-5090-0620-5/16/$31.00 © 2016 IEEE