Matrix Variate Restricted Boltzmann Machine
Guanglei Qi, Yanfeng Sun, Junbin Gao, Yongli Hu and Jinghua Li
Abstract—The Restricted Boltzmann Machine (RBM) is an important generative model for vectorial data. When applying an RBM to images in practice, the data have to be vectorized. This results in high-dimensional data, and valuable spatial information is lost in the vectorization. In this paper, a Matrix-Variate Restricted Boltzmann Machine (MVRBM) model is proposed by generalizing the classic RBM to explicitly model matrix data. In the new RBM model, both the input and hidden variables are in matrix form and are connected by bilinear transforms. The MVRBM has far fewer model parameters while retaining performance comparable to the classic RBM. The advantages of the MVRBM are demonstrated on three real-world applications: handwritten digit denoising, reconstruction and recognition.
Index Terms—Machine Learning, Restricted Boltzmann Ma-
chine, Digit Recognition, Feature Extraction.
I. INTRODUCTION
A Boltzmann machine, as a type of stochastic recurrent neural network, was invented by Hinton and Sejnowski in 1985 [15]. However, generic Boltzmann machines are inefficient for machine learning or inference due to their unconstrained connectivity among units. To obtain a practical model, Hinton [11] proposed an architecture called the Restricted Boltzmann Machine (RBM), in which only units between the visible layer and the hidden layer are connected.
With the restricted connectivity between visible and hidden
units, an RBM can be regarded as a probabilistic graphical
model with bipartite graph structure. In recent years, RBMs
have attracted considerable research interest in pattern recog-
nition [5], [25] and machine learning [3], [14], [19], [22],
[30], due to their strong ability in feature extraction and
representation.
Units in the visible and hidden layers are connected through a restricted linear mapping whose weights are to be trained. Given some training data, the goal of training an RBM is to learn the weights between visible and hidden units such that the probability distribution represented by the RBM fits the training samples as well as possible. A well-trained RBM can provide an efficient representation for new input data drawn from the same distribution as the training data.
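To make the structure concrete, a minimal NumPy sketch of a binary RBM's energy function and factorized hidden conditional follows. The dimensions, initialization scale, and variable names are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 4 visible units, 3 hidden units.
n_vis, n_hid = 4, 3
W = rng.normal(scale=0.01, size=(n_vis, n_hid))  # visible-hidden weights
b = np.zeros(n_vis)                              # visible biases
c = np.zeros(n_hid)                              # hidden biases

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def energy(v, h):
    """Energy E(v, h) = -v'Wh - b'v - c'h of a binary RBM."""
    return -(v @ W @ h) - b @ v - c @ h

def p_h_given_v(v):
    """P(h_j = 1 | v) factorizes over j -- the 'restricted' bipartite structure."""
    return sigmoid(v @ W + c)

v = rng.integers(0, 2, size=n_vis).astype(float)  # one binary visible sample
h_prob = p_h_given_v(v)
```

Because visible-visible and hidden-hidden connections are absent, `p_h_given_v` evaluates all hidden units in one matrix-vector product, which is what makes block Gibbs sampling and Contrastive Divergence training tractable.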
The classic RBM model is mainly designed for vectorial
input data or variables. However, data emerging from modern
science and technology are in more general structures. For
example, digital images are collected as 2D matrices, which
reflect the spatial correlation or information among pixels.
Guanglei Qi, Yanfeng Sun, Yongli Hu and Jinghua Li are with the Beijing Key Laboratory of Multimedia and Intelligent Technology, College of Metropolitan Transportation, Beijing University of Technology, Beijing 100124, P. R. China, e-mail: qgl@emails.bjut.edu.cn, {yfsun, huyongli, lijinghua}@bjut.edu.cn.
Junbin Gao is with the University of Sydney Business School, The University of Sydney, NSW 2006, Australia, e-mail: junbin.gao@sydney.edu.au.
In order to apply the classic RBM to such 2D image data, a typical workaround is to vectorize the 2D data. Unfortunately, such a vectorization process not only breaks the inherent high-order image structure, losing important information about interactions across modes, but also increases the number of model parameters induced by the full connection between visible and hidden units.
To extend the classic RBM to 2D matrix data, in this paper we propose a Matrix-Variate Restricted Boltzmann Machine (MVRBM) model. Like the classic RBM, the MVRBM defines a probabilistic model for binary units arranged in a bipartite graph, but topologically the units on each layer (input or hidden) are organized in 2D arrays and connected through a bilinear mapping; see Section III. In fact, the proposed bilinear mapping imposes a particular structure on the model parameters, thereby reducing the number of parameters to be learned in the training process.
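The parameter saving can be illustrated with a back-of-the-envelope count. The sizes below are hypothetical (the exact form of the bilinear map is given in Section III); the point is that two small factor matrices replace one full weight matrix:

```python
# Hypothetical sizes: 28x28 visible images, 10x10 hidden matrix.
I, J = 28, 28   # visible matrix dimensions
K, L = 10, 10   # hidden matrix dimensions

# Classic RBM on vectorized data: one full (I*J) x (K*L) weight matrix.
classic_weights = (I * J) * (K * L)

# Bilinear map of the form H ~ sigmoid(U' X V + C): two small factor
# matrices U (I x K) and V (J x L) in place of the full weight matrix.
mvrbm_weights = I * K + J * L

print(classic_weights, mvrbm_weights)  # 78400 vs 560
```

For these sizes the weight count drops by more than two orders of magnitude, which is the source of the training and inference speed-ups claimed below.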
In summary, the new model has the following advantages
which make up our contributions in this paper:
1) The total number of parameters to be learned is significantly smaller than in traditional RBMs, so the computational complexity of training and inference is significantly reduced.
2) Both the visible layer and hidden layer are organized
in the matrix format, thus the spatial information in 2D
matrix data can be maintained in the training and infer-
ence processes and better performance in reconstruction
can be achieved.
3) The idea behind the MVRBM can easily be extended to tensorial data of any order, so the basic RBM can be applied to more complex data structures.
The rest of the paper is organized as follows. In Section II, we summarize related works to further highlight our contributions. In Section III, the MVRBM model is introduced and a stochastic learning algorithm based on Contrastive Divergence (CD) is proposed. In Section IV, the performance of the proposed method is evaluated on three computer-vision tasks: handwritten digit denoising, reconstruction and recognition. Finally, conclusions and suggestions for future work are provided in Section V.
II. RELATED WORKS
More and more multiway data are being acquired in modern scientific and engineering research, e.g., medical images [1], [21], multispectral images [4], [9], and video clips [10]. It is well known that vectorizing multiway data results in a loss of correlation information, which degrades the performance of learning algorithms designed for vectorial data, such as the classic RBM. In recent years, learning algorithms for multiway data modeling have therefore attracted great attention.
978-1-5090-0620-5/16/$31.00 © 2016 IEEE