1063-6706 (c) 2015 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See
http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI
10.1109/TFUZZ.2015.2406889, IEEE Transactions on Fuzzy Systems
III. FUZZY RESTRICTED BOLTZMANN MACHINE AND ITS LEARNING ALGORITHM
A. Fuzzy Restricted Boltzmann Machine
The proposed fuzzy restricted Boltzmann machine (FRBM) is
illustrated in Fig. 3, in which the connection weights and
biases are fuzzy parameters denoted by $\bar{\theta}$. The FRBM
model has two main merits. First, the FRBM has a stronger
representation capability than the regular RBM in modelling
probabilities over visible and hidden units; specifically, the
RBM is the special case of the FRBM in which no fuzziness
exists. Second, the FRBM is more robust than the RBM when
the model is fitted to noisy data. Both advantages spring from
the fuzzy extension of the relationships between cross-layer
variables and are inherited from the characteristics of fuzzy
models.
Fig. 3: Fuzzy Restricted Boltzmann Machine
Since the FRBM is an extension of the RBM model, the
discussion starts with a brief introduction to the RBM model.
An RBM is an energy-based probabilistic model, in which the
probability distribution is defined through an energy function.
Its probability is defined as
$$P(x, h, \theta) = \frac{e^{-E(x, h, \theta)}}{Z}, \tag{7}$$
$$Z = \sum_{\tilde{x}} \sum_{\tilde{h}} e^{-E(\tilde{x}, \tilde{h}, \theta)}, \tag{8}$$
where $E(x, h, \theta)$ is the energy function, $\theta$ denotes the
parameters governing the model, $Z$ is the normalizing factor,
called the partition function, and $\tilde{x}$ and $\tilde{h}$ are
two vector variables representing visible and hidden units
that range over all configurations of the units on the graph,
over which the sum is taken.
The energy function for the RBM is defined by
$$E(x, h, \theta) = -b^{T} x - c^{T} h - h^{T} W x, \tag{9}$$
where $b_j$ and $c_i$ are the offsets, $W_{ij}$ is the connection
weight between the $j$-th visible unit and the $i$-th hidden unit,
and $\theta = \{b, c, W\}$.
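
As an illustration of Eqns. (7)-(9), the following minimal sketch evaluates the energy, the partition function and the joint probability of a very small RBM by exhaustive enumeration. It assumes binary units taking values in {0, 1}, which the text above does not state explicitly, and the parameter values are hypothetical toy numbers chosen only for the example. Enumeration is exponential in the number of units, so this is a check of the definitions rather than a practical procedure.

```python
import itertools
import math


def energy(x, h, b, c, W):
    """Energy of Eq. (9); W[i][j] links hidden unit i to visible unit j."""
    bias_visible = sum(b[j] * x[j] for j in range(len(x)))
    bias_hidden = sum(c[i] * h[i] for i in range(len(h)))
    interaction = sum(
        h[i] * W[i][j] * x[j] for i in range(len(h)) for j in range(len(x))
    )
    return -bias_visible - bias_hidden - interaction


def binary_states(n):
    """All 2**n configurations of n units (assumption: units are in {0, 1})."""
    return list(itertools.product((0, 1), repeat=n))


def partition_function(b, c, W):
    """Eq. (8) by exhaustive enumeration; only feasible for tiny models."""
    return sum(
        math.exp(-energy(x, h, b, c, W))
        for x in binary_states(len(b))
        for h in binary_states(len(c))
    )


def joint_probability(x, h, b, c, W):
    """Eq. (7): Boltzmann probability of one joint configuration."""
    return math.exp(-energy(x, h, b, c, W)) / partition_function(b, c, W)


# Hypothetical toy parameters: 2 visible units, 1 hidden unit.
b = [0.1, -0.2]
c = [0.3]
W = [[0.5, -0.4]]

total = sum(
    joint_probability(x, h, b, c, W)
    for x in binary_states(2)
    for h in binary_states(1)
)
print(total)  # sums to 1 up to floating-point rounding
```

The final sum confirms that dividing by $Z$ normalizes the distribution over all joint configurations.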
To establish the fuzzy restricted Boltzmann machine, it is
necessary first to define the fuzzy energy function for the model.
The fuzzy energy function can be extended from Eqn. (9) in
accordance with the extension principle as follows:
$$\bar{E}(x, h, \bar{\theta}) = -\bar{b}^{T} x - \bar{c}^{T} h - h^{T} \bar{W} x, \tag{10}$$
where $\bar{E}(x, h, \bar{\theta})$ is a fuzzified energy function, and
$\bar{\theta} = \{\bar{b}, \bar{c}, \bar{W}\}$ are fuzzy parameters.
Correspondingly, the fuzzy free energy $\bar{F}$, which marginalizes
the hidden units and maps Eqn. (7) into a simpler form, is deduced as
$$\bar{F}(x, \bar{\theta}) = -\log \sum_{\tilde{h}} e^{-\bar{E}(x, \tilde{h}, \bar{\theta})}, \tag{11}$$
where $\bar{F}$ is extended from the crisp free energy function $F$:
$$F(x, \theta) = -\log \sum_{\tilde{h}} e^{-E(x, \tilde{h}, \theta)}. \tag{12}$$
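
To make the crisp free energy of Eqn. (12) concrete, the sketch below computes it in two ways for a tiny RBM: by summing over every hidden configuration, and by the closed form that follows when the hidden units are binary, $F(x, \theta) = -b^{T} x - \sum_i \log(1 + e^{c_i + W_i x})$, where $W_i$ is the $i$-th row of $W$. The binary-unit assumption and the toy parameter values are ours and are not taken from the text; the closed form holds because the sum over $\tilde{h}$ factorizes across hidden units.

```python
import itertools
import math


def energy(x, h, b, c, W):
    """Energy of Eq. (9); W[i][j] links hidden unit i to visible unit j."""
    return -(
        sum(b[j] * x[j] for j in range(len(x)))
        + sum(c[i] * h[i] for i in range(len(h)))
        + sum(h[i] * W[i][j] * x[j] for i in range(len(h)) for j in range(len(x)))
    )


def free_energy_bruteforce(x, b, c, W):
    """Eq. (12) evaluated literally: sum over all binary hidden configurations."""
    hidden_states = itertools.product((0, 1), repeat=len(c))
    return -math.log(sum(math.exp(-energy(x, h, b, c, W)) for h in hidden_states))


def free_energy_closed_form(x, b, c, W):
    """Factorized form, valid under the assumption of binary hidden units."""
    visible_term = sum(b[j] * x[j] for j in range(len(x)))
    hidden_term = 0.0
    for i in range(len(c)):
        activation = c[i] + sum(W[i][j] * x[j] for j in range(len(x)))
        hidden_term += math.log1p(math.exp(activation))  # log(1 + e^a)
    return -visible_term - hidden_term


# Hypothetical toy parameters: 3 visible units, 2 hidden units.
b = [0.2, -0.1, 0.05]
c = [0.3, -0.6]
W = [[0.5, -0.4, 0.1], [-0.3, 0.8, 0.2]]

max_gap = max(
    abs(free_energy_bruteforce(x, b, c, W) - free_energy_closed_form(x, b, c, W))
    for x in itertools.product((0, 1), repeat=len(b))
)
print(max_gap)  # the two evaluations agree up to rounding error
```

The closed form costs time linear in the number of hidden units, which is why marginalizing the hidden units yields a simpler expression than the joint model.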
If the fuzzy free energy function is employed directly to
define the probability, the result is a fuzzy probability [27],
and the optimization in the learning process becomes a fuzzy
maximum likelihood problem. However, this kind of problem
is intractable because the fuzzy objective function is
nonlinear and its membership function is difficult to compute:
computing its alpha-cuts is an NP-hard problem [29].
Therefore, it is necessary to transform the problem
into a regular maximum likelihood problem by defuzzifying the
fuzzy free energy function (11). The centre of area (centroid)
method [30] is employed to defuzzify the fuzzy free energy
function $\bar{F}(x)$, and the likelihood function is then defined by
the defuzzified fuzzy free energy function. Consequently, the
fuzzy optimization problem becomes a real-valued problem, and
conventional optimization approaches can be applied directly
to find the optimal solutions. The centroid of the fuzzy number
$\bar{F}(x)$ is denoted by $F_c(x)$ and has the following form:
$$F_c(x, \bar{\theta}) = \frac{\int \theta \, F(x, \theta) \, d\theta}{\int F(x, \theta) \, d\theta}, \quad \theta \in \bar{\theta}. \tag{13}$$
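
The general idea of centre-of-area defuzzification can be illustrated on a scalar fuzzy number. The sketch below is not an implementation of Eqn. (13) itself: it uses the standard textbook form of the centroid, $\int z \, \mu(z) \, dz / \int \mu(z) \, dz$, with a triangular membership function $\mu$, and approximates both integrals with the midpoint rule. The triangular shape, the function names and the numerical values are all assumptions made for illustration.

```python
def triangular_membership(z, left, peak, right):
    """Membership of z in a triangular fuzzy number with support (left, right)."""
    if z <= left or z >= right:
        return 0.0
    if z <= peak:
        return (z - left) / (peak - left)
    return (right - z) / (right - peak)


def centroid(membership, lower, upper, steps=10_000):
    """Midpoint-rule approximation of the centre of area over [lower, upper]."""
    width = (upper - lower) / steps
    weighted_sum = 0.0
    total_mass = 0.0
    for k in range(steps):
        z = lower + (k + 0.5) * width  # midpoint of the k-th cell
        mu = membership(z)
        weighted_sum += z * mu
        total_mass += mu
    return weighted_sum / total_mass  # the cell width cancels in the ratio


# Hypothetical fuzzy numbers: one symmetric, one skewed to the right.
symmetric = centroid(lambda z: triangular_membership(z, 1.0, 2.0, 3.0), 1.0, 3.0)
skewed = centroid(lambda z: triangular_membership(z, 0.0, 1.0, 4.0), 0.0, 4.0)
print(symmetric, skewed)
```

For a symmetric triangle the centroid coincides with the peak, whereas for a skewed triangle it moves toward the longer side; for a triangle with vertices at `left`, `peak` and `right` it equals their mean. In either case the fuzzy quantity is replaced by a single real value.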
Naturally, after the fuzzy free energy is defuzzified, the prob-
ability can be defined as
$$P_c(x, \bar{\theta}) = \frac{e^{-F_c(x, \bar{\theta})}}{Z}, \quad Z = \sum_{\tilde{x}} e^{-F_c(\tilde{x}, \bar{\theta})}. \tag{14}$$
In the fuzzy RBM model, the objective function is the negative
log-likelihood, which is given by
$$L(\bar{\theta}, D) = -\sum_{x \in D} \log P_c(x, \bar{\theta}), \tag{15}$$
where D is the training dataset.
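
The objective of Eqn. (15) can be evaluated exactly for a very small model, since Eqn. (14) gives $-\log P_c(x, \bar{\theta}) = F_c(x, \bar{\theta}) + \log Z$. The sketch below assumes binary visible units and takes the free energy as an arbitrary function of $x$. As a stand-in for $F_c$ it uses the crisp RBM free energy with binary hidden units, which corresponds to the special case with no fuzziness; the dataset and parameter values are hypothetical.

```python
import itertools
import math


def rbm_free_energy(x, b, c, W):
    """Crisp free energy with binary hidden units, used as a stand-in for F_c."""
    visible_term = sum(b[j] * x[j] for j in range(len(x)))
    hidden_term = sum(
        math.log1p(math.exp(c[i] + sum(W[i][j] * x[j] for j in range(len(x)))))
        for i in range(len(c))
    )
    return -visible_term - hidden_term


def log_partition(n_visible, free_energy_fn):
    """log Z of Eq. (14), enumerating every binary visible configuration."""
    states = itertools.product((0, 1), repeat=n_visible)
    return math.log(sum(math.exp(-free_energy_fn(x)) for x in states))


def negative_log_likelihood(dataset, n_visible, free_energy_fn):
    """Eq. (15): sum over the data of -log P_c(x) = F_c(x) + log Z."""
    log_z = log_partition(n_visible, free_energy_fn)
    return sum(free_energy_fn(x) + log_z for x in dataset)


# Hypothetical toy model (2 visible units, 1 hidden unit) and dataset.
b = [0.1, -0.2]
c = [0.3]
W = [[0.5, -0.4]]
dataset = [(0, 0), (1, 1), (1, 0)]

nll = negative_log_likelihood(dataset, 2, lambda x: rbm_free_energy(x, b, c, W))
print(nll)
```

With all parameters set to zero, every visible configuration is equally likely, so each of the three samples contributes $\log 4$ to the objective. This gives a simple sanity check for the function.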
The learning problem is to find the optimal parameters
$\bar{\theta}$ that minimize the objective function
$L(\bar{\theta}, D)$, i.e.,
$$\min_{\bar{\theta}} L(\bar{\theta}, D). \tag{16}$$
The following subsection presents the detailed procedure for
addressing this dual problem of maximum likelihood with the
stochastic gradient descent method.