396 IEEE TRANSACTIONS ON BROADCASTING, VOL. 60, NO. 2, JUNE 2014
Fig. 3. Example of a BG reference frame generated by GMM. (a) Original texture frame from the Outdoor sequence. (b) BG reference frame obtained using GMM.
of the proposed approach is outlined in Fig. 2. This approach is based on the observation that most occluded regions belong to the background covered by foreground objects, and these occluded regions may become visible in other frames due to the foreground movement. Thus, in the proposed approach, we generate a temporally stable background image in an offline mode, and this image is then used to fill the disocclusion regions in the DIBR system. As shown in Fig. 2(a), an offline preprocessing step generates a background image from several consecutive texture frames. In this stage, a stable background image can be generated with the Gaussian Mixture Model (GMM), where the regions covered by moving foreground objects are replaced by temporally "stable" pixels. In most cases, these temporally stable pixels belong to the background, especially for regions covered by foreground objects with translational motion. An example is shown in Fig. 3, where the background information covered by the moving car can be recovered by the GMM method. However, in some cases this process may blur the moving regions, especially for foreground objects with reciprocal motion, such as the one seen in Fig. 4(a). In this scene the dancer is rotating, and consequently most of the foreground information is mistakenly modeled as background by the GMM. Therefore, the movement of foreground objects is detected in the Foreground Depth Correlation (FDC) stage to help recover the background information. By combining GMM and FDC, a background image can be obtained.
This background information can then be used during disocclusion filling in the DIBR system. Obviously, GMM and FDC can only help recover the background regions occluded by the moving objects. Therefore, in the proposed disocclusion filling approach, the disocclusions along static foreground objects are not filled using the background information, but with the conventional inpainting method [22]. Moreover, small disocclusions or holes caused by depth-value discontinuities are also filled with the inpainting method. The details of each step are described in the following sections.
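The filling policy described above can be sketched as follows. This is a minimal single-channel illustration, not the authors' implementation: `inpaint_value` is a hypothetical stand-in for the conventional inpainting method [22], and the hole and static-foreground masks are assumed to be given by the earlier warping and FDC stages.

```python
def fill_disocclusions(warped, hole_mask, static_fg_mask, bg_image, inpaint_value):
    """Fill disoccluded pixels of a DIBR-warped view.

    Holes exposed by moving objects are copied from the offline GMM/FDC
    background image; holes along static foreground objects fall back to
    the `inpaint_value` callable (a stand-in for conventional inpainting).
    All images are lists of lists of scalar pixel values.
    """
    h, w = len(warped), len(warped[0])
    out = [row[:] for row in warped]
    for y in range(h):
        for x in range(w):
            if not hole_mask[y][x]:
                continue  # visible pixel, keep the warped value
            if static_fg_mask[y][x]:
                # Hole along a static foreground object: inpainting [22].
                out[y][x] = inpaint_value(out, y, x)
            else:
                # Hole exposed by a moving object: copy from the background.
                out[y][x] = bg_image[y][x]
    return out
```

The per-pixel branch mirrors the two fill sources of the proposed approach; in practice the masks would come from the warped depth map and the foreground detection stage.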
III. BACKGROUND GENERATION
A. Background Generation With GMM
The Gaussian Mixture Model is a commonly used method for detecting moving objects [23], and it has been widely applied in the computer vision field to model the stable background. Different from methods based on block matching, the GMM operates at the pixel level, where each pixel is modeled independently by a mixture of K Gaussian distributions (a common setting is K = 3) [24], [25]. The Gaussian mixture distribution with K components can be written as:
$$p(x_t) = \sum_{i=1}^{K} \omega_{i,t} \cdot \eta\!\left(x_t,\, \mu_{i,t},\, \sigma^2_{i,t}\right) \tag{1}$$
where $p(x_t)$ indicates the probability density of pixel $x_t$, $\eta$ is the Gaussian function with $x_t$ representing the pixel value at time $t$, $\mu_{i,t}$ and $\sigma^2_{i,t}$ denote the mean and variance of the $i$th Gaussian at time $t$, respectively, and $\omega_{i,t}$ is the $i$th Gaussian distribution's weight, with $\sum_{i=1}^{K} \omega_{i,t} = 1$.
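Eq. (1) can be evaluated directly once the component parameters are known. The sketch below is illustrative only; `mixture_density` is a hypothetical helper name, not part of the described method.

```python
import math

def mixture_density(x_t, weights, means, variances):
    """Evaluate Eq. (1): p(x_t) = sum_i w_i * N(x_t; mu_i, sigma_i^2),
    where N is the univariate Gaussian density."""
    p = 0.0
    for w, mu, var in zip(weights, means, variances):
        p += w * math.exp(-(x_t - mu) ** 2 / (2.0 * var)) \
             / math.sqrt(2.0 * math.pi * var)
    return p
```

With a single component (K = 1), this reduces to the plain Gaussian density, which is a quick sanity check for the weights summing to one.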
The detailed process by which GMM generates the stable reference background is as follows [26]:
1) Firstly, the set of models is initialized at the time instant $t_0$.
• The mean value $\mu_{i,t_0}$ of the first Gaussian model is set equal to the pixel value of the current frame, and that of the other models is set to 0.
• The variance value $\sigma_{i,t_0}$ of all the $K$ Gaussian models is set to a pre-defined large value, e.g., 30 in this paper.
• The weight value of the first Gaussian model $\omega_{1,t_0}$ is set to 1, and that of the other models is set to 0.
2) For the next frame, at the time instant $t_1$, the current pixel is matched against the $K$ Gaussian models. Then, for each model $i$, the condition $|x_t - \mu_{i,t-1}| \le 2.5\,\sigma_{i,t-1}$ is examined.
• If the condition is satisfied, the matching process is stopped and the parameters of the Gaussian models are updated using the following rule:
– The mean value of the matched Gaussian model, i.e., the $i$th model, becomes $\mu_{i,t} = (1-\rho)\,\mu_{i,t-1} + \rho\, x_t$, where $\rho = \alpha \cdot \eta(x_t, \mu_{i,t}, \sigma^2_{i,t})$ and $\alpha$ is the learning rate, set to 0.005 [26].
– The variance value of the matched Gaussian model becomes $\sigma^2_{i,t} = (1-\rho)\,\sigma^2_{i,t-1} + \rho\,(x_t - \mu_{i,t})^2$.
– The weight value of the matched Gaussian model becomes $\omega_{i,t} = (1-\alpha)\,\omega_{i,t-1} + \alpha$.
– The means and variances of the other Gaussian models remain unchanged, while their weight values are updated as $\omega_{i,t} = (1-\alpha)\,\omega_{i,t-1}$.
• If instead all of the Gaussian models fail to match the current pixel, a new Gaussian model is introduced with $\mu = x_t$, a high variance $\sigma^2$ (e.g., $\sigma = 30$) and a low weight value $\omega = 0.001$, evicting the Gaussian model with the smallest $\omega/\sigma$ value.
– The means and variances of the other Gaussian models remain unchanged.
– The weight values of the $K$ Gaussian models are normalized so that $\sum_{i=1}^{K} \omega_{i,t} = 1$.
3) The remaining frames are processed by repeating step 2).
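The initialization and update steps above can be sketched per pixel as follows. This is a minimal single-channel illustration under the stated parameters (K = 3, α = 0.005, initial σ = 30). The choice in `background_value` of reading the background as the highest-weight component's mean is our assumption for illustration; the selection rule is not specified in the steps above.

```python
import math

ALPHA = 0.005       # learning rate alpha [26]
INIT_SIGMA = 30.0   # pre-defined large initial standard deviation
K = 3               # number of Gaussian components

def gaussian(x, mu, var):
    """Univariate Gaussian density eta(x; mu, sigma^2)."""
    return math.exp(-(x - mu) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def init_models(x0):
    """Step 1: first model centered on the pixel value, others empty."""
    means = [x0] + [0.0] * (K - 1)
    variances = [INIT_SIGMA ** 2] * K
    weights = [1.0] + [0.0] * (K - 1)
    return means, variances, weights

def update_pixel(x, means, variances, weights):
    """Step 2: match the pixel against the K models and update them."""
    for i in range(K):
        if abs(x - means[i]) <= 2.5 * math.sqrt(variances[i]):
            # Matched: update this model, decay the other weights.
            rho = ALPHA * gaussian(x, means[i], variances[i])
            means[i] = (1 - rho) * means[i] + rho * x
            variances[i] = (1 - rho) * variances[i] + rho * (x - means[i]) ** 2
            for j in range(K):
                weights[j] = (1 - ALPHA) * weights[j] + (ALPHA if j == i else 0.0)
            return means, variances, weights
    # No match: evict the model with the smallest w/sigma, then normalize.
    worst = min(range(K), key=lambda j: weights[j] / math.sqrt(variances[j]))
    means[worst], variances[worst], weights[worst] = x, INIT_SIGMA ** 2, 0.001
    total = sum(weights)
    weights = [w / total for w in weights]
    return means, variances, weights

def background_value(means, variances, weights):
    """Assumed rule: the highest-weight component's mean is taken as the
    stable background value for this pixel."""
    return means[max(range(K), key=lambda j: weights[j])]
```

Running this independently at every pixel over the consecutive texture frames yields the stable background image; pixels briefly covered by translationally moving foreground keep their dominant (background) component.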