1712 IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 20, NO. 6, JUNE 2011
fading in the background, two additional mechanisms (one at
the pixel level, a second at the blob level) are added to the con-
sensus algorithm to handle entire objects.
The method proposed in this paper operates differently in
handling new or fading objects in the background, without the
need to take account of them explicitly. In addition to being
faster, our method exhibits an interesting asymmetry in that a
ghost (a region of the background discovered once a static object
starts moving) is added to the background model more quickly
than an object that stops moving. Another major contribution of
this paper resides in the proposed update policy. The underlying
idea is to gather samples from the past and to update the sample
values by ignoring when they were added to the models. This
policy ensures a smooth exponential decaying lifespan for the
sample values of the pixel models and allows our technique to
deal with concomitant events evolving at various speeds with a
single model of reasonable size for each pixel.
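As a brief illustration of why such a policy yields an exponential
decay, assume (for this sketch only; the passage above does not state
it) that each update overwrites one of the $N$ samples of a pixel
model, chosen uniformly at random regardless of its age. The
probability that a given sample is still present after $k$ updates
is then

$$P(k) = \left(\frac{N-1}{N}\right)^{k} = \exp\left(-k \ln\frac{N}{N-1}\right),$$

which depends only on the number of updates $k$ and not on when the
sample entered the model.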
III. DESCRIPTION OF A UNIVERSAL BACKGROUND SUBTRACTION TECHNIQUE: VIBE
Background subtraction techniques have to deal with at least
three considerations in order to be successful in real applica-
tions: 1) what is the model and how does it behave? 2) how is the
model initialized? and 3) how is the model updated over time?
Answers to these questions are given in the three subsections of
this section. Most papers describe the intrinsic model and the
updating mechanism. Only a minority of papers discuss initialization,
which is critical when a fast response is expected, as is
the case inside a digital camera. In addition, there is often a lack
of coherence between the model and the update mechanism. For
example, some techniques compare the current value of a pixel $p$
to that of a model $b$ with a given tolerance $T$. They consider
that there is a good match if the absolute difference between
$p$ and $b$ is lower than $T$. To be adaptive over time, $T$ is adjusted
with respect to the statistical variance of $p$. But the statistical
variance is estimated by a temporal average. Therefore, the ad-
justment speed is dependent upon the acquisition framerate and
on the number of background pixels. This is inappropriate in
some cases, as in the case of remote IP cameras whose fram-
erate is determined by the available bandwidth.
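To make this dependency concrete, assume (for illustration only; the
text above does not name a specific estimator) that the variance is
tracked by a recursive temporal average with a per-frame learning
rate $\alpha$, where $p_t$ is the pixel value at frame $t$ and
$\mu_t$ its running mean:

$$\sigma_t^2 = (1-\alpha)\,\sigma_{t-1}^2 + \alpha\,(p_t - \mu_t)^2 .$$

The weight of an observation made $k$ frames earlier is
$\alpha(1-\alpha)^k$, so the memory spans roughly $1/\alpha$ frames,
that is, about $1/(\alpha f)$ seconds at a framerate of $f$ frames per
second. With $\alpha = 0.05$, for instance, the tolerance adapts over
about 0.8 s at 25 frames per second but over about 4 s at 5 frames
per second.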
We detail in the following a background subtraction tech-
nique, called visual background extractor (ViBe). For conve-
nience, we present a complete version of our algorithm in a
C-like code in Appendix A.
A. Pixel Model and Classification Process
To some extent, there is no way around the determination,
for a given color space, of a probability density function (pdf)
for every background pixel or at least the determination of sta-
tistical parameters, such as the mean or the variance. Note that
with a Gaussian model, there is no distinction to be made as the
knowledge of the mean and variance is sufficient to determine
the pdf. While the classical approaches to background subtrac-
tion and most mainstream techniques rely on pdfs or statistical
parameters, the question of their statistical significance is rarely
discussed, if not simply ignored. In fact, there is no imperative
to compute the pdf as long as the goal of reaching a relevant
background segmentation is achieved. An alternative is to con-
sider that one should enhance statistical significance over time,
and one way to proceed is to build a model with real observed
pixel values. The underlying assumption is that this makes more
sense from a stochastic point of view, as already observed values
should have a higher probability of being observed again than
would values not yet encountered.
Like the authors of [65], we do not opt for a particular form
for the pdf, as deviations from the assumed pdf model are ubiq-
uitous. Furthermore, the evaluation of the pdf is a global process
and the shape of a pdf is sensitive to outliers. In addition, the es-
timation of the pdf raises the nonobvious question regarding the
number of samples to be considered; the problem of selecting a
representative number of samples is intrinsic to all the estima-
tion processes.
If we see the problem of background subtraction as a classi-
fication problem, we want to classify a new pixel value with re-
spect to its immediate neighborhood in the chosen color space,
so as to avoid the effect of any outliers. This motivates us to
model each background pixel with a set of samples instead of
with an explicit pixel model. Consequently, no estimation of
the pdf of the background pixel is performed, and the current
value of the pixel is compared to its closest samples within
the collection of samples. This is an important difference in
comparison with existing algorithms, in particular with those
of consensus-based techniques. A new value is compared to
background samples and should be close to some of the sample
values instead of the majority of all values. The underlying idea
is that it is more reliable to estimate the statistical distribution
of a background pixel with a small number of close values than
with a large number of samples. This is somewhat similar to ig-
noring the extremities of the pdf, or to considering only the cen-
tral part of the underlying pdf by thresholding it. On the other
hand, if one trusts the values of the model, it is crucial to se-
lect background pixel samples carefully. The classification of
pixels in the background, therefore, needs to be conservative, in
the sense that only background pixels should populate the back-
ground models.
Formally, let us denote by $v(x)$ the value in a given Euclidean
color space taken by the pixel located at $x$ in the image, and by
$v_i$ a background sample value with an index $i$. Each background
pixel $x$ is modeled by a collection of $N$ background sample
values

$$\mathcal{M}(x) = \{v_1, v_2, \ldots, v_N\} \tag{1}$$

taken in previous frames. For now, we ignore the notion of time;
this is discussed later.
To classify a pixel value $v(x)$ according to its corresponding
model $\mathcal{M}(x)$, we compare it to the closest values within the
set of samples by defining a sphere $S_R(v(x))$ of radius $R$
centered on $v(x)$. The pixel value $v(x)$ is then classified as
background if the cardinality, denoted $\#$, of the set intersection
of this sphere and the collection of model samples $\mathcal{M}(x)$ is
larger than or equal to a given threshold $\#_{\min}$. More formally,
we compare $\#_{\min}$ to

$$\#\left\{ S_R(v(x)) \cap \{v_1, v_2, \ldots, v_N\} \right\}. \tag{2}$$
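The classification rule can be sketched in a few lines of C. This is
a minimal illustration, not the code of Appendix A: it assumes a
grayscale image, so that the Euclidean distance reduces to an
absolute difference, and the constants N_SAMPLES, RADIUS, and
MIN_MATCHES, as well as the function name is_background, are
illustrative choices made for this sketch. Whether the boundary of
the sphere counts as inside is also an assumption here (a strict
inequality is used).

```c
#include <stdlib.h>

#define N_SAMPLES   20  /* number of samples per pixel model (illustrative) */
#define RADIUS      20  /* sphere radius, in gray levels (illustrative)     */
#define MIN_MATCHES 2   /* required number of close samples (illustrative)  */

/*
 * Returns 1 if the pixel value v is classified as background with
 * respect to the samples of its model, and 0 otherwise.
 * Grayscale assumption: the distance between two values is their
 * absolute difference.
 */
static int is_background(unsigned char v,
                         const unsigned char samples[N_SAMPLES])
{
    int count = 0;  /* number of samples found inside the sphere so far */

    for (int i = 0; i < N_SAMPLES; ++i) {
        if (abs((int)v - (int)samples[i]) < RADIUS) {
            /* Stop as soon as enough close samples are found:
               the remaining samples cannot change the decision. */
            if (++count >= MIN_MATCHES)
                return 1;
        }
    }
    return 0;
}
```

For instance, with a model that holds the samples 100 and 105 among
otherwise distant values, is_background(102, model) returns 1,
whereas a value of 150 is rejected. Because the loop exits once
MIN_MATCHES close samples are found, the value only has to be close to
some of the samples and the whole model rarely needs to be scanned
for background pixels.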