Improving Background Subtraction using Local Binary Similarity Patterns
Pierre-Luc St-Charles, Guillaume-Alexandre Bilodeau
LITIV lab., Dept. of Computer & Software Eng.
École Polytechnique de Montréal
Montréal, QC, Canada
{pierre-luc.st-charles,gabilodeau}@polymtl.ca
Abstract
Most of the recently published background subtraction
methods can still be classified as pixel-based, as most of
their analysis is still only done using pixel-by-pixel comparisons.
A few others might be regarded as spatial-based
(or even spatiotemporal-based) methods, as they take into
account the neighborhood of each analyzed pixel. Although
the latter types can be viewed as improvements in many
cases, most of the methods that have been proposed so far
suffer in complexity, processing speed, and/or versatility
when compared to their simpler pixel-based counterparts.
In this paper, we present an adaptive background subtrac-
tion method, derived from the low-cost and highly efficient
ViBe method, which uses a spatiotemporal binary similarity
descriptor instead of simply relying on pixel intensities as
its core component. We then test this method on multiple
video sequences and show that by only replacing the core
component of a pixel-based method it is possible to dramat-
ically improve its overall performance while keeping mem-
ory usage, complexity and speed at acceptable levels for
online applications.
1. Introduction
Background subtraction based on change detection is the
first step used in many video analysis applications to detect
and segment foreground objects. Hence, the quality of the end
results often depends largely on the method used. A naive
approach based on directly comparing pixel intensities over
multiple video frames would not be practical in most cases,
since real video sequences often contain outliers in the form
of noise and dynamic elements (varying illumination, undu-
lating objects, etc.) that should not be considered as fore-
ground. Consequently, more precise and complex reference
models and methods have been developed over the years to
fill the need for robust applications.
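To make the limitation of the naive approach concrete, the following is a minimal sketch of direct intensity comparison against a single reference frame. It is only an illustration, not part of the proposed method: the function name `naive_foreground_mask`, the threshold of 30 and the toy intensity values are all assumptions chosen for the example.

```python
def naive_foreground_mask(background, frame, threshold=30):
    """Label a pixel as foreground (1) when its intensity differs from the
    reference (background) frame by more than `threshold`, else 0.

    `background` and `frame` are lists of rows of grayscale intensities.
    The threshold value is an arbitrary assumption for this sketch.
    """
    return [
        [1 if abs(p - b) > threshold else 0 for p, b in zip(frame_row, bg_row)]
        for frame_row, bg_row in zip(frame, background)
    ]


# Toy 2x3 grayscale frames (hypothetical values).
background = [[100, 100, 100],
              [100, 100, 100]]
frame = [[102, 100, 180],
         [100, 140, 99]]

mask = naive_foreground_mask(background, frame)
print(mask)
```

In this toy case, any pixel whose intensity shifts by more than the threshold is flagged, whether the shift comes from a real object or from an illumination change or noise spike. The rule has no way to tell these apart, which is the weakness described above.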
What we propose in this paper is an “all-in-one”, single
model, single update scheme, training-less, spatiotemporal-
based background subtraction method which is in part de-
rived from the non-parametric ViBe method[1]. The key as-
pect to spatial and temporal neighborhood analysis here is
that instead of solely using pixel-level intensities as the core
component to create our reference model (in a ViBe-like
paradigm), we complement it with the output of a modified
Local Binary Similarity Pattern (LBSP) descriptor[3]. In
other words, we show that even without modifying a generic
method’s initialization, maintenance and foreground detec-
tion rules, it is possible to achieve proper spatiotemporal
analysis by only incorporating a minimalistic feature-based
component in its reference model. We also present in the
paper a few general improvements that were implemented
in our method, coined LOBSTER (LOcal Binary Similar-
ity segmenTER), over a more “straightforward” integration
strategy.
Altogether, using the proposed approach, standardized
evaluations using the recently-published CDNet dataset[7]
demonstrate that we extract enough spatial and temporal
neighborhood information in video sequences to dramati-
cally increase the base algorithm’s performance across mul-
tiple complex scenarios, without compromising baseline re-
sults. Also, we manage to keep extra memory, processing
time and post-processing operations to a minimum while
outperforming other much more complex state-of-the-art
methods also tested in [7], including ViBe+[5].
2. Related Work
The simplest and earliest examples of change detection
algorithms are the ones based on the idea that the
background can be modeled by independently observing every
pixel’s intensity over successive frames (i.e. the so-called
pixel-based algorithms) in order to then determine if a given
intensity value is a true outlier (to be considered foreground).
Some of these methods, such as the ones based on den-
sity estimation (via parametric Mixture of Gaussians[23]
or non-parametric Kernel Density Estimation[6]) are now