Motion Background Modeling based on Context-encoder
Zhenshen Qu, Shimeng Yu, Mengyu Fu
Dept. of Control Science and Engineering,
Harbin Institute of Technology,
Harbin, China
yushimeng@hotmail.com
Abstract—This paper proposes a method for modeling the motion-based background of a video captured by a moving camera. We utilize the recently proposed context-encoder to model the motion-based background of a video that contains dynamic foreground objects. The method aims to restore the overall scene of a video by removing the moving foreground objects and learning the features of its context. An advantage of this method is that the performance of background modeling is not affected even when the camera moves fast.
Keywords—motion-based background modeling; context-
encoder; convolutional neural networks
I. INTRODUCTION
Background modeling is an important component of many computer vision systems and is widely used as a preprocessing step for tasks such as foreground detection [1], object segmentation [2], tracking [3] and video surveillance [4].
A great deal of research has been done and many methods have been developed in this area in recent years. These methods can be classified into the following categories [5]: Basic Background Modeling, Statistical Background Modeling, Fuzzy Background Modeling, Background Clustering, Neural Network Background Modeling, Wavelet Background Modeling and Background Estimation. Further classifications can be found in [6].
Conventional background modeling methods require a fixed camera position so that the background stays stationary, and a great deal of work on moving objects has been done with stationary cameras [7]. However, under certain conditions the camera's position changes and a model of the non-stationary background is needed. The Mixture of Gaussians (MOG) background model is highly efficient at modeling multi-modal background distributions and has been widely used. MOG can adapt to small changes in the background (for example, waving leaves and gradual illumination changes), but it cannot cope when the scene changes substantially. [8] presented a background modeling approach that is immune to variations of the background, but it fails when the camera moves fast and the background changes considerably.
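As a point of reference, the following is a minimal sketch of MOG-style background subtraction using OpenCV's MOG2 implementation; the video file name and parameter values are illustrative assumptions, not taken from the work cited above:

import cv2

# MOG2 models every pixel with an adaptive mixture of Gaussians.
cap = cv2.VideoCapture("video.mp4")   # hypothetical input video
mog = cv2.createBackgroundSubtractorMOG2(history=500, varThreshold=16,
                                         detectShadows=True)
while True:
    ok, frame = cap.read()
    if not ok:
        break
    fg_mask = mog.apply(frame)             # 255 = foreground, 127 = shadow
    background = mog.getBackgroundImage()  # current background estimate
cap.release()

Because each pixel's mixture is updated online, the model absorbs gradual changes but breaks down when camera motion shifts the whole scene at once.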
[9] introduced a Spatial Distribution of Gaussians (SDG) model that can detect foreground objects against a non-stationary background; similar and earlier studies can be found in [10]. [11] proposed a real-time optical flow algorithm to detect moving objects in a dynamic scene, and [12] is a further study of [11]. Another background modeling method for dynamic scenes, which computes optical flow in a higher-dimensional space to model dynamic characteristics, was proposed in [13]. The motion-based background modeling method presented in [14] also used optical flow to detect moving objects. However, motion field computation based on optical flow can be time-consuming.
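For illustration, a dense optical flow computation in the spirit of these methods can be sketched with OpenCV's Farneback algorithm; the frame file names and the magnitude threshold below are illustrative assumptions:

import cv2
import numpy as np

# Two consecutive grayscale frames (hypothetical file names).
prev = cv2.cvtColor(cv2.imread("frame0.png"), cv2.COLOR_BGR2GRAY)
curr = cv2.cvtColor(cv2.imread("frame1.png"), cv2.COLOR_BGR2GRAY)

# Dense Farneback flow: one 2-D motion vector per pixel.
flow = cv2.calcOpticalFlowFarneback(prev, curr, None,
                                    0.5, 3, 15, 3, 5, 1.2, 0)
mag, _ = cv2.cartToPolar(flow[..., 0], flow[..., 1])

# Pixels whose motion exceeds the threshold are flagged as moving objects.
moving = (mag > 2.0).astype(np.uint8) * 255

Computing such a flow field for every frame is what makes these methods expensive.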
In this paper, a new approach is proposed to estimate the motion-based background while the camera is moving. We use convolutional neural networks (CNNs) to achieve this goal, applying the unsupervised visual feature learning algorithm presented in [15] to our motion-based background estimation. [15] introduced the context-encoder, which predicts the missing part of an image from the surroundings of the missing region so that the prediction approximates the original scene as closely as possible. We utilize the context-encoder to restore a complete background from a video captured by a moving camera in the presence of dynamic obstacles. An obvious advantage of this method for background extraction is that its performance is not affected even when the camera moves fast.
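To make the idea concrete, the following is a minimal PyTorch sketch of a context-encoder trained with only the reconstruction (L2) loss; the layer sizes, the fixed center mask and the omission of the adversarial loss used in [15] are our simplifications, not the exact architecture of [15]:

import torch
import torch.nn as nn

class ContextEncoder(nn.Module):
    """Encoder-decoder CNN that predicts a masked image region."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(                      # 3x128x128 input
            nn.Conv2d(3, 64, 4, stride=2, padding=1),      # -> 64x64x64
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 128, 4, stride=2, padding=1),    # -> 128x32x32
            nn.ReLU(inplace=True),
            nn.Conv2d(128, 256, 4, stride=2, padding=1),   # -> 256x16x16
            nn.ReLU(inplace=True),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(256, 128, 4, stride=2, padding=1),  # -> 32x32
            nn.ReLU(inplace=True),
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1),   # -> 64x64
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 3, 3, padding=1),                # 3x64x64 patch
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = ContextEncoder()
opt = torch.optim.Adam(model.parameters(), lr=2e-4)

frames = torch.rand(8, 3, 128, 128)     # stand-in batch of video frames
masked = frames.clone()
masked[:, :, 32:96, 32:96] = 0.0        # zero out a 64x64 "foreground" hole
target = frames[:, :, 32:96, 32:96]     # ground truth for the hole

pred = model(masked)                    # predicted content of the hole
loss = nn.functional.mse_loss(pred, target)
opt.zero_grad()
loss.backward()
opt.step()

In the setting of this paper, the masked region would correspond to the removed moving foreground rather than a fixed center crop.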
The rest of this paper is organized as follows: related work is introduced in Section 2. Details of the proposed motion-based background modeling method are described in Section 3, where we also discuss the problems encountered in our experiments and propose solutions. The experimental results are presented in Section 4. Finally, the conclusion is given in Section 5.
II. RELATED WORK
CNNs have performed well in many semantic image understanding tasks, including unsupervised understanding and natural image generation [15]. Autoencoders [16, 18], which can learn the features of an image, are typical deep unsupervised learning methods in this field. Denoising autoencoders [17] can “make the learned representations robust to partial corruption of the input pattern”. The context-encoder [15] could be thought of as a variant of
*Research supported by the Chinese National Natural Science Foundation (61375046, Scene flow computation based on dynamic primitive feature).