Sparse Affine Hull for Visual Tracking
Jun Wang, Yuanyun Wang, Chengzhi Deng*, Huasheng Zhu, Shengqian Wang and Li Lv
1 Jiangxi Province Key Laboratory of Water Information Cooperative Sensing and Intelligent Processing,
Nanchang Institute of Technology, Nanchang 330099, China
2 School of Information Engineering, Nanchang Institute of Technology, Nanchang 330099, China
Wangjun012778@126.com, Wangyyabc@163.com, dengchengzhi@126.com, zhuhuasheng@sohu.com, Sqwang113@yahoo.com
Abstract—It is a challenging task to develop a robust appearance model due to various factors such as partial occlusion, fast motion, background clutter and illumination variations. In this paper, we propose a novel target representation for visual tracking: a target candidate is represented by a sparse affine combination of dictionary templates within a particle filter framework. Affine combinations of templates can cover unknown target appearances. In order to adapt to the dynamic scenes across a video sequence, the dictionary templates are updated during tracking. Experimental results on several challenging video sequences against state-of-the-art tracking algorithms demonstrate that the proposed algorithm is robust to illumination variations, background clutter, and other challenges.
I. INTRODUCTION
Visual tracking is an important problem in computer vision with a variety of applications such as vehicle navigation, human-computer interaction and video surveillance. The goal of visual tracking is to locate a tracked target across a video sequence. Although much progress has been made in recent years [1], it remains a challenging task to design an effective appearance model due to factors such as fast motion, motion blur, partial occlusion, illumination variation, in-plane and out-of-plane rotations and background clutter.
Generally speaking, visual tracking algorithms can be classified as either generative [2]-[5], [8]-[10] or discriminative [11]-[17].
Generative tracking algorithms typically learn an appearance model to represent target candidates and take the image region with the minimal reconstruction residual as the tracked target in the current frame. In [2], a target candidate is divided into multiple non-overlapping image patches, each of which is represented by a histogram. The similarity between each patch in a target candidate and the corresponding patch in the template is measured and used as a voting map to evaluate the likelihood. Because a fixed target template is used, the algorithm in [2] alleviates the drift problem; however, it is not robust to dynamic scene variations.
Kwon et al. [3] use multiple target appearance models to adapt to significant appearance variations, and multiple motion models to cover motion variations, which makes their algorithm robust to complicated appearance changes. He et al. [4] represent a target by a locality sensitive histogram, which is robust to drastic illumination variations. Wang et al. [5] propose an affine hull based regularized target representation for visual tracking.
Recently, generative tracking algorithms based on sparse representation techniques [6] have been developed [7]-[10]. The L1 tracking algorithm [7] represents each target candidate as a sparse combination of target templates and trivial templates; the trivial templates make the L1 tracker robust to partial occlusion. In [9], the local patches in a target candidate are sparsely represented by the corresponding patches in the dictionary templates. Based on both holistic templates and local representations, Zhong et al. [8] propose a sparsity-based collaborative appearance model.
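To make the sparse-combination idea concrete, the following is a minimal sketch (our own illustration, not the actual implementation of [7]) that codes a candidate over a small dictionary of templates with an L1 penalty, solved by plain iterative soft thresholding (ISTA); the trivial templates of the L1 tracker are omitted and all names and values are illustrative.

```python
import numpy as np

def sparse_code(y, D, lam=0.05, n_iter=200):
    """Sparse representation of a candidate y over dictionary D:
    min_c 0.5*||y - D c||^2 + lam*||c||_1, solved by ISTA."""
    step = 1.0 / (np.linalg.norm(D, 2) ** 2)  # 1/Lipschitz constant
    c = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ c - y)              # gradient of the quadratic term
        c = c - step * grad                   # gradient descent step
        c = np.sign(c) * np.maximum(np.abs(c) - step * lam, 0.0)  # soft threshold
    return c

# toy example: three template columns; the candidate equals template 1,
# so the code should concentrate on that column
D = np.array([[1.0, 0.0, 0.5],
              [0.0, 1.0, 0.5],
              [0.0, 0.0, 0.5]])
y = D[:, 1]
c = sparse_code(y, D)
```

In a tracker, `y` and the columns of `D` would be vectorized image patches, and the reconstruction residual of each candidate would feed its observation likelihood.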
Unlike generative tracking algorithms, discriminative track-
ing algorithms formulate visual tracking as a binary classi-
fication problem, in which a classifier is learnt and used to
distinguish a target from its surrounding background. Avidan
[11] proposes an ensemble tracker by combining a set of weak
classifiers into a strong classifier. In [12], the discriminative
features are updated by an online boosting algorithm. Babenko
et al. [13] propose a discriminative tracking algorithm by introducing multiple instance learning to update the classifier. Bai et al. [14] propose a randomized ensemble tracking
algorithm by combining a set of weak classifiers with a weight
vector that is considered as a distribution of confidence. Hare et
al. [16] introduce a structured output SVM learning technique
and propose a tracking-by-detection algorithm.
For generative tracking algorithms, developing a robust appearance model is a crucial issue. Inspired by affine hull representation based face recognition [18] and sparse representation based visual tracking, we propose a novel visual tracking algorithm (referred to as SAHT), in which a target candidate is represented by a sparse affine combination of a set of dictionary templates. The proposed sparse affine hull based target representation has the advantages of both the affine hull (i.e., covering unknown target appearances that do not appear in the dictionary templates) and sparse representation (i.e., robustness to outliers).
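As a rough numerical illustration of a sparse affine combination (our own sketch, not the optimizer used in this paper), the affine constraint that the coefficients sum to one can be folded into the least-squares term as a quadratic penalty, after which the problem is again solvable by ISTA; the dictionary and the `lam` and `mu` values below are toy assumptions.

```python
import numpy as np

def sparse_affine_code(y, D, lam=0.05, mu=10.0, n_iter=500):
    """Sparse affine combination of templates:
    min_c 0.5*||y - D c||^2 + lam*||c||_1  s.t.  sum(c) = 1,
    with the affine constraint enforced softly via the penalty
    mu*(1'c - 1)^2, absorbed into an augmented least-squares system."""
    n = D.shape[1]
    D_aug = np.vstack([D, np.sqrt(2 * mu) * np.ones((1, n))])
    y_aug = np.append(y, np.sqrt(2 * mu))
    step = 1.0 / (np.linalg.norm(D_aug, 2) ** 2)
    c = np.full(n, 1.0 / n)                   # start at the hull centroid
    for _ in range(n_iter):
        grad = D_aug.T @ (D_aug @ c - y_aug)
        c = c - step * grad
        c = np.sign(c) * np.maximum(np.abs(c) - step * lam, 0.0)
    return c

# toy dictionary of three orthonormal "templates"; the candidate lies
# inside the affine hull of templates 0 and 1 but equals neither,
# illustrating how the affine hull covers unseen appearances
D = np.eye(3)
y = np.array([0.6, 0.4, 0.0])
c = sparse_affine_code(y, D)
```

The recovered coefficients are sparse (template 2 unused), sum approximately to one, and reconstruct an appearance that is not itself in the dictionary.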
The remainder of this paper is organized as follows. Section
II presents the proposed visual tracking algorithm. Section
III evaluates experimental results of the proposed algorithm
against the state-of-the-art algorithms on challenging video
sequences. Section IV concludes the paper.
II. THE PROPOSED TRACKING ALGORITHM
In this section, under the particle filter framework, we
propose a novel target appearance model. A target candidate is
represented by sparse affine combinations of a set of dictionary
templates. To adapt to dynamic scene variations and keep the dictionary templates effective, the templates are dynamically updated.
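One bootstrap particle filter step of the kind this framework relies on might be sketched as follows; this is illustrative only, with `observe_residual` standing in for the reconstruction residual of the appearance model and all parameters assumed.

```python
import numpy as np

rng = np.random.default_rng(0)

def particle_filter_step(particles, weights, observe_residual,
                         sigma_motion=2.0, sigma_obs=1.0):
    """One bootstrap particle filter step over candidate target states.
    observe_residual(state) stands in for the appearance model's
    reconstruction residual for the candidate cut out at `state`."""
    n = len(particles)
    # 1) resample according to the previous weights
    idx = rng.choice(n, size=n, p=weights)
    particles = particles[idx]
    # 2) propagate with a Gaussian motion model
    particles = particles + rng.normal(0.0, sigma_motion, size=particles.shape)
    # 3) reweight: smaller reconstruction residual -> larger weight
    r = np.array([observe_residual(p) for p in particles])
    weights = np.exp(-r ** 2 / (2 * sigma_obs ** 2))
    weights /= weights.sum()
    return particles, weights

# toy demo: 2-D target at (10, 20); residual = distance to it
true_pos = np.array([10.0, 20.0])
residual = lambda s: np.linalg.norm(s - true_pos)
particles = rng.normal(0.0, 5.0, size=(500, 2)) + true_pos
weights = np.full(500, 1.0 / 500)
for _ in range(10):
    particles, weights = particle_filter_step(particles, weights, residual)
estimate = (weights[:, None] * particles).sum(axis=0)
```

In an actual tracker the state would carry affine motion parameters rather than a 2-D position, and the weighted mean (or the highest-weight particle) gives the tracking result for the frame.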
2016 6th International Conference on Digital Home
978-1-5090-4400-9/16 $31.00 © 2016 IEEE
DOI 10.1109/ICDH.2016.24