CONVEX HULL FOR VISUAL TRACKING WITH EMD
Jun Wang, Yuanyun Wang, Chengzhi Deng*, Shengqian Wang, Huasheng Zhu
1 Jiangxi Province Key Laboratory of Water Information Cooperative Sensing and Intelligent
Processing, Nanchang Institute of Technology, Nanchang 330099, China
2 School of Information Engineering, Nanchang Institute of Technology, Nanchang 330099, China
3 Research Laboratory of Cooperative Sensing and Advanced Computing Techniques,
Nanchang Institute of Technology, Nanchang 330099, China
ABSTRACT
Developing an effective target appearance model is a chal-
lenging task due to the influence of factors such as partial
occlusion, illumination variations, fast motion, etc.
Existing appearance models usually use the tracking results
from previous frames as target templates, upon which
the target appearance model is built by linear combinations
of the templates. With this kind of representation,
visual tracking is not robust when drastic appearance
variations occur. We propose a simple but effective tracking
algorithm with a novel appearance model in a particle fil-
ter framework. A target candidate is represented by the
convex combination of a set of target templates. Addition-
ally, the distance between a target candidate and the tem-
plates is measured using the EMD. Experimental results on
challenging video sequences against state-of-the-art algo-
rithms demonstrate the robustness and effectiveness of the
proposed tracking algorithm.
Index Terms— Visual tracking, Convex Hull, Particle
filter, Earth Mover’s Distance, Appearance model
1. Introduction
Visual tracking aims to continually locate a target
across a video sequence. It has a wide range of applications
such as human-computer interaction, visual surveillance
and video retrieval. Although much progress has been made
in the past decades [1], visual tracking remains a challenging
task due to factors such as illumination variations,
partial occlusion, background clutter, motion blur and
out-of-plane rotation. In visual tracking, an appearance
model that is robust to significant appearance variations
is important for precision and success rate. Based on the
appearance model, tracking algorithms can be categorized
as either generative [2, 3, 4, 5, 6] or
discriminative [7, 8, 9, 10, 11].
Generally speaking, generative tracking algorithms
learn a target appearance model to represent the target
and search for the most similar image region in each frame
using the learnt appearance model. In contrast,
discriminative tracking algorithms treat visual tracking
as a binary classification problem: a learnt classifier is
used to distinguish the target from the surrounding
background. Here, we briefly review some typical tracking
algorithms that relate to our work.
*Corresponding author.
The IVT algorithm [2] incrementally learns a subspace
model based on principal component analysis (PCA) to
represent a target candidate. This appearance model is
robust to illumination variations; however, it is sensitive
to background clutter. In [3], a target candidate is divided
into multiple non-overlapping image patches, each of which
is described by a histogram. Because a fixed target template
is used, the drift problem is alleviated; however, the method
is sensitive to illumination variations. Kwon et al. [6] sample
a set of trackers to handle significant appearance and
motion variations. Each one is a basic tracker that is robust
to a particular appearance variation or motion variation.
Sparse representation methods [12] have been applied
to visual tracking [13, 14, 15, 16]. In the L1 algorithm [13], a
target candidate is jointly represented by target templates
and trivial templates. The target templates represent the
foreground target, and the trivial templates describe
occlusions. Based on sparse representation techniques,
Zhong et al. [15] propose a robust tracking algorithm using
a sparsity-based collaborative model. Recently,
Zhang et al. [16] propose an effective appearance
model based on structure sparse representation.
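For reference, the joint representation described above for the L1 algorithm [13] is often written in the simplified form below. The symbols are assumptions introduced here for illustration and do not come from this paper: y is the candidate feature vector, T stacks the target templates as columns, I is the identity matrix whose columns act as trivial templates, a and e are the corresponding coefficients, c collects both, and λ weights the sparsity term. The formulation in [13] includes further constraints that are not reproduced here.

```latex
\begin{aligned}
\mathbf{y} &\approx \mathbf{T}\mathbf{a} + \mathbf{I}\mathbf{e}
            = [\mathbf{T},\ \mathbf{I}]\,\mathbf{c},
            \qquad \mathbf{c} = \begin{bmatrix}\mathbf{a}\\ \mathbf{e}\end{bmatrix}, \\
\hat{\mathbf{c}} &= \arg\min_{\mathbf{c}}\;
            \bigl\|\mathbf{y} - [\mathbf{T},\ \mathbf{I}]\,\mathbf{c}\bigr\|_2^2
            + \lambda\,\|\mathbf{c}\|_1 .
\end{aligned}
```

A large entry in e marks a pixel that the target templates cannot explain, which is how the trivial templates absorb occlusions.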
In generative tracking algorithms, a target candidate is
usually represented by a linear combination of templates
or of atoms in a dictionary. The templates or the dictionary
are composed of representative tracking results from
previous frames of the video sequence. When drastic
appearance variations occur, this kind of representation is
not robust to partial occlusion, illumination variation, etc.
Inspired by the convex hull techniques applied in face
recognition [17], we propose a novel visual tracking
algorithm (referred to as CHT). In this work, a target
candidate is represented by a convex combination of a
set of target templates. Evaluating the observation
likelihood is also an important issue. We evaluate the
likelihood of a target candidate based on the Earth Mover’s
Distance (EMD) [18] between the target candidate and the
target templates.
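The two ingredients named above can be sketched in a few lines. A convex combination differs from a general linear combination in that the coefficients are non-negative and sum to one, so the reconstruction stays inside the convex hull of the templates. The sketch below is only an illustration under stated assumptions, not the implementation of CHT: features are taken to be 1-D histograms on shared bins, the coefficients are fitted by constrained least squares, and the likelihood form exp(-lam * EMD) with its parameter lam is hypothetical. The function names are likewise invented for this example.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import wasserstein_distance


def convex_coefficients(templates, candidate):
    """Fit `candidate` by a convex combination of the template columns.

    templates: (d, n) array, one template feature vector per column.
    candidate: (d,) feature vector.
    Returns weights a with a >= 0 and sum(a) == 1 (least-squares fit).
    """
    n = templates.shape[1]

    def objective(a):
        return 0.5 * np.sum((templates @ a - candidate) ** 2)

    def gradient(a):
        return templates.T @ (templates @ a - candidate)

    result = minimize(
        objective,
        x0=np.full(n, 1.0 / n),          # start at the centroid of the hull
        jac=gradient,
        bounds=[(0.0, 1.0)] * n,          # non-negativity
        constraints=[{"type": "eq", "fun": lambda a: np.sum(a) - 1.0}],
        method="SLSQP",
    )
    a = np.clip(result.x, 0.0, None)      # remove tiny negative round-off
    return a / a.sum()


def emd_1d(hist_p, hist_q):
    """EMD between two 1-D histograms defined on the same bin indices."""
    bins = np.arange(len(hist_p))
    return wasserstein_distance(bins, bins, u_weights=hist_p, v_weights=hist_q)


def observation_likelihood(templates, candidate, lam=1.0):
    """Hypothetical likelihood: exp(-lam * EMD(candidate, hull point))."""
    a = convex_coefficients(templates, candidate)
    reconstruction = templates @ a        # closest fitted point in the hull
    return float(np.exp(-lam * emd_1d(candidate, reconstruction)))
```

In a particle filter, a function such as `observation_likelihood` would be evaluated once per particle, and the particle with the highest value taken as the tracking result. SciPy's `wasserstein_distance` handles only the 1-D case; multi-dimensional signatures would need a general transport solver.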
The remainder of this paper is organized as follows.
Section 2 presents the proposed visual tracking algorithm.
978-1-5090-0654-0/16/$31.00 © 2016 IEEE 433 ICALIP 2016