MC-HOG Correlation Tracking with Saliency Proposal
Guibo Zhu†, Jinqiao Wang†, Yi Wu‡, Xiaoyu Zhang§, and Hanqing Lu†
†National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing, 100190, China
‡B-DAT & CICAEET, School of Information & Control, Nanjing University of Information Science and Technology, Nanjing, 210044, Jiangsu, China
§Institute of Information Engineering, Chinese Academy of Sciences, Beijing, 100093, China
{gbzhu, jqwang, luhq}@nlpr.ia.ac.cn
ywu.china@yahoo.com, zhangxiaoyu@iie.ac.cn
Abstract
Designing effective features and handling the model drift problem are two important aspects of online visual tracking. For feature representation, gradient and color features are the most widely used, but how to combine them effectively for visual tracking is still an open problem. In this paper, we propose a rich feature descriptor, MC-HOG, which leverages gradient information across multiple color channels or spaces. MC-HOG features are then embedded into the correlation tracking framework to estimate the state of the target. To handle the model drift problem caused by occlusions or distractors, we propose saliency proposals as prior information to provide candidates and reduce background interference. In addition to saliency proposals, a ranking strategy is proposed to determine the importance of these proposals by exploiting the learnt appearance filter, historically preserved object samples and the distracting proposals. In this way, the proposed approach can effectively exploit color-gradient characteristics and alleviate the model drift problem. Extensive evaluations on the benchmark dataset show the superiority of the proposed method.
1 Introduction
Visual tracking, which estimates the state of an object in an image sequence, is one of the core problems in computer vision. It has many applications, such as surveillance, action recognition and autonomous robots/cars (Yilmaz, Javed, and Shah 2006; Wang et al. 2014). A robust visual tracking approach for real-world scenarios should cope with as many challenges as possible, such as occlusions, background clutter and shape deformation.
Copyright © 2016, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
Feature representation is critical for improving performance in object detection (Dollár et al. 2009), tracking (Henriques et al. 2015), age estimation (Li et al. 2012) and image ranking (Li et al. 2014). Gradient and color features are the most widely used. Specifically, Histogram of Oriented Gradient (HOG) features (Dalal and Triggs 2005) are good at describing abundant gradient information, while color features such as color histograms capture rich color characteristics. For example, the integral channel features proposed by Dollár et al. (2009) and their extensions (Dollár et al. 2014) have achieved good results in object detection by directly concatenating gradient and color features. Tang et al. explored the complementarity between color histograms and HOG features in a co-training framework for tracking (Tang et al. 2007). Although color and gradient features are widely used in vision-based applications, there is no detailed analysis of the gradient properties of the same target in different color spaces. It is therefore interesting to exploit these gradient properties for effective feature representation. Inspired by color naming (CN), which transforms the RGB color space into an 11-D probabilistic space (Van De Weijer et al. 2009), the image pixels are projected into multiple color channels, and gradient features are extracted from them to construct a new feature descriptor: HOG extracted across Multiple Color channels (MC-HOG). This is a more natural fusion strategy than direct concatenation, which must deal with the feature normalization problem across different feature spaces.
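As a rough illustration of the per-channel gradient idea described above, the following Python sketch computes one unsigned gradient-orientation histogram per color channel and concatenates the results. It is a simplified sketch under stated assumptions, not the authors' implementation: the function names `channel_orientation_histogram` and `mc_hog_sketch`, the 9-bin default, and the single whole-patch histogram (without the cells and block normalization of full HOG) are illustrative choices. The channels are whatever the caller supplies, for example plain RGB values, rather than the learned 11-D color-naming mapping.

```python
import math


def channel_orientation_histogram(channel, n_bins=9):
    """Unsigned gradient-orientation histogram of one 2-D channel.

    `channel` is a list of rows of numbers. Simplification (assumption):
    a single histogram over the whole patch, L2-normalized.
    """
    height, width = len(channel), len(channel[0])
    hist = [0.0] * n_bins
    # Central differences on interior pixels only.
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = channel[y][x + 1] - channel[y][x - 1]
            gy = channel[y + 1][x] - channel[y - 1][x]
            magnitude = math.hypot(gx, gy)
            if magnitude == 0.0:
                continue
            # Fold the angle into [0, pi) so opposite directions share a bin.
            angle = math.atan2(gy, gx) % math.pi
            bin_index = min(int(angle / math.pi * n_bins), n_bins - 1)
            hist[bin_index] += magnitude
    norm = math.sqrt(sum(v * v for v in hist)) or 1.0
    return [v / norm for v in hist]


def mc_hog_sketch(image, n_bins=9):
    """Concatenate per-channel orientation histograms.

    `image` is a list of rows of pixels; each pixel is a tuple holding one
    value per color channel (hypothetical input layout for this sketch).
    """
    n_channels = len(image[0][0])
    descriptor = []
    for c in range(n_channels):
        channel = [[pixel[c] for pixel in row] for row in image]
        descriptor.extend(channel_orientation_histogram(channel, n_bins))
    return descriptor
```

For a three-channel image and the default 9 bins, `mc_hog_sketch` returns a 27-element list. Because each channel is normalized on its own before concatenation, no cross-feature normalization step is needed in this sketch, which mirrors the motivation given in the text.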
In object tracking, model drift means that the object appearance model gradually drifts away from the object because of errors accumulated during online updates (Matthews, Ishikawa, and Baker 2004). There are many strategies to alleviate the drift problem, e.g. semi-supervised learning (Grabner, Leistner, and Bischof 2008), ensemble-based learning (Tang et al. 2007; Kwon and Lee 2010), long-term detectors (Kalal, Mikolajczyk, and Matas 2012) and part context learning (Zhu et al. 2015). In essence, they exploit either the supervised information of the training samples or the search strategy. However, the reliability of training samples collected online is difficult to guarantee. To provide relatively fewer candidate regions and suppress background interference, in this paper we introduce saliency proposals as prior information from visual saliency, which has been studied by many researchers (Itti, Koch, and Niebur 1998; Harel, Koch, and Perona 2006) and has good properties for automatic target initialization and scale estimation (Seo and Milanfar 2010; Mahadevan and Vasconcelos 2013). The saliency map is taken as prior information to obtain candidate proposals, which is more efficient than exhaustive search based on sliding windows. In addition to saliency proposals, a ranking strategy is proposed to determine the importance of these proposals and estimate the
Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence (AAAI-16)