1 INTRODUCTION
As an important branch of computer vision, visual tracking
plays a key role in intelligent transportation, video
surveillance, visual navigation, etc. Although recent years
have witnessed the emergence of many excellent tracking
algorithms, tracking remains a challenging problem because
the appearance of the target changes under factors such as
illumination variation, occlusion, and non-rigid deformation.
Many existing tracking algorithms use low-level or
high-level cues as features. Although low-level cues are
widely used in feature tracking and scene analysis [1, 2],
they are less effective in visual tracking because they
describe targets only weakly. Visual tracking based on
high-level cues, on the other hand, commonly exploits
semantic knowledge to construct adaptive appearance
models. However, high-level cues are difficult to obtain
[3], and they inevitably lead to drift once background
noise is introduced. As a trade-off, mid-level cues that
carry sufficient structural information about the image have
attracted considerable attention, especially superpixels [4,
5]. In [5], a superpixel-based discriminative appearance
model is established to distinguish the target from the
background. While that tracker performs convincingly
under occlusion and pose change, it remains fragile when
the background is cluttered or similar to the target.
This work is supported by the National Natural Science Foundation under
Grant 61473034, and the Specialized Research Fund for the Doctoral Program
of Higher Education (SRFDP) under Grant 20130006110008.
* Corresponding author
E-mail: lixiaoli@hotmail.com, lixiaolibjut@bjut.edu.cn (X.-L. Li).
In addition, Markov chains have been applied in computer
vision [6-8]. In [6], random walks on an absorbing Markov
chain are used to extract salient regions from the background.
That method further exploits the equilibrium distribution of
an ergodic Markov chain to reduce the absorbed time in
long-range smooth background regions. In [7], matching
between two graphs is formulated as node selection on an
association graph whose nodes represent candidate
correspondences between the two graphs. The solution is
obtained by simulating random walks with reweighting jumps
that enforce the matching constraints on the association
graph. In [8], the tracker performs learning and searching in
consecutive order at each time step under a new Bayesian
tracking framework formulated with an autoregressive
hidden Markov model.
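As a rough illustration of the absorbed-time notion mentioned for [6], the sketch below computes the expected number of steps before absorption for each transient node of a toy absorbing Markov chain. The three-node chain, its transition values, and the function name `absorbed_time` are hypothetical choices made for this example; the fixed-point iteration is a generic way to solve t = 1 + Qt and is not claimed to be the formulation used in [6].

```python
def absorbed_time(Q, iters=2000):
    """Expected number of steps to absorption for each transient state.

    Q[i][j] is the transition probability between transient states i and j.
    The absorbed time t satisfies t = 1 + Q t, i.e. t = (I - Q)^(-1) 1,
    which is solved here by simple fixed-point iteration.
    """
    n = len(Q)
    t = [0.0] * n
    for _ in range(iters):
        t = [1.0 + sum(Q[i][j] * t[j] for j in range(n)) for i in range(n)]
    return t


# Hypothetical toy chain: three transient nodes in a line.
# Node 0 is adjacent to an absorbing node (the missing 0.5 of row 0).
Q = [
    [0.0, 0.5, 0.0],  # node 0: 0.5 to the absorbing node, 0.5 to node 1
    [0.5, 0.0, 0.5],  # node 1: moves to node 0 or node 2
    [0.0, 1.0, 0.0],  # node 2: always moves to node 1
]
times = absorbed_time(Q)  # converges to approximately [5.0, 8.0, 9.0]
```

Nodes farther from the absorbing node take longer to be absorbed, which is the property a saliency or tracking method can exploit to separate regions.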
Furthermore, most existing trackers exploit only cues
extracted from RGB images, whereas few exploit the spatial
depth information of the scene [9, 10]. In [9], a hand model
and a fast cost function are redefined to build a real-time
hand tracking system that uses only the depth map captured
by a depth sensor. The tracking is fast and robust, which
illustrates the effectiveness of the depth cue. In [10], a robust
superpixel-based tracker via depth fusion is proposed, and
graph-regularized sparse coding is introduced into the
appearance model. Although that tracker achieves good
results under background clutter, it handles occlusion
poorly.
In this paper, we integrate graph models and Markov
theory to construct an ergodic Markov chain whose nodes
are superpixels. Visual tracking is then formulated as
random walks on this ergodic Markov chain. The contributions
of the proposed tracker are summarized as follows. First,
we construct an ergodic Markov chain containing positive
and negative template nodes. Random walks on this chain
globally search for candidate nodes belonging to the
target and suppress nodes belonging to the
background. Second, we fuse depth information into the
representation of superpixels to describe patches more
accurately. Third, we construct another ergodic Markov
chain on the depth map to handle occlusion, which makes
our algorithm more robust.
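To make the notion of random walks on an ergodic Markov chain over superpixel nodes more concrete, the following minimal sketch builds a row-stochastic transition matrix from pairwise feature affinities and iterates to its equilibrium distribution. Everything specific here is an assumption for illustration only: the Gaussian affinity, the parameter `sigma`, the two-dimensional features (a mean intensity paired with a mean depth), the fully connected graph, and the function names `transition_matrix` and `equilibrium`. It does not reproduce the construction of the proposed tracker, and template nodes are omitted.

```python
import math


def transition_matrix(features, sigma=1.0):
    """Row-stochastic matrix from Gaussian affinities between node features.

    Every pair of distinct nodes gets a positive weight, so the chain is
    irreducible and aperiodic (ergodic) for three or more nodes.
    """
    n = len(features)
    P = []
    for i in range(n):
        row = []
        for j in range(n):
            if i == j:
                row.append(0.0)  # no self-loops in this sketch
            else:
                d2 = sum((a - b) ** 2 for a, b in zip(features[i], features[j]))
                row.append(math.exp(-d2 / sigma ** 2))
        total = sum(row)
        P.append([w / total for w in row])
    return P


def equilibrium(P, iters=200):
    """Approximate the equilibrium distribution pi = pi P by power iteration."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi


# Hypothetical superpixel features: (mean intensity, mean depth), both in [0, 1].
features = [
    (0.90, 0.20),
    (0.80, 0.25),
    (0.85, 0.20),
    (0.10, 0.90),
    (0.20, 0.80),
]
P = transition_matrix(features)
pi = equilibrium(P)
```

Because the affinities are symmetric, the equilibrium probability of a node is proportional to its total affinity to all other nodes, so nodes that resemble many others receive more mass. A tracker could turn such node scores into a per-pixel confidence map, although the scoring used in this paper may differ.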
Visual Tracking via Ergodic Markov Chain and Depth Fusion
Wei Liu¹, Xiaoli Li²*
1. School of Automation and Electrical Engineering, University of Science and Technology Beijing, Beijing 100083, P.R.
E-mail: liuwei2012@outlook.com
2. College of Electronic Information and Control Engineering, Beijing University of Technology, Beijing 100124, P.R. China
E-mail: lixiaolibjut@bjut.edu.cn
Abstract: Visual tracking is a significant and challenging task in computer vision. In this paper, we formulate visual
tracking as random walks on an ergodic Markov chain, where nodes are superpixels and edges represent their
relationships. Graph models and Markov theory are integrated to construct the ergodic Markov chain. Based on
random walks and the introduction of positive and negative template nodes, our algorithm can globally search for
candidate nodes belonging to the target and suppress nodes belonging to the background. We then obtain a confidence
map that locates the target position. In particular, to describe patches more accurately, we fuse depth information into
the representation of superpixels. Furthermore, we construct another ergodic Markov chain on the depth map to handle
occlusion, which makes our algorithm more robust. Experimental results demonstrate that our algorithm achieves
excellent performance, even when handling occlusion, non-rigid deformation, scale variation, etc.
Key Words: Visual tracking, Ergodic Markov chain, Depth map, Confidence map
978-1-4673-9714-8/16/$31.00 © 2016 IEEE