A New Strategy for Visual Tracking: Tackling Challenges with Markov Chains and Depth Fusion


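Visual tracking, an important branch of computer vision, plays a key role in intelligent transportation, video surveillance, visual navigation and related fields. Although many strong tracking algorithms have appeared in recent years, tracking remains challenging when the target's appearance changes because of illumination changes, occlusion, or non-rigid deformation. Low-level features such as color and texture are widely used in feature tracking and scene analysis, but they describe the target only weakly and so bring limited gains in tracking. High-level features such as semantic knowledge are commonly used to build adaptive appearance models, but they are hard to obtain and tend to cause tracking drift once background noise is introduced.

To address these problems, mid-level features, superpixels in particular, have attracted growing attention because of the rich structural information they carry. Superpixels partition an image into regions with similar visual properties, capturing the target's internal structure and context, which improves robustness to appearance change to some extent. In some studies, such as [5], superpixels are used to build a more stable tracking model. Superpixels alone may still not cope with every difficult case, so the paper proposes visual tracking via an ergodic Markov chain and depth fusion.

A Markov chain is a probabilistic model in which the current state depends only on the previous state and, given that state, is independent of earlier ones. In visual tracking it can model how the target's state evolves and assist tracking by predicting the target's likely next position or state. Depth fusion, in the paper, means fusing the depth information of the scene into the superpixel representation so that patches are described more accurately. More generally, fusing complementary cues aims to make tracking more stable and accurate, for instance with low-level features supplying detail and mid-level features supplying structure.

In the implementation, the image is first preprocessed into superpixels to extract a stable feature representation. A Markov chain then models these features, and during tracking, given the observation in the current frame, the chain is searched for the most probable next state, which serves as the prediction of the target position. Meanwhile, the fusion mechanism combines the responses of low-level and mid-level features into an overall tracking decision. The fusion may involve conversions between feature spaces and feature selection, for example combining high-level features extracted by a convolutional neural network (CNN) with superpixel-level geometric information. The advantage of this approach is that it exploits both the spatial structure and the semantic information of different features, improving adaptability to changes in the target and the environment and lowering the probability of tracking failure caused by relying on a single feature.

In summary, by combining an ergodic Markov chain with depth fusion, the paper aims to design a more robust and accurate visual tracking algorithm that can follow a target continuously in complex scenes and resist a range of appearance challenges. Potential applications include autonomous driving and security surveillance.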


1 INTRODUCTION

As an important branch of computer vision, visual tracking plays a key role in intelligent transportation, video surveillance, visual navigation, etc. Although recent years have witnessed the emergence of many excellent tracking algorithms, tracking remains a challenging problem because the appearance of the target changes with factors such as illumination changes, occlusion, and non-rigid deformation. Many existing tracking algorithms use low-level and high-level cues as features. Although low-level cues are widely used in feature tracking and scene analysis [1, 2], they are less effective in visual tracking because they describe targets only weakly. Visual tracking based on high-level cues, on the other hand, commonly exploits semantic knowledge to construct adaptive appearance models. However, high-level cues are difficult to obtain [3] and inevitably lead to drift once background noise is introduced. As a trade-off, mid-level cues that carry sufficient structural information about the image have attracted a lot of attention, especially superpixels [4, 5]. In [5], a superpixel-based discriminative appearance model is established to distinguish the target from the background. Although the tracker achieves convincing tracking performance under occlusion and pose change, it remains fragile when the background is cluttered or similar to the target.

This work is supported by the National Natural Science Foundation under Grant 61473034, and the Specialized Research Fund for the Doctoral Program of Higher Education (SRFDP) under Grant 20130006110008.
* Corresponding author
E-mail: lixiaoli@hotmail.com, lixiaolibjut@bjut.edu.cn (X.-L. Li).

Besides, Markov chains have been applied in computer vision [6-8]. In [6], random walks on an absorbing Markov chain are used to extract salient regions from the background. That work further exploits the equilibrium distribution of an ergodic Markov chain to reduce the absorbed time in long-range smooth background regions. In [7], matching between two graphs is formulated as node selection on an association graph whose nodes represent candidate correspondences between the two graphs. The solution is obtained by simulating random walks with reweighting jumps that enforce the matching constraints on the association graph. In [8], the tracker performs learning and searching in consecutive order at each time step under a new Bayesian tracking framework formulated with the autoregressive hidden Markov model.
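To make the notion of an equilibrium distribution of an ergodic Markov chain concrete, here is a minimal sketch in Python with NumPy. It is an illustration only, not code from [6] or from this paper: the function name `stationary_distribution`, the tolerance, and the 3-state transition matrix are all assumptions chosen for the example. For a row-stochastic, ergodic matrix P, repeatedly applying P to any starting distribution converges to the vector pi satisfying pi P = pi.

```python
import numpy as np

def stationary_distribution(P, tol=1e-12, max_iter=10_000):
    """Power iteration for pi with pi @ P == pi.

    Assumes P is row-stochastic and ergodic (irreducible and aperiodic),
    so the iteration converges from any starting distribution.
    """
    P = np.asarray(P, dtype=float)
    n = P.shape[0]
    pi = np.full(n, 1.0 / n)           # start from the uniform distribution
    for _ in range(max_iter):
        nxt = pi @ P                   # one step of the chain
        if np.abs(nxt - pi).sum() < tol:
            return nxt
        pi = nxt
    return pi

# Hypothetical 3-state chain; self-loops make it aperiodic.
P = np.array([[0.50, 0.50, 0.00],
              [0.25, 0.50, 0.25],
              [0.00, 0.50, 0.50]])
pi = stationary_distribution(P)
print(pi)
```

For this toy chain the middle state communicates with both neighbours and ends up with the largest share of probability mass, which is the kind of global ranking a random walk provides.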

Furthermore, most existing trackers exploit only cues extracted from RGB images, whereas few exploit the spatial depth information of the scene [9, 10]. In [9], a hand model and a fast cost function are redefined to build a real-time hand tracking system that uses only the depth map captured by a depth sensor. The tracker is fast and robust, which illustrates the effectiveness of the depth cue. In [10], a robust superpixel-based tracker via depth fusion is proposed, and graph-regularized sparse coding is introduced into the appearance model. Although the tracker achieves good results under background clutter, it is weak at handling occlusion.

In this paper, we integrate graph models and Markov theory to construct an ergodic Markov chain that regards superpixels as nodes. Visual tracking is then formulated as random walks on the ergodic Markov chain. The contributions of the proposed tracker are summarized as follows. First, we construct an ergodic Markov chain containing positive and negative template nodes. Random walks on this chain can search globally for candidate nodes belonging to the target and suppress nodes belonging to the background. Second, we fuse depth information into the representation of superpixels to describe patches more accurately. Third, we construct another ergodic Markov chain on the depth map to handle occlusion, making our algorithm more robust.
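The first two contributions can be illustrated with a small sketch. This excerpt does not give the paper's actual formulation, so everything below is an assumption made for illustration: the Gaussian affinity, the parameters `sigma`, `alpha` and `depth_weight`, the use of a random walk with restart, and the toy feature values. Each superpixel is a node whose feature vector concatenates its mean color with a weighted mean depth; a walk restarted at the positive template node and another restarted at the negative template node are compared to score the candidate nodes.

```python
import numpy as np

def transition_matrix(features, sigma=0.5):
    """Row-stochastic transition matrix from a Gaussian affinity (assumed form)."""
    d2 = ((features[:, None, :] - features[None, :, :]) ** 2).sum(axis=-1)
    W = np.exp(-d2 / sigma ** 2)
    np.fill_diagonal(W, 0.0)                   # no self-transitions
    return W / W.sum(axis=1, keepdims=True)

def restart_walk(P, seeds, alpha=0.85):
    """Random walk with restart: solves pi = alpha * P.T @ pi + (1 - alpha) * s."""
    n = P.shape[0]
    s = np.zeros(n)
    s[seeds] = 1.0 / len(seeds)                # restart mass on the seed nodes
    return np.linalg.solve(np.eye(n) - alpha * P.T, (1 - alpha) * s)

# Hypothetical nodes: 0 = positive template, 1 = negative template, 2-5 = candidates.
color = np.array([[0.90, 0.10, 0.10], [0.10, 0.10, 0.90],
                  [0.85, 0.15, 0.10], [0.80, 0.10, 0.20],
                  [0.15, 0.10, 0.85], [0.20, 0.20, 0.80]])
depth = np.array([0.20, 0.80, 0.25, 0.20, 0.75, 0.80])   # normalized mean depth
depth_weight = 1.0                                        # assumed fusion weight

fused = np.hstack([color, depth_weight * depth[:, None]])  # depth-fused features
P = transition_matrix(fused)

# Positive score: affinity to the target; negative score: affinity to background.
conf = restart_walk(P, [0]) - restart_walk(P, [1])
print(conf[2:])
```

Candidates 2 and 3 resemble the positive template in both color and depth and receive positive scores, while candidates 4 and 5 are suppressed. Raising `depth_weight` makes depth count for more when two superpixels have similar colors.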

Visual Tracking via Ergodic Markov Chain and Depth Fusion

Wei Liu, Xiaoli Li

1. School of Automation and Electrical Engineering, University of Science and Technology Beijing, Beijing 100083, P.R. China
E-mail: liuwei2012@outlook.com

2. College of Electronic Information and Control Engineering, Beijing University of Technology, Beijing 100124, P.R. China
E-mail: lixiaolibjut@bjut.edu.cn

Abstract: Visual tracking is a significant and challenging task in computer vision. In this paper, we consider visual tracking as random walks on an ergodic Markov chain, where superpixels serve as nodes and edges represent their relationships. The graph model and Markov theory are integrated to construct the ergodic Markov chain. Based on the random walks and the introduction of positive and negative template nodes, our algorithm can search globally for candidate nodes belonging to the target and suppress nodes belonging to the background. We then obtain a confidence map that locates the target position. In particular, to describe patches more accurately, we fuse depth information into the representation of superpixels. Furthermore, we construct another ergodic Markov chain on the depth map to handle occlusion, making our algorithm more robust. Experimental results demonstrate that our algorithm achieves excellent performance, even when handling occlusion, non-rigid deformation, scale variation, etc.

Key Words: Visual tracking, Ergodic Markov chain, Depth map, Confidence map
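The abstract says that a confidence map locates the target but, in this excerpt, does not say how. One plausible reading, sketched below purely as an assumption, is to paint each pixel with the confidence of its superpixel and take the confidence-weighted centroid of the positive part. The names `confidence_map` and `locate`, the 4x4 label image, and the score values are hypothetical.

```python
import numpy as np

def confidence_map(labels, node_conf):
    """Paint every pixel with the confidence of the superpixel it belongs to."""
    return np.asarray(node_conf, dtype=float)[labels]

def locate(conf_map):
    """Confidence-weighted centroid (row, col); negative scores are ignored."""
    w = np.clip(conf_map, 0.0, None)
    total = w.sum()
    if total == 0:
        return None                     # no positive evidence, e.g. full occlusion
    rows, cols = np.indices(conf_map.shape)
    return float((w * rows).sum() / total), float((w * cols).sum() / total)

# Hypothetical 4x4 frame split into four 2x2 superpixels.
labels = np.array([[0, 0, 1, 1],
                   [0, 0, 1, 1],
                   [2, 2, 3, 3],
                   [2, 2, 3, 3]])
node_conf = [-0.25, 0.0, -0.5, 0.5]     # only superpixel 3 looks like the target

center = locate(confidence_map(labels, node_conf))
print(center)
```

Here only the bottom-right superpixel has positive confidence, so the estimated center falls in the middle of that block. Returning `None` when no positive confidence remains is one simple way to flag a frame in which the target may be fully occluded.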


978-1-4673-9714-8/16/$31.00 © 2016 IEEE
