FAST TRACKING VIA CONTEXT DEPTH
MODEL LEARNING
Zhaoyun Chen, Lei Luo, Mei Wen, Chunyuan Zhang
College of Computer, National University of Defense Technology, Changsha, China
Email: chenzhaoyun09@163.com
Abstract—Visual tracking is a challenging task in computer vision. In this paper, we propose a fast and robust visual tracking algorithm that directly extends STC [1]. By exploring RGB-D data, we construct a context depth model that records the spatial correlation between low-level features of the target and its surrounding regions. Leveraging the continuity and stability of the target in the depth image, we adopt a region growing method and a model updating scheme for scale estimation and occlusion detection. Both qualitative and quantitative evaluations on challenging benchmark image sequences demonstrate that the proposed tracker performs favorably against several state-of-the-art algorithms.
I. INTRODUCTION
Visual tracking is an important research direction in computer vision. A robust, real-time tracker for continuous image sequences has a wide range of applications, such as video surveillance, intelligent transportation, human-computer interaction, robot navigation, and video compression and retrieval.
In traditional tracking, generative models are usually proposed to represent target appearance changes [2, 3]. Some approaches mine auxiliary objects or local visual information surrounding the target to assist tracking [4, 5], and numerous learning methods have been adapted to the tracking problem [6–8]. The algorithms mentioned above, however, cannot handle heavy occlusion due to the lack of 3D visual understanding. Moreover, some of them cannot work in real-time scenarios because of their high computational complexity.
The fast tracking algorithm via spatio-temporal context learning (STC) [1] presents a new framework that exploits context information to facilitate visual tracking. Although STC works well in common scenes, it performs poorly under challenging factors such as occlusion, scale variation, deformation, and background clutter.
Meanwhile, off-the-shelf depth sensors such as the Microsoft Kinect make depth information easy to acquire. Depth information has been introduced into object detection, object segmentation, scene understanding [9, 10], etc. However, no existing RGB-D tracking algorithm works effectively in all situations [11].
We propose to extend STC by exploring RGB-D data. Depth information is introduced to refine the spatio-temporal context model into a context depth model, improving scale estimation and the handling of occlusion and deformation. The main contributions of this paper are: (1) we construct a 3D context model based on depth information; (2) a region growing method is adopted for scale estimation, so the target is not limited to a fixed aspect ratio; and (3) a scheme that reduces the learning rate is proposed to improve performance under long-term occlusion.

Corresponding author: Lei Luo, e-mail: l.luo@nudt.edu.cn

Fig. 1. Overview of the proposed algorithm. It consists of four parts: Object Center Location, Occlusion Detection, Region Growing Scaling, and Bounding Box Output. [The figure also plots the fluctuation of the target center depth over the sequence, with example frames #81, #84, #90, and #93.]
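To make the Fig. 1 pipeline concrete, the sketch below outlines one tracking iteration as we read it. It is a minimal sketch under stated assumptions, not the authors' implementation: the model API (confidence_map, update), the depth-jump threshold, the learning rates, and the 4-connected depth-tolerance growing criterion are all hypothetical placeholders.

```python
import numpy as np
from collections import deque

def region_grow(depth, seed, tol=50.0):
    # Flood-fill from `seed` over 4-connected pixels whose depth stays
    # within `tol` of the seed depth (a generic criterion; the paper's
    # exact growing rule may differ). Returns a boolean mask.
    h, w = depth.shape
    mask = np.zeros((h, w), dtype=bool)
    seed_depth = float(depth[seed])
    queue = deque([seed])
    mask[seed] = True
    while queue:
        y, x = queue.popleft()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if (0 <= ny < h and 0 <= nx < w and not mask[ny, nx]
                    and abs(float(depth[ny, nx]) - seed_depth) <= tol):
                mask[ny, nx] = True
                queue.append((ny, nx))
    return mask

def track_frame(rgb, depth, model, prev_center_depth,
                depth_jump=300.0, lr_normal=0.075, lr_occluded=0.01):
    # 1. Object Center Location: take the peak of the confidence map
    #    (Eq. (1)); `model.confidence_map` is a hypothetical API.
    conf = model.confidence_map(rgb, depth)
    center = np.unravel_index(np.argmax(conf), conf.shape)

    # 2. Occlusion Detection: an abrupt jump of the depth at the target
    #    center suggests an occluder has moved in front of the target.
    center_depth = float(depth[center])
    occluded = abs(center_depth - prev_center_depth) > depth_jump

    # 3. Region Growing Scaling: grow a depth-consistent region from the
    #    center; its extent gives a box with a free aspect ratio.
    mask = region_grow(depth, center)
    ys, xs = np.nonzero(mask)
    box = (xs.min(), ys.min(), xs.max(), ys.max())  # 4. Bounding Box

    # Contribution (3): shrink the learning rate while occluded so the
    # context depth model is not corrupted by the occluder.
    model.update(rgb, depth, center, lr_occluded if occluded else lr_normal)
    return center, box, center_depth, occluded
```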
The paper is structured as follows: Section 2 presents our proposed method. The experiments and evaluation are described in Section 3. Section 4 concludes the paper.
II. METHODOLOGY
The tracking problem in STC is formulated by computing a confidence map which estimates the object location likelihood:

$$c(x) = P(x \mid o), \tag{1}$$

where $x \in \mathbb{R}^2$ is the object location and $o$ denotes the object present in the scene. In the current frame, the object location $x^*$ is given. The local context feature set from the image is defined as $X^c = \{c(z) = (I(z), z) \mid z \in \Omega_c(x^*)\}$, where $I(z)$ stands for the image intensity at location $z$ and $\Omega_c(x^*)$ stands for the neighborhood of the location $x^*$.
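As a concrete reading of these definitions, the minimal sketch below (our own illustration; the helper name and the square window taken for $\Omega_c(x^*)$ are assumptions, since the neighborhood's shape is not fixed here) collects the context feature set as intensity-location pairs:

```python
import numpy as np

def context_feature_set(image, x_star, radius):
    # Collect X^c = {(I(z), z) : z in Omega_c(x*)}, taking Omega_c(x*)
    # to be a square window of the given radius around x* (an assumed
    # choice of neighborhood, for illustration only).
    cy, cx = x_star
    h, w = image.shape
    features = []
    for y in range(max(0, cy - radius), min(h, cy + radius + 1)):
        for x in range(max(0, cx - radius), min(w, cx + radius + 1)):
            features.append((float(image[y, x]), (y, x)))
    return features
```

In STC itself, the summation over this set in Eq. (2) below is evaluated efficiently in the frequency domain via the FFT, which is what makes the tracker fast.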
By marginalizing the joint probability, the confidence map can be decomposed as

$$c(x) = P(x \mid o) = \sum_{c(z) \in X^c} P(x, c(z) \mid o) = \sum_{c(z) \in X^c} P(x \mid c(z), o)\, P(c(z) \mid o), \tag{2}$$
where $P(x \mid c(z), o)$ is the spatial context probability and $P(c(z) \mid o)$ is the prior context probability. The center of