Object Detection in Videos by High Quality Object Linking
Peng Tang†∗, Chunyu Wang‡, Xinggang Wang†, Wenyu Liu†, Wenjun Zeng‡, Jingdong Wang‡
†School of EIC, Huazhong University of Science and Technology
‡Microsoft Research Asia
{pengtang,xgwang,liuwy}@hust.edu.cn {chnuwa,wezeng,jingdw}@microsoft.com
∗This work was done during an internship at Microsoft Research Asia.
Abstract
Compared with object detection in static images, object detection in videos is more challenging due to degraded image qualities. An effective way to address this problem is to exploit temporal contexts by linking the same object across video to form tubelets and aggregating classification scores in the tubelets. In this paper, we focus on obtaining high quality object linking results for better classification. Unlike previous methods that link objects by checking boxes between neighboring frames, we propose to link objects in the same frame. To achieve this goal, we extend prior methods in the following aspects: (1) a cuboid proposal network that extracts spatio-temporal candidate cuboids which bound the movement of objects; (2) a short tubelet detection network that detects short tubelets in short video segments; (3) a short tubelet linking algorithm that links temporally-overlapping short tubelets to form long tubelets. Experiments on the ImageNet VID dataset show that our method outperforms both the static image detector and the previous state of the art. In particular, our method improves results by 8.8% over the static image detector for fast moving objects.
1. Introduction
Detecting objects in static images [5, 6, 22, 24, 25, 35, 31] has achieved significant progress due to the emergence of deep convolutional neural networks (CNNs) [11, 18, 19, 29]. However, object detection in videos brings additional challenges due to degraded image qualities, e.g. motion blur and video defocus, leading to unstable classifications for the same object across video. Therefore, many research efforts have been devoted to video object detection by exploiting temporal contexts [8, 3, 17, 16, 15, 37, 36], especially after the introduction of the ImageNet video object detection (VID) challenge.
Many previous methods exploit temporal contexts by linking the same object across video to form tubelets and aggregating classification scores in the tubelets [8, 17, 16, 3]. They first use static image detectors to detect objects in each frame, and then link the detected objects across neighboring frames, either according to the spatial overlap between object boxes in different frames [8] or by predicting object movements between neighboring frames [17, 16, 15, 3]. These methods obtain very promising results.
However, the same object changes its location and appearance across neighboring frames due to object motion, which may render the spatial overlap between boxes of the same object in neighboring frames insufficient or the predicted object movements inaccurate. This degrades the quality of object linking, especially for fast moving objects. By contrast, within the same frame, two boxes clearly correspond to the same object if they overlap sufficiently. Inspired by these facts, we propose to link objects in the same frame instead of across neighboring frames for high quality object linking.
In our method, a long video is first divided into temporally-overlapping short video segments. For each short video segment, we extract a set of cuboid proposals, i.e. spatio-temporal candidate cuboids which bound the movement of objects, by extending the region proposal network for static images [25] to a cuboid proposal network for short video segments. The objects across frames lying in the same cuboid are regarded as the same object.
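To make the segment and cuboid construction concrete, the following is a minimal sketch of dividing a video into temporally-overlapping short segments, together with the smallest enclosing cuboid that bounds an object's per-frame boxes within a segment. The segment length, the one-frame overlap, and the helper names are illustrative assumptions, not the paper's exact settings.

```python
import numpy as np

def split_into_segments(num_frames, seg_len=8, overlap=1):
    """Split a video into short segments of seg_len frames, where
    consecutive segments share `overlap` frames (assumed values)."""
    stride = seg_len - overlap
    segments = []
    start = 0
    while start + seg_len <= num_frames:
        segments.append((start, start + seg_len))  # (start, end), end exclusive
        start += stride
    if not segments or segments[-1][1] < num_frames:
        # Cover any remaining tail frames with one final segment.
        segments.append((max(0, num_frames - seg_len), num_frames))
    return segments

def enclosing_cuboid(frame_boxes):
    """A cuboid that bounds an object's movement: the smallest 2D box
    enclosing the object's per-frame (x1, y1, x2, y2) boxes, shared by
    all frames of the segment. frame_boxes: (K, 4) array."""
    b = np.asarray(frame_boxes, dtype=float)
    return np.array([b[:, 0].min(), b[:, 1].min(), b[:, 2].max(), b[:, 3].max()])

print(split_into_segments(20))  # [(0, 8), (7, 15), (12, 20)]
```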
For each cuboid proposal, we adapt Fast R-CNN [5] to detect short tubelets. More precisely, we compute the precise box locations and classification scores for each frame separately, forming a short tubelet that represents the linked object boxes in the short video segment. We compute the classification score of the tubelet by aggregating the classification scores of the boxes across frames. In addition, to remove spatially redundant short tubelets, we extend the standard non-maximum suppression (NMS) with a tubelet overlap measurement, which prevents the tubelet breaking that may occur with frame-wise NMS. Considering short range temporal contexts via short tubelets benefits detection, see Fig. 1 (b).
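The sketch below illustrates how tubelet scoring and tubelet-level NMS can work. The mean-score aggregation, the mean per-frame IoU as the tubelet overlap measurement, and the 0.3 threshold are assumptions standing in for the paper's exact choices; the point is that a tubelet is kept or suppressed as a whole, so it cannot be broken frame by frame.

```python
import numpy as np

def box_iou(a, b):
    """IoU of two boxes in (x1, y1, x2, y2) format."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def tubelet_overlap(t1, t2):
    """Tubelet overlap as the mean per-frame IoU (an assumed measure).
    t1, t2: (K, 4) arrays, one box per frame of the segment."""
    return float(np.mean([box_iou(a, b) for a, b in zip(t1, t2)]))

def tubelet_nms(tubelets, frame_scores, thresh=0.3):
    """tubelets: list of (K, 4) arrays; frame_scores: list of (K,) arrays.
    Aggregates per-frame scores by their mean (an assumption) and runs
    the standard greedy NMS loop at the tubelet level."""
    scores = np.array([s.mean() for s in frame_scores])
    order = np.argsort(-scores)
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        ious = np.array([tubelet_overlap(tubelets[i], tubelets[j]) for j in rest])
        order = rest[ious <= thresh]
    return keep
```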
Finally, we link the short tubelets with sufficient overlap across temporally-overlapping short video segments. If two short tubelets from adjacent segments overlap sufficiently in their shared frames, they are regarded as the same object and linked into a longer tubelet.
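A minimal sketch of this same-frame linking across two adjacent segments follows, reusing box_iou from the previous sketch. The greedy highest-overlap matching and the 0.5 threshold are assumptions, not taken from the paper; the key property is that overlap is measured between boxes in the shared frames, not between neighboring frames.

```python
import numpy as np

def link_across_segments(tubelets_a, tubelets_b, shared_a, shared_b, thresh=0.5):
    """tubelets_a/b: lists of (K, 4) box arrays from two adjacent segments.
    shared_a/shared_b: indices of the frames the two segments have in
    common (e.g. the last frame of segment A and the first of segment B).
    Returns (i, j) pairs of tubelets linked as the same object."""
    candidates = []
    for i, ta in enumerate(tubelets_a):
        for j, tb in enumerate(tubelets_b):
            # Overlap computed in the same frame(s) shared by both segments.
            iou = np.mean([box_iou(ta[fa], tb[fb])
                           for fa, fb in zip(shared_a, shared_b)])
            if iou >= thresh:
                candidates.append((iou, i, j))
    # Greedily match highest-overlap pairs first, each tubelet used once.
    pairs, used_a, used_b = [], set(), set()
    for iou, i, j in sorted(candidates, reverse=True):
        if i not in used_a and j not in used_b:
            pairs.append((i, j))
            used_a.add(i)
            used_b.add(j)
    return pairs
```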