A Post-Processing Method for Moving Object Detection Using Feature Pyramid Networks


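"This paper proposes Residual Background Networks (ResBGNets), a post-processing method for moving object detection based on Feature Pyramid Networks, with the aim of improving the accuracy of moving object detection in video sequences. The method learns the residual image between the results of existing methods and the ground truth in order to understand and correct misclassifications, and it combines the spatial information of the low-resolution level with the semantic features of the high-resolution level to improve detection performance."

In computer vision, convolutional neural networks (CNNs) have shown strong image classification ability. Moving object detection can be treated as a classification process in which every pixel is labelled as a foreground pixel or a background pixel. Although modern CNN models have made marked progress in image recognition and object detection, misclassification remains a problem in moving object detection. ResBGNets is proposed as a post-processing strategy to address it.

The core of ResBGNets is to learn the difference between the output of an existing detection method and the ground truth, that is, the residual image. Rather than trying to learn the ground truth directly, this helps to expose the characteristic behaviour of each algorithm and to correct its misclassifications. Inside ResBGNets, a Feature Pyramid Network (FPN) fuses information from different levels, combining the spatial information of the low-resolution level with the higher-level semantic features of the high-resolution level. FPN works by upsampling the higher-level feature maps to match the resolution of the lower levels while preserving their semantic information. Context can therefore propagate across scales, which is particularly useful for small, finely detailed moving objects. ResBGNets uses this property to refine the localization and classification of moving objects, improving the precision and robustness of the overall detection.

In addition, by training the model on residuals, ResBGNets can capture subtle details and motion patterns that the original method missed. This strengthens the handling of complex scenes and reduces false detections caused by background confusion or illumination changes. As a result, ResBGNets can better identify moving objects across consecutive frames of a video sequence, which matters for real-time settings such as autonomous driving, surveillance systems and drone applications. In short, the paper offers a way to improve existing moving object detection techniques by using a feature pyramid network to learn residual information, raising detection accuracy and stability.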
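The top-down merge mentioned in the summary above can be illustrated with a minimal NumPy sketch. It assumes nearest-neighbour 2x upsampling followed by element-wise addition; the function names `upsample2x` and `top_down_merge`, the toy channel count and resolutions, and the omission of the usual 1x1 lateral and 3x3 smoothing convolutions are all simplifications chosen for illustration, not the ResBGNets architecture itself.

```python
import numpy as np

def upsample2x(feat):
    """Nearest-neighbour 2x upsampling of a (C, H, W) feature map."""
    return feat.repeat(2, axis=1).repeat(2, axis=2)

def top_down_merge(features):
    """Merge a list of (C, H, W) maps ordered from fine to coarse.

    Each coarser merged map is upsampled and added to the next finer map,
    so every output keeps the resolution of its input level.
    Assumes all maps share C and each level halves H and W.
    """
    merged = [features[-1]]                      # start from the coarsest level
    for lateral in reversed(features[:-1]):      # walk towards finer levels
        merged.append(lateral + upsample2x(merged[-1]))
    return merged[::-1]                          # restore fine-to-coarse order

# Toy pyramid (hypothetical sizes): 4 channels at 16x16, 8x8 and 4x4.
rng = np.random.default_rng(0)
pyramid = [rng.standard_normal((4, 16 // 2**i, 16 // 2**i)) for i in range(3)]
outputs = top_down_merge(pyramid)
print([o.shape for o in outputs])
```

Addition is used here because it keeps the channel count fixed across levels; concatenation would be another plausible choice under different assumptions.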

A Post-Processing Approach in Moving Objects Detection via Feature Pyramid Networks

Li Lin, Bin Wang, Yinjuan Gu

School of Communication and Information Engineering

Shanghai University, Shanghai 200072, China

jocelyn_ly@shu.edu.cn

Abstract- Recent work has shown that Convolutional Neural Networks (CNNs) are highly capable of dealing with classification problems in the field of pattern recognition. Moving objects detection, regarded as a classification process, labels every pixel as a foreground pixel or a background pixel. In this paper, we propose an effective post-processing approach, Residual Background Networks (ResBGNets), to improve the accuracy of moving objects detection in video sequences. Instead of learning the ground truth directly, our model learns the residual pictures between the results of existing methods and the ground truth. This helps to understand the hidden character of each algorithm and to correct the misclassification. Inside ResBGNets, we build Feature Pyramid Networks (FPN) to combine the spatial information of the low-resolution level with the high-level semantic features of the high-resolution level. Evaluation performed on the 2014 CDnet dataset reveals that, through our approach, most of the existing background subtraction methods obtain better detection results and a significantly higher FM score.

Keywords-Moving objects detection; convolutional neural networks; background subtraction; residual pictures; feature pyramids
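The residual idea stated in the abstract can be made concrete with a small NumPy sketch. The signed encoding used here (+1 for missed foreground, -1 for a false alarm, 0 for a correct pixel) is an assumption for illustration, since this excerpt does not specify how the residual pictures are encoded; the helper names `residual_target`, `apply_residual` and `f_measure` are likewise hypothetical. The F-measure is the standard harmonic mean of precision and recall.

```python
import numpy as np

def residual_target(pred_mask, gt_mask):
    """Signed residual: +1 = missed foreground, -1 = false alarm, 0 = correct."""
    return gt_mask.astype(np.int8) - pred_mask.astype(np.int8)

def apply_residual(pred_mask, residual):
    """Correct a binary mask by adding a (predicted) signed residual."""
    return np.clip(pred_mask.astype(np.int8) + residual, 0, 1).astype(np.uint8)

def f_measure(pred_mask, gt_mask):
    """F-measure = 2 * precision * recall / (precision + recall)."""
    tp = int(np.sum((pred_mask == 1) & (gt_mask == 1)))
    fp = int(np.sum((pred_mask == 1) & (gt_mask == 0)))
    fn = int(np.sum((pred_mask == 0) & (gt_mask == 1)))
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Toy masks: 1 = foreground, 0 = background.
gt = np.array([[0, 1, 1, 0],
               [0, 1, 1, 0],
               [0, 0, 0, 0]], dtype=np.uint8)
pred = np.array([[0, 1, 0, 0],      # one missed foreground pixel
                 [0, 1, 1, 1],      # one false alarm
                 [0, 0, 0, 0]], dtype=np.uint8)

residual = residual_target(pred, gt)          # what a network would be trained to predict
corrected = apply_residual(pred, residual)    # a perfect residual recovers the ground truth
print(f_measure(pred, gt), f_measure(corrected, gt))
```

In practice the residual would be estimated by the network rather than computed from the ground truth, so the corrected mask would only approximate it.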

I. INTRODUCTION

In the past few years, video surveillance has been applied not only in traditional areas that need security, such as banks, airports or traffic, but also widely in other aspects of our daily life. Analysing millions of these captured video sequences manually requires a considerable amount of time. Fortunately, computer technology today is capable of doing this effectively. In vehicle tracking [1], people counting [2], action recognition [3], and many other computer vision applications [4-5], moving objects detection is always exploited as the primary step. Thus, it attracts strong interest from researchers. The main purpose of motion detection is to separate foreground and background pixels. In consideration of the complexity and uncertainty of real-world environments, the traditional method, which regards a static image as the background reference, has been replaced by various state-of-the-art background subtraction methods and supervised machine learning algorithms.

Background subtraction methods can complete the detection without any manual intervention. The single Gaussian model [6] uses just one Gaussian function to estimate the distribution of a background pixel. Such a model is only suitable for constant scenes. The Gaussian Mixture Model (GMM) [7-8] is an extension of the single Gaussian model. It describes a background pixel by a mixture of K, or an adaptive number of, Gaussian distributions so that it can deal with a dynamic, complex background (e.g. rain, swaying tree leaves, and ripples). Differing from GMM, the non-parametric model based on Kernel Density Estimation (KDE) [9] determines its background probability density functions entirely from the most recent observations. These classical probabilistic approaches often do not perform well when encountering camouflage, cast shadows, sudden illumination changes, camera motion and so on.
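As a rough illustration of the single Gaussian idea, the sketch below keeps a running mean and variance per pixel and flags a pixel as foreground when it deviates from the mean by more than `k` standard deviations. The class name `SingleGaussianBackground`, the learning rate `alpha`, the threshold `k`, the initial variance and the grayscale input are assumptions made for this example, not values taken from [6].

```python
import numpy as np

class SingleGaussianBackground:
    """Per-pixel running Gaussian background model for grayscale frames."""

    def __init__(self, first_frame, alpha=0.05, k=2.5, init_var=25.0):
        self.mean = first_frame.astype(np.float64)
        self.var = np.full(first_frame.shape, init_var, dtype=np.float64)
        self.alpha = alpha   # learning rate of the running estimates (assumed)
        self.k = k           # threshold in standard deviations (assumed)

    def apply(self, frame):
        """Return a binary mask (1 = foreground) and update the model."""
        diff = frame.astype(np.float64) - self.mean
        foreground = diff ** 2 > (self.k ** 2) * self.var
        # Update only pixels classified as background, so that moving
        # objects are not absorbed into the model immediately.
        bg = ~foreground
        self.mean[bg] += self.alpha * diff[bg]
        self.var[bg] = (1 - self.alpha) * self.var[bg] + self.alpha * diff[bg] ** 2
        return foreground.astype(np.uint8)

# Toy sequence: a flat background, then a bright 2x2 object appears.
frame0 = np.full((6, 6), 100, dtype=np.uint8)
frame1 = frame0.copy()
frame1[2:4, 2:4] = 200

model = SingleGaussianBackground(frame0)
mask = model.apply(frame1)
print(mask)
```

A single mode per pixel cannot represent backgrounds that alternate between several appearances, which is the limitation that mixture and kernel density models are meant to address.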

Some non-mathematical background subtraction methods [10-13] also achieve accurate results without the need for manual intervention. The Visual Background extractor (ViBe) [10] models every pixel with just twenty colour values, so that it saves much space and relieves memory pressure. Rapid and simple ‘one-frame-initialization’ is another remarkable advantage of it. The Self-Balanced SENsitivity SEgmenter (SuBSENSE) [11] makes some improvements based on ViBe. It suggests that individual pixels are characterized not only by colour values but also by local texture features. The decision threshold and the update rate, which are fixed in the ViBe algorithm, are adapted by monitoring the background dynamics and segmentation noise. Bin Wang et al. [12] proposed a fast and effective Adapting Multi-resolution Background ExtractoR (AMBER), which applies efficacies to indicate the matching frequency for each background value. The innovations of Multimode Background Subtraction (MBS) [13] include the use of multiple colour spaces, a background model bank for the background modelling process, Mega-Pixel formation and so on.
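The sample-based modelling described for ViBe can be sketched as follows. Only the twenty samples per pixel and the initialization from a single frame come from the text above; the matching radius, the minimum number of matches, the random update rate, the noise-based initialization and the class name `SampleBackground` are assumptions for this simplified grayscale example, and the spatial propagation of samples to neighbouring pixels is omitted.

```python
import numpy as np

class SampleBackground:
    """Simplified ViBe-like model: a small bank of samples per pixel."""

    def __init__(self, first_frame, n_samples=20, radius=20,
                 min_matches=2, subsample=16, seed=0):
        self.rng = np.random.default_rng(seed)
        frame = first_frame.astype(np.int16)
        # One-frame initialization: fill every sample slot from the first
        # frame, perturbed by small noise (an assumption of this sketch).
        noise = self.rng.integers(-10, 11, size=(n_samples,) + frame.shape)
        self.samples = np.clip(frame[None] + noise, 0, 255).astype(np.int16)
        self.radius = radius
        self.min_matches = min_matches
        self.subsample = subsample

    def apply(self, frame):
        """Return a binary mask (1 = foreground) and update the samples."""
        frame = frame.astype(np.int16)
        close = np.abs(self.samples - frame[None]) < self.radius
        foreground = close.sum(axis=0) < self.min_matches
        # Conservative random update: each background pixel replaces one
        # randomly chosen sample with probability 1 / subsample.
        chosen = self.rng.integers(0, self.subsample, size=frame.shape) == 0
        ys, xs = np.nonzero(~foreground & chosen)
        slots = self.rng.integers(0, self.samples.shape[0], size=ys.shape)
        self.samples[slots, ys, xs] = frame[ys, xs]
        return foreground.astype(np.uint8)

# Toy sequence: a flat background, then a bright 2x2 object appears.
frame0 = np.full((6, 6), 100, dtype=np.uint8)
frame1 = frame0.copy()
frame1[2:4, 2:4] = 200

vibe_like = SampleBackground(frame0)
fg = vibe_like.apply(frame1)
print(fg)
```

Storing twenty one-byte values per pixel is what keeps the memory footprint small; the sketch uses 16-bit integers only to avoid overflow when subtracting.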

Although most of the background subtraction methods have reached a certain degree of accuracy, their F-measures are still too low compared to supervised machine learning algorithms. Yi Wang et al. [14] proposed a multi-resolution convolutional neural network with a cascaded architecture named Cascade CNN. It uses a limited number of ground truth images, where every foreground moving object is manually annotated, as the training set. Lim et al. [15] proposed an encoder-decoder type network model, which contains a triplet CNN operating at three different scales for feature encoding and a transposed convolutional network for decoding. The method in [16] randomly
