FPN: Feature Pyramid Networks for Deep Learning Object Detection

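Feature Pyramid Networks (FPN) is a deep learning architecture designed for object detection. Classical computer vision systems rely on feature pyramids to capture objects at different scales, but recent deep learning detectors have largely avoided pyramid representations because they are compute and memory intensive. The FPN paper exploits the inherent multi-scale, pyramidal hierarchy of deep convolutional networks (ConvNets) to build feature pyramids at marginal extra cost. The core of FPN is a top-down architecture with lateral connections that produces semantically strong feature maps at every scale, from low resolution to high resolution. This lets FPN serve as a generic feature extractor that shows significant improvement in several applications. Integrated into a basic Faster R-CNN system, FPN achieved state-of-the-art single-model results on the COCO (Common Objects in Context) detection benchmark at the time of publication, surpassing all existing single-model entries, including those from the COCO 2016 challenge winners. The result shows that FPN addresses the efficiency problem of multi-scale detection while improving accuracy with a simple architecture. It became a milestone in modern object detection and prompted later work on combining multi-scale features with deep networks to improve both detection accuracy and speed.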

Feature Pyramid Networks for Object Detection

Tsung-Yi Lin (1, 2), Piotr Dollár, Ross Girshick, Kaiming He, Bharath Hariharan, and Serge Belongie

(1) Facebook AI Research (FAIR)

(2) Cornell University and Cornell Tech

Abstract

Feature pyramids are a basic component in recognition systems for detecting objects at different scales. But recent deep learning object detectors have avoided pyramid representations, in part because they are compute and memory intensive. In this paper, we exploit the inherent multi-scale, pyramidal hierarchy of deep convolutional networks to construct feature pyramids with marginal extra cost. A top-down architecture with lateral connections is developed for building high-level semantic feature maps at all scales. This architecture, called a Feature Pyramid Network (FPN), shows significant improvement as a generic feature extractor in several applications. Using FPN in a basic Faster R-CNN system, our method achieves state-of-the-art single-model results on the COCO detection benchmark without bells and whistles, surpassing all existing single-model entries including those from the COCO 2016 challenge winners. In addition, our method can run at 6 FPS on a GPU and thus is a practical and accurate solution to multi-scale object detection. Code will be made publicly available.
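The abstract names a top-down architecture with lateral connections but does not spell out how the levels are merged. The sketch below is one minimal way to illustrate the idea, under stated assumptions: feature maps are single-channel 2D lists, each level is half the size of the previous one, the coarser map is upsampled by 2 with nearest-neighbor repetition and added element-wise to the lateral map, and a scalar `lateral_scale` stands in for a learned lateral projection. The function names are hypothetical and this is not the authors' implementation.

```python
def upsample2x_nearest(fmap):
    """Double height and width by repeating each value (nearest neighbor)."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def add_maps(a, b):
    """Element-wise sum of two equally sized 2D maps."""
    assert len(a) == len(b) and len(a[0]) == len(b[0]), "shape mismatch"
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def top_down_pyramid(bottom_up, lateral_scale=1.0):
    """Merge bottom-up maps (ordered fine -> coarse) into a top-down pyramid.

    Assumption: lateral_scale is a scalar stand-in for a learned lateral
    projection; a real network would use a learned layer here.
    """
    laterals = [[[lateral_scale * v for v in row] for row in fmap]
                for fmap in bottom_up]
    pyramid = [laterals[-1]]  # start from the coarsest level
    for lateral in reversed(laterals[:-1]):
        upsampled = upsample2x_nearest(pyramid[0])
        pyramid.insert(0, add_maps(lateral, upsampled))
    return pyramid


# Toy bottom-up hierarchy: 4x4, 2x2 and 1x1 maps with constant values.
c_fine = [[1.0] * 4 for _ in range(4)]
c_mid = [[10.0] * 2 for _ in range(2)]
c_coarse = [[100.0]]

pyramid = top_down_pyramid([c_fine, c_mid, c_coarse])
shapes = [(len(p), len(p[0])) for p in pyramid]
```

In this toy run every output level keeps the resolution of its bottom-up input, while the value from the coarsest map propagates down to the finest one, which is the sense in which all scales receive high-level information.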

1. Introduction

Recognizing objects at vastly different scales is a fundamental challenge in computer vision. Feature pyramids built upon image pyramids (for short we call these featurized image pyramids) form the basis of a standard solution [1] (Fig. 1(a)). These pyramids are scale-invariant in the sense that an object's scale change is offset by shifting its level in the pyramid. Intuitively, this property enables a model to detect objects across a large range of scales by scanning the model over both positions and pyramid levels.
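The scale-invariance property can be made concrete with a little arithmetic. The sketch below assumes a pyramid with a fixed number of levels per octave (one octave being a doubling of scale) and the convention that a positive shift moves toward coarser levels. The names `level_shift` and `scales_per_octave` are hypothetical, introduced only for illustration.

```python
import math


def level_shift(scale_factor, scales_per_octave=1):
    """Pyramid levels that offset resizing an object by scale_factor.

    Assumption: adjacent levels differ by a constant ratio of
    2 ** (1 / scales_per_octave); positive means a coarser level.
    """
    return round(scales_per_octave * math.log2(scale_factor))


shift_coarse = level_shift(2.0)                       # one level per octave
shift_dense = level_shift(2.0, scales_per_octave=10)  # densely sampled pyramid
shift_half = level_shift(0.5)                         # object shrinks by half
```

Under these assumptions, an object that doubles in size is found one octave coarser in the pyramid, so the same fixed-size model can match it without being rescaled.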

Featurized image pyramids were heavily used in the era of hand-engineered features [5, 25]. They were so critical that object detectors like DPM [7] required dense scale sampling to achieve good results (e.g., 10 scales per octave). For recognition tasks, engineered features have largely been replaced with features computed by deep convolutional networks (ConvNets) [19, 20]. Aside from being capable of representing higher-level semantics, ConvNets are also more robust to variance in scale and thus facilitate recognition from features computed on a single input scale [15, 11, 29] (Fig. 1(b)). But even with this robustness, pyramids are still needed to get the most accurate results. All recent top entries in the ImageNet [33] and COCO [21] detection challenges use multi-scale testing on featurized image pyramids (e.g., [16, 35]). The principal advantage of featurizing each level of an image pyramid is that it produces a multi-scale feature representation in which all levels are semantically strong, including the high-resolution levels.

Figure 1. (a) Using an image pyramid to build a feature pyramid. Features are computed on each of the image scales independently, which is slow. (b) Recent detection systems have opted to use only single scale features for faster detection. (c) An alternative is to reuse the pyramidal feature hierarchy computed by a ConvNet as if it were a featurized image pyramid. (d) Our proposed Feature Pyramid Network (FPN) is fast like (b) and (c), but more accurate. In this figure, feature maps are indicated by blue outlines and thicker outlines denote semantically stronger features.
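Featurizing each level of an image pyramid means one network pass per image scale, so the cost grows with the number and size of the levels. The rough sketch below assumes that the cost of a pass is proportional to the number of input pixels, that is, to the square of the scale. This is a simplifying assumption and not a measurement from the paper, and the scales used are hypothetical values chosen only for illustration.

```python
def relative_pyramid_cost(scales):
    """Cost of featurizing every level, relative to one pass at scale 1.0.

    Assumption: cost is proportional to input pixel count (scale ** 2).
    """
    return sum(s * s for s in scales)


# Hypothetical image scales, not taken from the paper.
example_scales = [0.5, 1.0, 1.5]
cost = relative_pyramid_cost(example_scales)
single_scale_cost = relative_pyramid_cost([1.0])
```

Under this assumption the larger scales dominate the total, which is why adding upsampled levels to recover small objects is the expensive part of an image pyramid.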

Nevertheless, featurizing each level of an image pyramid has obvious limitations. Inference time increases considerably (e.g., by four times [11]), making this approach impractical for real applications. Moreover, training deep

arXiv:1612.03144v2 [cs.CV] 19 Apr 2017
