RefineDet: A Single-Shot Refinement Neural Network for Accurate and Efficient Object Detection

Deep learning

Updated 2023-05-20 · 3.45 MB PDF


"RefineDet论文提出了一个名为RefineDet的单阶段目标检测器，它融合了两阶段方法（如Faster R-CNN）的高精度和一阶段方法（如SSD）的高效率优势。RefineDet由两个相互连接的模块组成：锚点精炼模块和对象检测模块。锚点精炼模块主要负责过滤负样本锚点以减小分类器搜索空间，以及初步调整锚点的位置和大小以优化后续回归器的初始化。对象检测模块则接收经过精炼的锚点，进一步提升回归效果并预测多类标签。此外，设计了转移连接块将锚点精炼模块的特征传递到对象检测模块，用于预测物体的位置、大小和类别。多任务损失函数使得整个网络可以端到端训练。实验在PASCAL VOC 2007、PASCAL VOC 2012和MS COCO数据集上验证了RefineDet具有领先的检测精度和高效性能。" 本文的核心是RefineDet，这是一个创新的单阶段目标检测框架，旨在解决两阶段和一阶段方法之间的权衡。两阶段方法如Faster R-CNN在准确性上表现出色，但计算成本较高；而一阶段方法如SSD则追求速度牺牲了部分准确性。RefineDet通过其独特的设计，试图在两者之间找到平衡。 RefineDet的关键组件是锚点精炼模块和对象检测模块。锚点精炼模块首先执行两个关键任务：剔除负样本锚点，减少分类任务的工作量，同时对剩余的锚点进行粗略的位置和大小调整，为后续的边界框回归提供更好的起点。这样的设计有助于降低背景误检率，提高检测的准确性。接下来，对象检测模块接收经过精炼的锚点，进一步细化边界框回归和分类预测。这里，设计了一个转移连接块，它允许从锚点精炼模块中提取的特征被用于预测物体的位置、尺寸和类别，增强了模型的综合检测能力。训练过程采用多任务损失函数，该函数结合了分类和回归的损失，使得网络能够同时优化这两个任务，从而实现端到端的训练。这种集成训练策略有助于提升模型的整体性能。实验结果表明，RefineDet在PASCAL VOC和MS COCO等标准数据集上，不仅在检测精度上超越了传统的两阶段方法，而且保持了接近一阶段方法的效率。这证明了RefineDet在目标检测领域的优越性，为实际应用提供了更高效且精确的解决方案。


achieving satisfactory accuracy with high efficiency. DPM [12] is another popular method using mixtures of multi-scale deformable part models to represent highly variable object classes, maintaining top results on PASCAL VOC [8] for many years. However, with the arrival of deep convolutional networks, the object detection task was quickly dominated by CNN-based detectors, which can be roughly divided into two categories, i.e., the two-stage approach and the one-stage approach.

Two-Stage Approach. The two-stage approach consists of two parts, where the first one (e.g., Selective Search [46], EdgeBoxes [55], DeepMask [32, 33], RPN [36]) generates a sparse set of candidate object proposals, and the second one determines the accurate object regions and the corresponding class labels using convolutional networks. Notably, the two-stage approach (e.g., from R-CNN [16], SPPnet [18], and Fast R-CNN [15] to Faster R-CNN [36]) achieves dominant performance on several challenging datasets (e.g., PASCAL VOC 2012 [11] and MS COCO [29]). After that, numerous effective techniques were proposed to further improve the performance, such as architecture design [5, 26, 54], training strategy [41, 48], contextual reasoning [1, 14, 40, 50], and exploiting multiple layers [3, 25, 27, 42].

One-Stage Approach. Considering its high efficiency, the one-stage approach has attracted much more attention recently. Sermanet et al. [38] present the OverFeat method for classification, localization and detection based on deep ConvNets, which is trained end-to-end, from raw pixels to ultimate categories. Redmon et al. [34] use a single feed-forward convolutional network to directly predict object classes and locations, called YOLO, which is extremely fast. After that, YOLOv2 [35] was proposed to improve YOLO in several aspects, i.e., adding batch normalization on all convolution layers, using a high-resolution classifier, using convolution layers with anchor boxes to predict bounding boxes instead of fully connected layers, etc. Liu et al. [30] propose the SSD method, which spreads out anchors of different scales to multiple layers within a ConvNet and enforces each layer to focus on predicting objects of a certain scale. DSSD [13] introduces additional context into SSD via deconvolution to improve the accuracy. DSOD [39] designs an efficient framework and a set of principles to learn object detectors from scratch, following the network structure of SSD. To improve the accuracy, some one-stage methods [24, 28, 53] aim to address the extreme class imbalance problem by re-designing the loss function or classification strategies. Although the one-stage detectors have made good progress, their accuracy still trails that of two-stage methods.

3. Network Architecture

Refer to the overall network architecture shown in Figure 1. Similar to SSD [30], RefineDet is based on a feed-forward convolutional network that produces a fixed number of bounding boxes and the scores indicating the presence of different classes of objects in those boxes, followed by non-maximum suppression to produce the final result. RefineDet is formed by two inter-connected modules, i.e., the ARM and the ODM. The ARM aims to remove negative anchors so as to reduce the search space for the classifier and also coarsely adjust the locations and sizes of anchors to provide better initialization for the subsequent regressor, whereas the ODM aims to regress accurate object locations and predict multi-class labels based on the refined anchors. The ARM is constructed from two base networks (i.e., VGG-16 [43] and ResNet-101 [19], pretrained on ImageNet [37]) by removing their classification layers and adding some auxiliary structures to meet our needs. The ODM is composed of the outputs of TCBs followed by the prediction layers (i.e., convolution layers with 3 × 3 kernel size), which generate the scores for object classes and the shape offsets relative to the refined anchor box coordinates. The following paragraphs explain three core components of RefineDet, i.e., (1) the transfer connection block (TCB), which converts the features from the ARM to the ODM for detection; (2) two-step cascaded regression, which accurately regresses the locations and sizes of objects; (3) negative anchor filtering, which rejects well-classified negative anchors early and mitigates the imbalance issue.
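The final non-maximum suppression step mentioned above can be sketched in plain Python. This is a minimal greedy NMS for a single class, not the authors' implementation; the `(x1, y1, x2, y2)` box format, the function names, and the 0.45 IoU threshold are assumptions made for illustration.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.45):
    """Greedy NMS: return indices of kept boxes, highest score first.

    iou_threshold is an assumed value, not one taken from the text above.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


# Two heavily overlapping boxes and one separate box (made-up values).
boxes = [(10, 10, 50, 50), (12, 12, 52, 52), (100, 100, 140, 140)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)
```

In this toy input the second box overlaps the first with an IoU of about 0.82, so it is suppressed and only the first and third boxes survive.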

Transfer Connection Block. To link the ARM and the ODM, we introduce the TCBs to convert features of different layers from the ARM into the form required by the ODM, so that the ODM can share features from the ARM. Notably, from the ARM, we only use the TCBs on the feature maps associated with anchors. Another function of the TCBs is to integrate large-scale context [13, 27] by adding the high-level features to the transferred features to improve detection accuracy. To match the dimensions between them, we use the deconvolution operation to enlarge the high-level feature maps and sum them in an element-wise way. Then, we add a convolution layer after the summation to ensure the discriminability of features for detection. The architecture of the TCB is shown in Figure 2.
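The dimension-matching step of the TCB (enlarge the high-level map by deconvolution, then sum element-wise) can be illustrated on single-channel maps in plain Python. This is only a sketch: the 2 × 2 kernel, the stride of 2, the all-ones weights, and the function names are assumptions, and the convolution layers before and after the summation are omitted. A real TCB operates on multi-channel tensors with learned weights.

```python
def deconv2x(feature, kernel):
    """Stride-2 transposed convolution of a 2D map with a 2x2 kernel.

    Each input value is multiplied by the kernel and written into its own
    2x2 output patch, so an HxW map becomes 2Hx2W.
    """
    h, w = len(feature), len(feature[0])
    out = [[0.0] * (2 * w) for _ in range(2 * h)]
    for i in range(h):
        for j in range(w):
            for di in range(2):
                for dj in range(2):
                    out[2 * i + di][2 * j + dj] += feature[i][j] * kernel[di][dj]
    return out


def elementwise_sum(a, b):
    """Add two 2D maps of identical shape."""
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


# 4x4 map transferred from the ARM and 2x2 map from the next, higher level
# (made-up values).
low_level = [[1.0, 2.0, 3.0, 4.0],
             [5.0, 6.0, 7.0, 8.0],
             [9.0, 10.0, 11.0, 12.0],
             [13.0, 14.0, 15.0, 16.0]]
high_level = [[10.0, 20.0],
              [30.0, 40.0]]
kernel = [[1.0, 1.0],
          [1.0, 1.0]]  # hypothetical weights; learned in a real network

upsampled = deconv2x(high_level, kernel)       # 2x2 -> 4x4
fused = elementwise_sum(low_level, upsampled)  # shapes now match
```

With all-ones weights the transposed convolution reduces to nearest-neighbour upsampling, which makes the result easy to check by hand.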

Two-Step Cascaded Regression. Current one-stage methods [13, 24, 30] rely on one-step regression based on various feature layers with different scales to predict the locations and sizes of objects, which is rather inaccurate in some challenging scenarios, especially for small objects. To address this, we present a two-step cascaded regression strategy to regress the locations and sizes of objects. That is, we use the ARM to first adjust the locations and sizes of anchors to provide better initialization for the regression in the ODM. Specifically, we associate n anchor boxes with each regularly divided cell on the feature map. The initial position of each anchor box relative to its corresponding cell is fixed. At each feature map cell, we predict four offsets of the refined anchor boxes relative to the original tiled anchors and
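The two-step idea described in this paragraph can be sketched as follows: tile n anchors per feature map cell, apply the four ARM offsets to get refined anchors, then apply the ODM offsets to the refined anchors. The excerpt does not state how offsets are parameterized, so the center-size decoding below (as commonly used in SSD-style detectors, here without variance scaling) is an assumption, as are the function names and all numeric values.

```python
import math


def tile_anchors(fmap_size, stride, scale, ratios):
    """Tile n = len(ratios) anchors (cx, cy, w, h) on each feature map cell."""
    anchors = []
    for row in range(fmap_size):
        for col in range(fmap_size):
            # Anchor centers sit at the cell centers, in input-image pixels.
            cx, cy = (col + 0.5) * stride, (row + 0.5) * stride
            for r in ratios:
                # r is the width/height aspect ratio; the area stays scale**2.
                anchors.append((cx, cy, scale * math.sqrt(r), scale / math.sqrt(r)))
    return anchors


def decode(anchor, offsets):
    """Apply four offsets (dx, dy, dw, dh) to an anchor (cx, cy, w, h).

    Assumed parameterization: center shifts are relative to the anchor size,
    and size changes are applied in log space.
    """
    cx, cy, w, h = anchor
    dx, dy, dw, dh = offsets
    return (cx + dx * w, cy + dy * h, w * math.exp(dw), h * math.exp(dh))


anchors = tile_anchors(fmap_size=2, stride=8, scale=32, ratios=(1.0,))
arm_offsets = (0.25, 0.0, 0.0, 0.0)   # made-up ARM prediction
odm_offsets = (0.0, 0.125, 0.0, 0.0)  # made-up ODM prediction

refined = decode(anchors[0], arm_offsets)  # step 1: ARM refines the tiled anchor
final = decode(refined, odm_offsets)       # step 2: ODM regresses from the refined anchor
```

The point of the cascade is that the second `decode` call starts from `refined` rather than from the original tiled anchor, so the ODM only has to predict a residual correction.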
