A New Two-Stage Object Detection Network without RoI-Pooling


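This paper presents a two-stage object detection network that drops the conventional Region of Interest (RoI) Pooling layer in order to speed up detection and reduce redundant computation. Existing two-stage frameworks such as Faster R-CNN and R-FCN generate a set of candidate boxes in the first stage, then refine and classify those boxes in the second stage. RoI Pooling is the key step: it maps the features of an image region onto a fixed-size pooled grid so that the following layers can process them. Because the first-stage candidate boxes can overlap, however, the second stage extracts features for each box redundantly, which lowers overall detection speed. RoI Pooling can also deform the features of elongated objects, which hurts detection accuracy. To address these problems, the authors propose a new two-stage detector called the Spatial Alignment Network (SAN).

SAN's main innovation is that it skips the RoI Pooling layer and uses a spatial alignment strategy instead, which reduces the need to process each candidate box independently in the second stage. This design cuts repeated computation and makes the algorithm run faster. The paper also applies atrous (dilated) convolution, which enlarges the receptive field without sacrificing resolution and helps capture richer contextual information.

In this way, SAN keeps detection performance competitive while streamlining the detection pipeline, so that single-frame detection is faster and works for objects of varied shapes. The RoI-Pooling-free design offers a new direction for later two-stage detectors and may help improve both speed and accuracy. Future work may focus on how to better integrate spatial alignment with other efficient feature extraction methods.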

A New Two-Stage Object Detection Network without RoI-Pooling

Chao Yan, Weihai Chen*, Peter C. Y. Chen, Kendrick Amezquita S., Xingming Wu

1. School of Automation Science and Electrical Engineering, Beihang University, 100191, Beijing, China

E-mail: whchenbuaa@126.com

2. Department of Mechanical Engineering, National University of Singapore, 117576, Singapore

E-mail: mpechenp@nus.edu.sg

Abstract: Two-stage object detection networks typically propose a set of candidate boxes in the first stage and then fine-tune the boxes in the second stage. The original two-stage object detection methods mostly process the features within the candidate boxes by RoI-Pooling [3]. Because the candidate boxes proposed in the first stage overlap, the computation in the second stage is repetitive and single-frame detection is slow. RoI-Pooling also deforms the features of elongated objects. In this paper, we present a new two-stage object detection network, called Spatial Alignment Network (SAN), which does not use the RoI-Pooling layer and reduces the computational redundancy of the second stage. We also use atrous convolution for network fine-tuning. Our network achieves competitive results and is faster than the original two-stage detectors.

Key Words: Object Detection, Deep Learning, Computer Vision

1 INTRODUCTION

In recent years, great progress has been made in the fields of object detection [4] and semantic segmentation [5]. Object detection architectures fall into two major categories: one-stage and two-stage. Two-stage object detection networks usually propose a set of candidate boxes in the first stage with an RPN (region proposal network) [6] and then fine-tune the candidate boxes in the second stage [6] [7] [8]. This kind of method usually has higher accuracy but lower speed. Semantic segmentation networks usually use encoder-decoder structures. The contextual relationships [9] [10] within the picture are often taken into account in segmentation networks.

RoI-Pooling is a commonly used structure in two-stage object detection architectures [6] [7] [8]. The RoI-Pooling layer crops regions from the feature maps according to the candidate boxes proposed in the first stage and then resizes each crop to a specified size, such as 7 × 7 [3]. There are some variations on this, such as RoI-Align [11]. For large square candidate boxes, this operation can save a certain amount of computation. For small rectangular candidate boxes, however, it distorts the spatial information of the original small object. Most importantly, the candidate boxes proposed in the first stage overlap so much that the overall detection rate is slowed down.

This work is supported by International Scientific and Technological Cooperation Projects of China under Grant 2015DFG12650, the Singapore-China Joint Research Programme of the Science and Engineering Research Council in the Agency for Science, Technology and Research (A*STAR), Singapore, under SERC Project No.1420200047, and National Nature Science Foundation of China under Grant 61620106012 and 61573048.

*Weihai Chen is the corresponding author.
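To make the RoI-Pooling step described above concrete, the following is a minimal pure-Python sketch, not the implementation used in [3] or in this paper. It max-pools a crop of a single-channel feature map into a fixed 7 × 7 grid. The function names `roi_max_pool` and `overlap_cells`, the bin-boundary rule, and the toy feature map are assumptions made for illustration only.

```python
def roi_max_pool(feature, box, out_h=7, out_w=7):
    """Max-pool the crop of `feature` given by `box` into an out_h x out_w grid.

    feature: 2D list of numbers (one channel of a feature map).
    box: (x0, y0, x1, y1) in feature-map cells, end-exclusive (assumed layout).
    """
    x0, y0, x1, y1 = box
    h, w = y1 - y0, x1 - x0
    pooled = []
    for i in range(out_h):
        # Bin start is rounded down and bin end is rounded up,
        # so every bin covers at least one cell of the crop.
        ys = y0 + (i * h) // out_h
        ye = max(ys + 1, y0 + ((i + 1) * h + out_h - 1) // out_h)
        row = []
        for j in range(out_w):
            xs = x0 + (j * w) // out_w
            xe = max(xs + 1, x0 + ((j + 1) * w + out_w - 1) // out_w)
            row.append(max(feature[y][x]
                           for y in range(ys, ye)
                           for x in range(xs, xe)))
        pooled.append(row)
    return pooled


def overlap_cells(a, b):
    """Number of feature-map cells shared by two boxes (x0, y0, x1, y1)."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(w, 0) * max(h, 0)


# Toy 16 x 32 feature map whose value encodes the cell position.
feature = [[y * 32 + x for x in range(32)] for y in range(16)]

# An elongated 28 x 2 box: 4 columns are squeezed into each output column,
# while the 2 rows are stretched over 7 output rows.
elongated = roi_max_pool(feature, (0, 0, 28, 2))
print(len(elongated), len(elongated[0]))

# Two overlapping proposals read the shared cells twice.
print(overlap_cells((0, 0, 10, 10), (5, 5, 15, 15)))
```

In this sketch the elongated box is compressed horizontally and stretched vertically to fit the square grid, which is one way to see the deformation mentioned above, and the cells counted by `overlap_cells` are pooled once per proposal, which illustrates the repeated work.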

In contrast, some one-stage object detection frameworks [12] [13] [14] [15], just like the RPN, regress the deviation from the ground truth boxes to the default boxes of different aspect ratios and scales at each location on the feature maps [6]. This operation is fully convolutional [1] [2] and fast, although the precision may be slightly lower.
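The regression of deviations from default boxes can be sketched as follows. This is a minimal illustration that assumes the widely used center-size box parameterization of Faster R-CNN [6]; the helper names `make_default_boxes`, `encode`, and `decode`, and the chosen scales and ratios, are hypothetical and not taken from this paper.

```python
import math


def make_default_boxes(cx, cy, scales, ratios):
    """Default boxes (cx, cy, w, h) centred on one location; ratio means w / h."""
    boxes = []
    for s in scales:
        for r in ratios:
            # Width and height are chosen so that the area stays s * s.
            boxes.append((cx, cy, s * math.sqrt(r), s / math.sqrt(r)))
    return boxes


def encode(gt, default):
    """Regression targets: deviation of a ground-truth box from a default box."""
    gx, gy, gw, gh = gt
    dx, dy, dw, dh = default
    return ((gx - dx) / dw, (gy - dy) / dh,
            math.log(gw / dw), math.log(gh / dh))


def decode(deltas, default):
    """Inverse of encode: apply predicted deviations to a default box."""
    tx, ty, tw, th = deltas
    dx, dy, dw, dh = default
    return (dx + tx * dw, dy + ty * dh,
            dw * math.exp(tw), dh * math.exp(th))


# Six default boxes (2 scales x 3 aspect ratios) at one feature-map location.
defaults = make_default_boxes(8.0, 8.0, scales=[16.0, 32.0], ratios=[0.5, 1.0, 2.0])
ground_truth = (10.0, 6.0, 20.0, 12.0)
targets = [encode(ground_truth, d) for d in defaults]
print(targets[0])
```

A network head would predict one such 4-tuple per default box at every location with a convolution, which is why no per-box cropping is needed at this step.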

In this paper, we propose a new fully convolutional framework for two-stage object detection, called Spatial Alignment Network (SAN), which does not use RoI-Pooling. The first step of the detection process is the same as in Faster R-CNN [6]. In the second step, we use convolution again to regress the deviation between the candidate boxes obtained in the first stage and the ground truth boxes. Fig. 1(c) illustrates the basic procedure of our network. Some parts of this framework resemble R-FCN [8], but our second stage is handled differently: we use neither RoI-Pooling nor PS RoI-Pooling. We combine the candidate box information of the first and second stages by sequence number. We also use some tricks to combine the outputs of the RPN and the features. We test our model on the VOC2007 test set and obtain a test speed of 90 ms per image using ResNet-101, with an mAP of 76.5%. With the same backbone and hardware, our network runs at 3× the speed of Faster R-CNN and is 1.2× faster than F-RCNN.
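One possible reading of the two-step regression described above is sketched below. It is an illustration under stated assumptions, not the authors' implementation: it assumes that boxes use the center-size form (cx, cy, w, h), that deviations follow the Faster R-CNN parameterization [6], and that "combining by sequence number" means the i-th second-stage prediction refines the i-th first-stage candidate box. The names `apply_deltas` and `two_stage_refine` are hypothetical.

```python
import math


def apply_deltas(box, deltas):
    """Shift and rescale a (cx, cy, w, h) box by predicted deviations."""
    cx, cy, w, h = box
    tx, ty, tw, th = deltas
    return (cx + tx * w, cy + ty * h, w * math.exp(tw), h * math.exp(th))


def two_stage_refine(anchors, stage1_deltas, stage2_deltas):
    """Refine boxes twice, pairing the two stages by index (assumed pairing)."""
    if not (len(anchors) == len(stage1_deltas) == len(stage2_deltas)):
        raise ValueError("all three lists must have the same length")
    # Stage 1: default boxes -> candidate boxes.
    candidates = [apply_deltas(a, d) for a, d in zip(anchors, stage1_deltas)]
    # Stage 2: candidate boxes -> final boxes, matched by the same index.
    return [apply_deltas(c, d) for c, d in zip(candidates, stage2_deltas)]


anchors = [(10.0, 10.0, 8.0, 8.0)]
stage1 = [(0.5, 0.0, 0.0, 0.0)]    # toy values standing in for the first-stage output
stage2 = [(0.25, 0.5, 0.0, 0.0)]   # toy values standing in for the second-stage output
print(two_stage_refine(anchors, stage1, stage2))
```

Because both stages produce one prediction per index, the second stage in this sketch never crops or resizes features per box, which is the property the text attributes to dropping RoI-Pooling.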

2 RELATED WORK

2.1 Two-stage Detectors

R-CNN [16] was the first to successfully apply convolution to object detection. It uses the selective search algorithm to extract about 2000 region proposals from the image, extracts features from the image within the region proposals by convolution, and classifies the extracted features


978-1-5386-1243-9/18/$31.00 2018 IEEE
