TridentNet: A Multi-Branch Network for the Small-Object Problem in Object Detection

Scale-Aware Trident Networks for Object Detection

Yanghao Li*, Yuntao Chen(1,3)*, Naiyan Wang, Zhaoxiang Zhang(1,3,4)

(1) University of Chinese Academy of Sciences
(2) TuSimple
(3) Center for Research on Intelligent Perception and Computing, CASIA
(4) Center for Excellence in Brain Science and Intelligence Technology, CAS

lyttonhao@gmail.com chenyuntao2016@ia.ac.cn zhaoxiang.zhang@ia.ac.cn winsty@gmail.com

Abstract

Scale variation is one of the key challenges in object detection. In this work, we first present a controlled experiment to investigate the effect of receptive fields on the detection of different scale objects. Based on the findings from the exploration experiments, we propose a novel Trident Network (TridentNet) aiming to generate scale-specific feature maps with a uniform representational power. We construct a parallel multi-branch architecture in which each branch shares the same transformation parameters but with different receptive fields. Then, we propose a scale-aware training scheme to specialize each branch by sampling object instances of proper scales for training. As a bonus, a fast approximation version of TridentNet could achieve significant improvements without any additional parameters and computational cost. On the COCO dataset, our TridentNet with ResNet-101 backbone achieves state-of-the-art single-model results by obtaining an mAP of 48.4. Code will be made publicly available.
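As a rough illustration of branches that share the same transformation parameters but have different receptive fields, the sketch below applies one shared 1-D kernel at several dilation rates. This excerpt does not say how the receptive field is varied, so the use of dilation, the rates (1, 2, 3), and the names `dilated_conv1d`, `receptive_field`, and `trident_branches` are all assumptions made for illustration. It is not the authors' implementation.

```python
def dilated_conv1d(signal, kernel, dilation):
    """Valid 1-D cross-correlation whose kernel taps are `dilation` apart."""
    span = (len(kernel) - 1) * dilation  # distance between first and last tap
    return [
        sum(w * signal[i + j * dilation] for j, w in enumerate(kernel))
        for i in range(len(signal) - span)
    ]


def receptive_field(kernel_size, dilation):
    """Number of input positions covered by one output of a dilated kernel."""
    return (kernel_size - 1) * dilation + 1


def trident_branches(signal, shared_kernel, dilations=(1, 2, 3)):
    """Run every branch with the SAME kernel; only the dilation differs.

    The dilation rates are hypothetical values chosen for this sketch.
    """
    return {d: dilated_conv1d(signal, shared_kernel, d) for d in dilations}


signal = [float(x) for x in range(10)]
shared_kernel = [1.0, 0.0, -1.0]  # one set of weights reused by all branches

outputs = trident_branches(signal, shared_kernel)
fields = {d: receptive_field(len(shared_kernel), d) for d in outputs}
print(fields)  # {1: 3, 2: 5, 3: 7}
```

No parameters are added per branch: the three branches read the same three weights, yet each output position sees 3, 5, or 7 input positions.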

1. Introduction

In recent years, deep convolutional neural networks (CNNs) [17, 37, 30] have achieved great success in object detection. Typically, these CNN-based methods can be roughly divided into two types: one-stage methods such as YOLO [34] or SSD [30], which directly utilize a feed-forward CNN to predict the bounding boxes of interest, and two-stage methods such as Faster R-CNN [37] or R-FCN [10], which first generate proposals and then exploit the region features extracted from the CNN for further refinement. However, a central issue in both types of methods lies in handling scale variation. The scale of object instances commonly varies over a wide range, which impedes the detectors, especially for very small or very large objects.

* Equal Contribution

To remedy the scale variation issue, an intuitive way is to leverage multi-scale image pyramids [1], which are popular in both hand-crafted feature based methods [12, 31] and current deep CNN based methods (Figure 1(a)). Strong evidence [22, 29] shows that current standard deep detectors [37, 10] could benefit from multi-scale training and testing. To avoid training objects with extreme scales (small/large objects in smaller/larger scales), SNIP [40, 41] proposes a scale normalization method that selectively trains the objects of appropriate sizes in each image scale. Nevertheless, the increase in inference time makes the image pyramid methods infeasible for practical applications.
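The idea of training only on objects of appropriate size at each image scale can be sketched as a simple filter. This is a simplified illustration and not SNIP's actual procedure. The single valid range `(32, 160)`, the pyramid scales, the use of sqrt(width x height) as the object size, and the function name `select_training_objects` are assumptions introduced here.

```python
import math


def select_training_objects(boxes, image_scales, valid_range):
    """For each image scale, keep the boxes whose rescaled size is in range.

    boxes: list of (width, height) in original-image pixels.
    valid_range: hypothetical (low, high) bounds on sqrt(w * h) after rescaling.
    """
    low, high = valid_range
    selected = {}
    for scale in image_scales:
        kept = []
        for width, height in boxes:
            size = math.sqrt(width * height) * scale  # size in the resized image
            if low <= size <= high:
                kept.append((width, height))
        selected[scale] = kept
    return selected


boxes = [(16, 16), (64, 64), (256, 256)]  # small, medium, large objects
selected = select_training_objects(boxes, (0.5, 1.0, 2.0), (32, 160))
for scale, kept in selected.items():
    print(scale, kept)
```

With these illustrative numbers, the small object is used only at the upsampled scale and the large object only at the downsampled scale, so no branch of the pyramid trains on extreme sizes. Every scale still requires a full forward pass, which is the inference cost the paragraph above refers to.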

The other line of efforts aims to employ in-network feature pyramids to approximate image pyramids at lower computational cost. The idea was first demonstrated in [13], where a fast feature pyramid is constructed for object detection by interpolating some feature channels from nearby scale levels. In the deep learning era, the approximation is even easier. SSD [30] utilizes multi-scale feature maps from different layers and detects objects of different scales at each feature layer. To compensate for the absence of semantics in low-level features, FPN [26] (Figure 1(b)) further augments a top-down pathway and lateral connections to incorporate strong semantic information in high-level features. However, the representational power for objects of different scales still differs, since their features are extracted on different layers in FPN. This makes feature pyramids an unsatisfactory alternative for image pyramids.
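A toy version of a top-down pathway with lateral connections is sketched below. It makes strong simplifying assumptions: 1-D lists stand in for feature maps, a scalar `lateral_weight` stands in for the learned lateral projection, upsampling is nearest-neighbour by a factor of 2, and any smoothing after the merge is omitted. The function names are hypothetical.

```python
def upsample_nearest(values, factor=2):
    """Repeat every element `factor` times (nearest-neighbour upsampling)."""
    return [v for v in values for _ in range(factor)]


def top_down_pyramid(bottom_up_levels, lateral_weight=1.0):
    """Merge levels ordered from fine (high resolution) to coarse.

    Each finer level receives the upsampled coarser merged level plus its own
    lateral projection (a scalar multiply in this sketch).
    """
    merged = [[lateral_weight * v for v in bottom_up_levels[-1]]]
    for level in reversed(bottom_up_levels[:-1]):
        top = upsample_nearest(merged[0])           # top-down signal
        lateral = [lateral_weight * v for v in level]  # lateral connection
        merged.insert(0, [a + b for a, b in zip(lateral, top)])
    return merged


# Three bottom-up levels whose resolution halves at each step.
levels = [[1.0] * 8, [2.0] * 4, [4.0] * 2]
pyramid = top_down_pyramid(levels)
print([row[0] for row in pyramid])  # [7.0, 6.0, 4.0]
```

Even after merging, each output level is still computed from a different depth of the bottom-up stack, which is the inconsistency in representational power that the paragraph above points out.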

Both image pyramid and feature pyramid methods share the same motivation that models should have different receptive fields for objects of different scales. Despite their inefficiency, image pyramids fully utilize the representational power of the model to transform objects of all scales equally. In contrast, feature pyramids generate multi-level features, thus sacrificing the feature consistency across different scales. The goal of this work is to get the best of both worlds by efficiently creating features with a uniform representational power for all scales.

In this paper, instead of feeding in multi-scale inputs

arXiv:1901.01892v1 [cs.CV] 7 Jan 2019
