TridentNet: A Multi-Branch Network for the Small-Object Problem in Object Detection
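This resource addresses scale variation in object detection, a problem that is especially pronounced for very small objects. The paper, "Scale-Aware Trident Networks for Object Detection", proposes TridentNet to reduce the impact of scale variation on detection performance. The authors first run a controlled experiment on how the receptive field affects the detection of objects of different scales.

Based on these findings, TridentNet uses a parallel multi-branch architecture in which every branch shares the same transformation parameters but has a different receptive field. A single set of parameters therefore processes features at different scales, which keeps the representational power uniform across scales.

The paper also proposes a scale-aware training scheme. It trains each branch on object instances of appropriate scales, so that each branch specializes in a particular scale range.

Finally, the paper describes a fast approximation of TridentNet that, according to the abstract, achieves significant improvements without any additional parameters or computational cost.

Through receptive-field control and scale-aware training, TridentNet offers an effective and efficient approach to detecting objects across scales, including small ones.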
Scale-Aware Trident Networks for Object Detection
Yanghao Li*, Yuntao Chen (1,3)*, Naiyan Wang (2), Zhaoxiang Zhang (1,3,4)
(1) University of Chinese Academy of Sciences
(2) TuSimple
(3) Center for Research on Intelligent Perception and Computing, CASIA
(4) Center for Excellence in Brain Science and Intelligence Technology, CAS
lyttonhao@gmail.com chenyuntao2016@ia.ac.cn zhaoxiang.zhang@ia.ac.cn winsty@gmail.com
Abstract
Scale variation is one of the key challenges in object detection. In this work, we first present a controlled experiment to investigate the effect of receptive fields on the detection of different scale objects. Based on the findings from the exploration experiments, we propose a novel Trident Network (TridentNet) aiming to generate scale-specific feature maps with a uniform representational power. We construct a parallel multi-branch architecture in which each branch shares the same transformation parameters but with different receptive fields. Then, we propose a scale-aware training scheme to specialize each branch by sampling object instances of proper scales for training. As a bonus, a fast approximation version of TridentNet could achieve significant improvements without any additional parameters and computational cost. On the COCO dataset, our TridentNet with ResNet-101 backbone achieves state-of-the-art single-model results by obtaining an mAP of 48.4. Code will be made publicly available.
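The weight-sharing idea in the abstract can be illustrated with a toy example. The sketch below is not the authors' implementation: it assumes the different receptive fields come from different dilation rates, which this excerpt does not state, and it works on a 1D signal in pure Python. The names `dilated_conv1d`, `receptive_field`, and `trident_branches`, and the dilation rates (1, 2, 3), are hypothetical.

```python
def dilated_conv1d(signal, weights, dilation):
    """Zero-padded 1D cross-correlation whose output has the input's length.

    Assumes an odd kernel size so the kernel has a well-defined center.
    """
    half = (len(weights) - 1) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, w in enumerate(weights):
            # Taps are spaced `dilation` samples apart around position i.
            idx = i + (j - half) * dilation
            if 0 <= idx < len(signal):
                acc += w * signal[idx]
        out.append(acc)
    return out


def receptive_field(kernel_size, dilation):
    """Input span covered by one dilated kernel: d * (k - 1) + 1."""
    return dilation * (kernel_size - 1) + 1


def trident_branches(signal, shared_weights, dilations=(1, 2, 3)):
    """Run every branch with the SAME weights; only the dilation differs.

    The dilation rates are an assumption made for illustration.
    """
    return {d: dilated_conv1d(signal, shared_weights, d) for d in dilations}


shared_weights = [1.0, 1.0, 1.0]          # one parameter set for all branches
signal = [0, 0, 0, 0, 1, 0, 0, 0, 0]      # unit impulse at index 4
outputs = trident_branches(signal, shared_weights)

for d, out in outputs.items():
    print(f"dilation={d} receptive_field={receptive_field(3, d)} output={out}")
```

Each branch responds to the same impulse at positions spaced by its dilation rate, so the branches cover different input spans while no parameters are added.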
1. Introduction
In recent years, deep convolutional neural networks (CNNs) [17, 37, 30] have achieved great success in object detection. Typically, these CNN-based methods can be roughly divided into two types: one-stage methods such as YOLO [34] or SSD [30], which directly utilize a feed-forward CNN to predict the bounding boxes of interest, and two-stage methods such as Faster R-CNN [37] or R-FCN [10], which first generate proposals and then exploit the extracted region features from the CNN for further refinement. However, a central issue in both types of methods lies in handling scale variation. The scale of object instances commonly varies over a wide range, which impedes the detectors, especially for very small or very large objects.
* Equal Contribution
To remedy the scale variation issue, an intuitive way is to leverage multi-scale image pyramids [1], which are popular in both hand-crafted feature based methods [12, 31] and current deep CNN based methods (Figure 1(a)). Strong evidence [22, 29] shows that current standard deep detectors [37, 10] could benefit from multi-scale training and testing. To avoid training objects with extreme scales (small/large objects in smaller/larger scales), SNIP [40, 41] proposes a scale normalization method that selectively trains the objects of appropriate sizes in each image scale. Nevertheless, the increase in inference time makes the image pyramid methods infeasible for practical applications.
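The selective training described above, where each image scale only trains on objects of appropriate size, can be sketched as a simple filter. This is a minimal illustration and not SNIP's actual procedure: the function name `valid_objects`, the use of the square root of the box area as the object size, and the per-scale valid ranges are all assumptions chosen for the example.

```python
import math


def valid_objects(boxes, image_scale, valid_range):
    """Keep boxes whose size after resizing falls inside valid_range.

    boxes: list of (x1, y1, x2, y2) in original-image pixels.
    Object size is taken as sqrt(width * height), an assumption here.
    """
    lo, hi = valid_range
    kept = []
    for x1, y1, x2, y2 in boxes:
        size = math.sqrt((x2 - x1) * (y2 - y1)) * image_scale
        if lo <= size <= hi:
            kept.append((x1, y1, x2, y2))
    return kept


boxes = [(0, 0, 20, 20), (0, 0, 100, 100), (0, 0, 400, 400)]

# Hypothetical pyramid: image scale factor -> valid object-size range (pixels).
pyramid = {
    0.5: (64, float("inf")),  # downsampled image: train on large objects only
    1.0: (32, 256),           # original image: medium objects
    2.0: (0, 128),            # upsampled image: small objects only
}

selected = {scale: valid_objects(boxes, scale, rng) for scale, rng in pyramid.items()}
for scale, kept in selected.items():
    print(scale, kept)
```

With these made-up ranges, each object contributes to training at only one pyramid level, which avoids the extreme cases of tiny objects in downsampled images and huge objects in upsampled ones.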
The other line of effort aims to employ in-network feature pyramids to approximate image pyramids at a lower computational cost. The idea was first demonstrated in [13], where a fast feature pyramid is constructed for object detection by interpolating some feature channels from nearby scale levels. In the deep learning era, the approximation is even easier. SSD [30] utilizes multi-scale feature maps from different layers and detects objects of different scales at each feature layer. To compensate for the absence of semantics in low-level features, FPN [26] (Figure 1(b)) further augments a top-down pathway and lateral connections to incorporate strong semantic information in high-level features. However, the representational power for objects of different scales still differs, since their features are extracted on different layers in FPN. This makes feature pyramids an unsatisfactory alternative to image pyramids.
Both image pyramid and feature pyramid methods share the same motivation: models should have different receptive fields for objects of different scales. Despite their inefficiency, image pyramids fully utilize the representational power of the model to transform objects of all scales equally. In contrast, feature pyramids generate multi-level features, thus sacrificing feature consistency across different scales. The goal of this work is to get the best of both worlds by efficiently creating features with a uniform representational power for all scales.
In this paper, instead of feeding in multi-scale inputs
arXiv:1901.01892v1 [cs.CV] 7 Jan 2019