ThunderNet: Towards Real-time Generic Object Detection
Zheng Qin∗†¹, Zeming Li∗², Zhaoning Zhang¹, Yiping Bao², Gang Yu², Yuxing Peng¹, Jian Sun²
¹National University of Defense Technology
²Megvii Inc. (Face++)
{qinzheng12, zhangzhaoning, pengyuxing}@nudt.edu.cn {lizeming, baoyiping, yugang, sunjian}@megvii.com
Abstract
Real-time generic object detection on mobile platforms is a crucial but challenging computer vision task. However, previous CNN-based detectors suffer from enormous computational cost, which prevents them from running in real time in computation-constrained scenarios. In this paper, we investigate the effectiveness of two-stage detectors in real-time generic detection and propose a lightweight two-stage detector named ThunderNet. In the backbone part, we analyze the drawbacks of previous lightweight backbones and present a lightweight backbone designed for object detection. In the detection part, we exploit an extremely efficient RPN and detection head design. To generate more discriminative feature representations, we design two efficient architecture blocks, the Context Enhancement Module and the Spatial Attention Module. Finally, we investigate the balance between the input resolution, the backbone, and the detection head. Compared with lightweight one-stage detectors, ThunderNet achieves superior performance with only 40% of the computational cost on the PASCAL VOC and COCO benchmarks. Without bells and whistles, our model runs at 24.1 fps on an ARM-based device. To the best of our knowledge, this is the first real-time detector reported on ARM platforms. Code will be released for paper reproduction.
1. Introduction
Real-time generic object detection on mobile devices is a crucial but challenging task in computer vision. Compared with server-class GPUs, mobile devices are computation-constrained and impose stricter limits on the computational cost of detectors. However, modern CNN-based detectors are resource-hungry and require massive computation to achieve high detection accuracy, which hinders them from real-time inference in mobile scenarios.
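How strongly the input size drives computational cost can be checked with simple arithmetic. The sketch below counts the multiply-accumulates (MACs) of a single dense convolution and compares a large input (800×1200 pixels, the example size cited in this introduction) with a smaller one. It is a minimal sketch under stated assumptions, not the paper's cost model: one MAC is counted as one FLOP (a common convention for lightweight networks, not stated in this section), the layer is a hypothetical 3×3, 64→64-channel convolution at cumulative stride 4, and 320×320 is an arbitrary smaller input chosen only for illustration.

```python
# Minimal sketch: multiply-accumulate (MAC) count of one standard convolution
# and how it grows with input resolution. All layer sizes are hypothetical.


def conv2d_macs(h_out: int, w_out: int, c_in: int, c_out: int, k: int) -> int:
    """MACs of a dense k x k convolution producing an h_out x w_out x c_out map."""
    # Each output element needs c_in * k * k multiply-accumulates.
    return h_out * w_out * c_out * c_in * k * k


def layer_mflops(input_h: int, input_w: int, stride: int = 4,
                 channels: int = 64, k: int = 3) -> float:
    """Cost in millions (assumption: 1 MAC counted as 1 FLOP) of a hypothetical
    k x k, channels->channels convolution at the given cumulative stride."""
    h, w = input_h // stride, input_w // stride
    return conv2d_macs(h, w, channels, channels, k) / 1e6


if __name__ == "__main__":
    large = layer_mflops(800, 1200)  # 200 x 300 feature map
    small = layer_mflops(320, 320)   # 80 x 80 feature map
    print(f"800x1200 input: {large:.1f} MFLOPs for this single layer")
    print(f"320x320 input:  {small:.1f} MFLOPs for this single layer")
    print(f"ratio: {large / small:.3f}x")
```

Because the cost scales with the output area h_out × w_out, this one layer costs (200·300)/(80·80) = 9.375 times more on the 800×1200 input (about 2212 vs. 236 MFLOPs), before any other layer of the backbone or detection part is counted.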
[Figure 1 plot: COCO AP (0.5:0.95) vs. MFLOPs for ThunderNet, MobileNetV1-SSD, MobileNetV1-SSDLite, MobileNetV2-SSDLite, Pelee, and Tiny-DSOD; points annotated with speed: 24.1, 13.8, and 5.8 fps (845); 5.4 fps (820); 5 and 3.7 fps (810); 6.7 fps (6700K).]
Figure 1. Comparison of ThunderNet and previous lightweight detectors on COCO test-dev¹. ThunderNet achieves improvements in both accuracy and efficiency.

∗ Equal contribution.
† This work was done when Zheng Qin was an intern at Megvii Inc.

From the perspective of network structure, CNN-based detectors can be divided into the backbone part, which extracts features from the image, and the detection part, which detects object instances in the image. In the backbone part, state-of-the-art detectors tend to exploit huge classification networks (e.g., ResNet-101 [10, 4, 16, 17]) and large input images (e.g., 800×1200 pixels), which require massive computational cost. Recent progress in lightweight image classification networks [3, 33, 20, 11, 28] has facilitated real-time object detection [11, 28, 14, 20] on GPU. However, there are several differences between image classification and object detection; e.g., object detection needs a large receptive field and low-level features to improve localization ability, which are less crucial for image classification. The gap between the two tasks restricts the performance of these backbones on object detection and obstructs further compression without harming detection accuracy.
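The need for a large receptive field can be made concrete with the standard recurrence for the theoretical receptive field of a plain convolution/pooling stack: r_l = r_{l-1} + (k_l - 1)·j_{l-1} and j_l = j_{l-1}·s_l, where k_l is the kernel size, s_l the stride, and j_l the cumulative stride. The sketch below is illustrative only: it assumes no dilation, the two layer stacks are hypothetical (not ThunderNet's backbone), and it reports the theoretical field, which is typically larger than the effective one.

```python
# Minimal sketch: theoretical receptive field of a plain stack of conv/pool layers
# (no dilation). The layer stacks below are hypothetical, not ThunderNet's backbone.
from typing import List, Tuple


def receptive_field(layers: List[Tuple[int, int]]) -> Tuple[int, int]:
    """Return (receptive field in input pixels, cumulative stride) for (kernel, stride) layers."""
    rf, jump = 1, 1  # start from one input pixel; adjacent positions are 1 pixel apart
    for kernel, stride in layers:
        # Each layer widens the field by (kernel - 1) steps of the current jump.
        rf += (kernel - 1) * jump
        jump *= stride
    return rf, jump


# Hypothetical stem: 3x3 stride-2 conv + 3x3 stride-2 pool, then five more layers.
stem = [(3, 2), (3, 2)]
stack_3x3 = stem + [(3, 1), (3, 2), (3, 1), (3, 2), (3, 1)]
stack_5x5 = stem + [(5, 1), (5, 2), (5, 1), (5, 2), (5, 1)]

print(receptive_field(stack_3x3))  # (87, 16)
print(receptive_field(stack_5x5))  # (167, 16)
```

At the same output stride of 16, the stack with larger kernels sees almost twice as many input pixels, which illustrates why a backbone tuned only for classification may fall short on localization unless its receptive field is enlarged, at some extra computational cost.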
In the detection part, CNN-based detectors can be categorized into two-stage detectors [27, 4, 16, 14] and one-stage detectors [24, 19, 25, 17]. For two-stage detectors, the detection part usually consists of a Region Proposal Network (RPN) [27] and a detection head (including RoI warping and an R-CNN subnet). The RPN first generates RoIs, and then the
¹ Speed is evaluated with a single thread on CPU: MobileNet-SSD on Snapdragon 820, MobileNet/MobileNetV2-SSDLite on Snapdragon 810, Pelee on Intel i7-6700K (4.0 GHz), and ThunderNet on Snapdragon 845.
arXiv:1903.11752v1 [cs.CV] 28 Mar 2019