2018 IEEE Fourth International Conference on Multimedia Big Data (BigMM)
Real-time Vehicle Detection from UAV Imagery
Xuemei Xie*, Wenzhe Yang, Guimei Cao, Jianxiu Yang, Zhifu Zhao, Shu Chen, Quan Liao, Guangming Shi
School of Artificial Intelligence
Xidian University, Xi’an, China 710071
xmxie@mail.xidian.edu.cn
Abstract—Fast and accurate vehicle detection in unmanned
aerial vehicle (UAV) imagery is a meaningful but challenging task,
playing an important role in a wide range of applications. Because
vehicles in UAV imagery are tiny, offer few features, vary in scale,
and come with imbalanced samples, current deep learning methods
used in this task cannot achieve satisfactory performance in both
accuracy and speed, which is obviously a classical trade-off
problem. In this paper, we propose a single-shot vehicle detector,
which focuses on accurate and real-time vehicle detection in UAV
imagery. We make contributions in the following two aspects:
1) presenting a multi-scale feature fusion module to combine
the high resolution but semantically weak features with the low
resolution but semantically strong features, aiming to introduce
context information to enhance the feature representation of
the small vehicles; 2) proposing a dynamic training strategy
(DTS) which instructs the network to learn more discriminative
features of hard examples by using the cross entropy and focal loss
functions alternately. Experimental results show that our method
achieves 90.8% accuracy on small vehicle detection in UAV images
and runs at 59 FPS on a single NVIDIA 1080Ti GPU.
Index Terms—vehicle detection, unmanned aerial vehicle imagery,
feature fusion, dynamic training strategy
I. INTRODUCTION
Nowadays, vehicle detection in unmanned aerial vehicle
(UAV) imagery plays a significant role in a wide range of
applications [1]–[3]. However, UAV imagery has several
characteristics that hinder real-time vehicle detection, namely
tiny objects, varied target orientations, and imbalanced
samples, which lead to unsatisfactory performance in both
speed and accuracy.
Traditional methods are mainly based on handcrafted
features [4], [5] and sliding window search algorithms [6], [7].
Handcrafted features cannot provide a good semantic
representation. Subsequent studies [8], [9] exploit deep learning
methods to improve the feature representation capability
compared with handcrafted features, bringing some improvement
in detection accuracy, but there is still a gap to real-time
detection. Faster R-CNN [10], a CNN-based detector,
has achieved good performance in UAV imagery [11]–[14].
However, its speed is limited by its detection
mechanism. Subsequently, YOLO detectors [15], [16] have been
employed to achieve real-time detection, but with lower accuracy
[17]. Due to the wide field of view of UAV images, the vehicle objects
are usually small, occluded, and set against complex backgrounds.
Under these conditions, accurately detecting vehicles
from UAV imagery is quite difficult.
*This work is supported by Natural Science Foundation (NSF) of China
(Nos. 61472301, 61632019), the Foundation for Innovative Research Groups of
the National Natural Science Foundation of China (No. 61621005), Ministry
of Education project (No. 6141A02011601).
In this paper, we propose a single-shot network with a multi-level
feature fusion method that utilizes context information
efficiently and effectively, improving accuracy while
achieving real-time vehicle detection.
Moreover, the extreme hard-easy class imbalance in UAV
datasets causes two problems: 1) model training
is insufficient for the categories with only a small number
of examples, so it is hard for the network to extract
representative features [18], [19]; 2) the many easy samples
overwhelm the total loss and gradient computation, so the
network cannot learn discriminative features well [20]. To
address these problems, we design a dynamic training strategy
(DTS) that handles the imbalance and improves the network's
detection performance.
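To make the second problem concrete, the following minimal sketch compares how much of a batch's total loss comes from a single hard example under cross entropy and under focal loss, the two losses named in the abstract. The batch composition (99 easy examples, 1 hard example), the probabilities, and the focusing parameter gamma = 2 are illustrative assumptions, not values taken from this paper.

```python
import math

def cross_entropy(p_t):
    """Cross entropy for one example; p_t is the predicted probability of the true class."""
    return -math.log(p_t)

def focal_loss(p_t, gamma=2.0):
    """Focal loss: cross entropy scaled by (1 - p_t) ** gamma.
    gamma = 2.0 is an assumed value, not one reported in this paper."""
    return (1.0 - p_t) ** gamma * cross_entropy(p_t)

# Hypothetical batch: 99 easy examples (confidently correct) and 1 hard example.
easy = [0.95] * 99
hard = [0.10]

def hard_share(loss_fn):
    """Fraction of the total batch loss contributed by the hard example."""
    easy_total = sum(loss_fn(p) for p in easy)
    hard_total = sum(loss_fn(p) for p in hard)
    return hard_total / (easy_total + hard_total)

ce_share = hard_share(cross_entropy)
fl_share = hard_share(focal_loss)
print(f"hard-example share of loss: CE={ce_share:.3f}, focal={fl_share:.3f}")
```

In this toy batch the easy examples account for most of the cross entropy total, whereas under focal loss the single hard example dominates, which is the behaviour the imbalance argument above relies on.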
To summarize, we present a single-shot detector, which
focuses on accurate and real-time vehicle detection from UAV
imagery. Specifically, our main contributions are as follows:
• We present a multi-scale feature fusion module to com-
bine the high resolution but semantically weak features
with the low resolution but semantically strong features,
which aims to introduce context information to enhance
feature representation of the small vehicles;
• We propose a dynamic training strategy (DTS) which
instructs the network to learn more discriminative features
of hard examples by using the cross entropy and focal loss
functions alternately.
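As a rough illustration of the first contribution, the sketch below fuses a small high-resolution map with a lower-resolution one. This excerpt does not specify the fusion operator, so nearest-neighbour upsampling followed by an element-wise sum is used here purely as an assumed stand-in; the function names, the single-channel maps, and the toy values are all hypothetical.

```python
def upsample_nearest(fmap, factor):
    """Nearest-neighbour upsampling of a 2D feature map given as a list of rows."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(factor)]   # repeat each column
        out.extend(list(wide) for _ in range(factor))    # repeat each row
    return out

def fuse(high_res, low_res):
    """Assumed fusion: upsample low_res to the size of high_res, then add element-wise."""
    factor = len(high_res) // len(low_res)
    up = upsample_nearest(low_res, factor)
    return [[h + u for h, u in zip(h_row, u_row)]
            for h_row, u_row in zip(high_res, up)]

# Toy single-channel maps: a 4x4 fine-detail map and a 2x2 coarse, context-rich map.
high = [[1, 0, 0, 2],
        [0, 1, 2, 0],
        [0, 3, 1, 0],
        [3, 0, 0, 1]]
low = [[10, 20],
       [30, 40]]

fused = fuse(high, low)
for row in fused:
    print(row)
```

Each fused cell keeps its fine-grained value while also carrying the coarse value of the region it belongs to, which is the sense in which context information is introduced for small objects. A real detector would operate on multi-channel tensors and would typically use learned layers around the upsampling step.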
Experimental results show that our method achieves
90.8% accuracy in UAV images, which is 7.5% and 3.1% higher
than SSD [21] and RefineDet [22], respectively. The
proposed network runs at 59 FPS on a single NVIDIA
1080Ti GPU for small vehicle detection.
II. RELATED WORK
A. UAV Vehicle Detector
Vehicle detection from UAV imagery has attracted extensive
research attention in the past years. Moranduzzo et al. [23], Shao
et al. [4] and Kembhavi et al. [6] explore vehicle detection
using handcrafted features (e.g., Haar, HOG, SIFT, local
binary pattern, etc.) and an intersection kernel SVM, and
make some progress. Xu et al. [14] improve the original
Viola-Jones object detection scheme for better performance on
low-altitude UAV imagery. However, traditional handcrafted
978-1-5386-5321-0/18/$31.00 ©2018 IEEE