Real Time Human Detection by Unmanned Aerial
Vehicles
Walid Guettala
Computer Science Department,
Biskra University, Algeria
walidguettala@gmail.com
Ali Sayah
Computer Science Department,
Biskra University, Algeria
Sayah.Ali@hotmail.com
Laid Kahloul
LINFI Laboratory, Computer Science Department,
Biskra University, Algeria
l.kahloul@univ-biskra.dz
Ahmed Tibermacine
LESIA Laboratory, Computer Science Department,
Biskra University, Algeria
ahmed.tibermacine@univ-biskra.dz
Abstract—Object detection, which locates and identifies instances of particular categories in images, is one of the most important problems in computer vision and remote sensing. Thermal infrared (TIR) remote sensing images and videos captured by unmanned aerial vehicles (UAVs) across diverse scenarios are a crucial data source for public security. Object detection in such data remains difficult due to the small scale of the targets, complex scene content, low resolution relative to visible-spectrum videos, and the scarcity of publicly available labeled datasets and pretrained models. This study proposes a UAV TIR object detection framework for images and videos. The CNN-based “You Only Look Once” (YOLO) model is trained on ground-based TIR images and videos collected with Forward-Looking Infrared (FLIR) cameras. On the validation task, human detection with the state-of-the-art YOLOv7 (YOLO version 7) model [1] reached an average precision of 72.5% at IoU (Intersection over Union) = 0.5, with a detection speed of about 161 frames per second (FPS). The application further demonstrates the usefulness of the YOLO architecture by evaluating the cross-detection performance of the YOLOv7 model on people in UAV TIR videos captured from different UAV observation angles. This work provides favorable qualitative and quantitative support for deep-learning-based object detection in TIR images and videos.
Index Terms—Human detection, Human tracking, Thermal Imaging, YOLOv7, UAV
I. INTRODUCTION
Unmanned aerial vehicle (UAV) object detection is a developing technology with a wide range of uses, including aerial image analysis, intelligent surveillance, and route inspection [2, 3]. Object detection has advanced considerably in recent years: deep neural networks (DNNs), in particular convolutional neural networks (CNNs) [4], have achieved record-breaking performance in computer vision applications such as object recognition [5], driven by the introduction of large-scale visual datasets and greater computing power. Given the UAV’s unique viewpoint, however, detection remains a difficult task.
Object detection approaches fall into two families: “deep-learning-based object detection” [6] and “conventional manual feature-based object detection” [7]. The latter centers on hand-designing target-feature extraction techniques; because such features struggle to satisfy varied constraints, these approaches are mostly confined to specific environments [8]. Deep-learning-based techniques, on the other hand, now achieve real-time detection while continuing to improve in accuracy as computing technology advances.
Although deep-learning-based techniques have significantly advanced object recognition, missed detections still occur in UAV imagery. Two key factors contribute to these problems: (1) the network’s receptive field is not sufficiently robust to small objects and thermal imaging, and (2) training datasets rarely cover the UAV viewpoint. In general, the object feature representation and the associated training dataset are crucial for improving object detection performance. Additionally, the trade-off between accuracy and processing speed is decisive for real-world applications.
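Concretely, accuracy in this work is reported as average precision at IoU (Intersection over Union) = 0.5: a predicted box counts as a correct detection only when it overlaps a ground-truth box with IoU of at least 0.5. The minimal Python sketch below illustrates this matching criterion; the (x1, y1, x2, y2) corner format and the example coordinates are illustrative assumptions, not values from our dataset.

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes in (x1, y1, x2, y2) form."""
    # Corners of the intersection rectangle.
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    # Union = sum of the two box areas minus the intersection.
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# Hypothetical predicted person box vs. its ground-truth annotation.
pred, gt = (48, 120, 84, 210), (50, 118, 88, 205)
print(iou(pred, gt))         # ~0.79
print(iou(pred, gt) >= 0.5)  # True: counted as a true positive at IoU = 0.5
```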
Motivated by these issues, we develop an object detection technique based on the “You Only Look Once” (YOLO) principle [9], focusing on the detection of small objects. To improve detection performance on small objects, we gather data from UAV viewpoints and adapt the YOLOv7 network to our dataset through transfer learning. Our study’s contributions are: (1) a UAV-perspective dataset for person detection that can be used to improve human detection; and (2) an enhanced YOLO network architecture that expands the receptive field and further improves small-person detection via transfer learning, as sketched below.
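A minimal sketch of this transfer-learning step is given below, assuming PyTorch and a COCO-pretrained checkpoint file named yolov7.pt whose layout exposes the model under a “model” key (as in the official YOLOv7 release checkpoints); the fraction of frozen layers and the optimizer settings are illustrative, not our exact training configuration.

```python
# Illustrative transfer-learning sketch, not the exact YOLOv7 training
# pipeline: keep the pretrained backbone features frozen and fine-tune
# the remaining layers on the single-class (person) UAV TIR dataset.
import torch

# Load a COCO-pretrained checkpoint; the "model" key holding an nn.Module
# follows the layout of the official YOLOv7 release checkpoints (assumed).
ckpt = torch.load("yolov7.pt", map_location="cpu")
model = ckpt["model"].float()

# Freeze roughly the first half of the parameters (the backbone) so its
# generic features are preserved; the exact split point is an assumption.
params = list(model.parameters())
for p in params[: len(params) // 2]:
    p.requires_grad = False

# Optimize only the layers left trainable (neck and detection head).
optimizer = torch.optim.SGD(
    (p for p in params if p.requires_grad),
    lr=1e-3,
    momentum=0.9,
)
```

The full training run goes through the YOLOv7 repository’s own training pipeline; the sketch only isolates the idea of preserving pretrained features while adapting the detector to the UAV TIR domain.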
The remainder of this article is structured as follows: Section 2 introduces the related work; Section 3 explains the experimental setup, presents the experimental findings, and discusses a detailed comparative analysis; and Section 4 offers concluding remarks.
II. RELATED WORK
A considerable number of works in the literature address the challenging task of object detection. This section briefly discusses notable approaches and methods.