A CONTRARIO PARADIGM FOR YOLO-BASED INFRARED SMALL TARGET DETECTION
Alina Ciocarlan¹,², Sylvie Le Hegarat-Mascle², Sidonie Lefebvre¹, Arnaud Woiselle³, Clara Barbanson³
¹ DOTA and LMA2S, ONERA, Université Paris-Saclay, F-91123 Palaiseau, France
² SATIE, Université Paris-Saclay, 91405 Orsay, France
³ Safran Electronics & Defense, F-91344 Massy, France
ABSTRACT
Detecting small to tiny targets in infrared images is a challenging task in computer vision, especially when it comes to differentiating these targets from noisy or textured backgrounds. Traditional object detection methods such as YOLO struggle to detect tiny objects and therefore perform worse than segmentation neural networks on small targets. To reduce the number of false alarms while maintaining a high detection rate, we introduce an a contrario decision criterion into the training of a YOLO detector. This criterion exploits the unexpectedness of small targets to discriminate them from complex backgrounds. Adding this statistical criterion to a YOLOv7-tiny bridges the performance gap between state-of-the-art segmentation methods for infrared small target detection and object detection networks. It also significantly increases the robustness of YOLO in few-shot settings.
Index Terms— small target detection, a contrario reasoning, YOLO, few-shot detection
1. INTRODUCTION
Accurately detecting small objects in infrared (IR) images is essential in various applications, including the medical and security fields. Infrared small target detection (IRSTD) is a major challenge in computer vision, where the difficulties are mainly due to (i) the size of the targets (area below 20 pixels), (ii) the complex and highly textured backgrounds, which lead to many false alarms, and (iii) the learning conditions, namely learning from small, poorly diversified, and highly class-imbalanced datasets, since target pixels are vastly outnumbered by background pixels. The rise of deep learning methods has led to impressive advances in object detection over the past decades, mostly thanks to their ability to learn, from a huge amount of annotated data, non-linear features well adapted to the final task. For IRSTD, semantic segmentation neural networks (NN) are the most widely used [1]. These include ACM [2], LSPM [3], and one of the most recent state-of-the-art (SOTA) methods, DNANet [4], which consists of several nested UNets and a multiscale fusion module that enable the segmentation of small objects of variable sizes. However, a major issue with relying on segmentation NN for object detection is that object fragmentation can occur when tuning the threshold used to binarize the segmentation map. This can lead to many undesired false alarms and distort counting metrics. Object detection algorithms like Faster-RCNN [5] or YOLO [6] reduce this risk by explicitly localizing the objects through bounding box regression, but they often struggle to detect tiny objects. Very few studies have focused on adapting such detectors for IRSTD [7], and no rigorous comparison has been made with SOTA IRSTD methods.
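To make the fragmentation issue concrete, the toy sketch below (an illustration of ours, not taken from the paper) binarizes a tiny segmentation map of a single dim target at two thresholds and counts connected components: the higher threshold splits the single target into four detections, which inflates the false-alarm count and distorts counting metrics.

```python
import numpy as np
from scipy import ndimage

# Hypothetical 3x3 segmentation scores for ONE small target whose centre
# pixel happens to receive a slightly lower score than its neighbours.
seg = np.array([[0.0, 0.7, 0.0],
                [0.7, 0.4, 0.7],
                [0.0, 0.7, 0.0]])

for thr in (0.3, 0.6):
    mask = seg > thr                    # binarize the segmentation map
    _, n_objects = ndimage.label(mask)  # count connected components
    print(f"threshold={thr}: {n_objects} detected object(s)")
# threshold=0.3 -> 1 object; threshold=0.6 -> 4 objects (fragmentation)
```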
In this paper, we propose a novel YOLO detection head, called OL-NFA (for Object-Level Number of False Alarms), that is specifically designed for small object detection. This module integrates an a contrario decision criterion that guides the feature extraction so that unexpected objects stand out against the background and are detected. It is used to re-estimate the objectness scores computed by a YOLO backbone, and it has been carefully implemented to allow back-propagation during training (an illustrative sketch is given after the contribution list below). One advantage of the a contrario paradigm is that it focuses on modeling the background, for which many samples are available, rather than the objects themselves. In this way, the problems of class imbalance and scarce training data are bypassed, since detection is carried out by rejecting the background hypothesis. Our main contributions are as follows:
1. We design a novel YOLO detection head that integrates an a contrario criterion for estimating the objectness scores. By focusing on modeling the background rather than the objects themselves, we relax the constraint of having a large number of training samples.
2. We compare both SOTA segmentation NN and object detection methods on a widely used IRSTD benchmark and show that adding OL-NFA to a YOLOv7-tiny backbone bridges the performance gap between object detectors and SOTA segmentation NN for IRSTD.
3. On top of that, our method improves YOLOv7-tiny performance by a large margin (39.2% AP for 15-shot) in few-shot settings, demonstrating the robustness of the a contrario paradigm in weak training conditions.
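The following sketch gives a rough, assumption-laden illustration of how an NFA-style criterion can re-estimate objectness in a differentiable way. The Gaussian background model, the choice of one aggregated score per candidate, and the sigmoid mapping of -log10(NFA) are our own illustrative assumptions, not the paper's exact OL-NFA formulation.

```python
import math
import torch

def nfa_objectness(scores: torch.Tensor, n_tests: int, eps: float = 1e-9) -> torch.Tensor:
    """Illustrative a contrario objectness re-estimation (not the paper's exact head).

    `scores` holds one aggregated response per candidate (e.g. the mean feature
    activation inside a predicted box). The background is modelled here as a
    Gaussian; a candidate is 'unexpected' when its tail probability under that
    model, multiplied by the number of tests, i.e. its Number of False Alarms
    (NFA), is small.
    """
    # Background model (assumption): Gaussian fitted on the responses themselves.
    mu = scores.mean()
    sigma = scores.std().clamp_min(eps)
    z = (scores - mu) / sigma

    # Upper-tail probability under the background hypothesis, P(Z >= z).
    tail = 0.5 * torch.erfc(z / math.sqrt(2.0))

    # Number of False Alarms: tail probability scaled by the number of tests.
    nfa = n_tests * tail

    # Differentiable objectness: close to 1 when NFA << 1 (highly unexpected),
    # close to 0 when the candidate is well explained by the background.
    return torch.sigmoid(-torch.log10(nfa.clamp_min(eps)))
```

In practice, the background statistics would more plausibly be estimated from local feature-map neighborhoods rather than from the candidates themselves; the point of the sketch is simply that every operation is differentiable, which is the property required for end-to-end training.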