This document presents a survey paper on imbalance in object detection: Imbalance Problems in Object Detection: A Review (https://arxiv.org/abs/1909.00169, under review at TPAMI). The authors combine it with their recent Tech Report on this topic, Is Sampling Heuristics Necessary in Training Object Detectors? (https://arxiv.org/abs/1909.04868), to offer some exposition and reflections, in the hope of inspiring readers.
Imbalance Problems in Object Detection: A Review

Kemal Oksuz†, Baris Can Cam, Sinan Kalkan‡, and Emre Akbas‡

All authors are at the Dept. of Computer Engineering, Middle East Technical University (METU), Ankara, Turkey. E-mail: {kemal.oksuz@metu.edu.tr, can.cam@metu.edu.tr, skalkan@metu.edu.tr, emre@ceng.metu.edu.tr}. † Corresponding author. ‡ Equal contribution for senior authorship.
Abstract—In this paper, we present a comprehensive review of the imbalance problems in object detection. To analyze the problems in
a systematic manner, we introduce a problem-based taxonomy. Following this taxonomy, we discuss each problem in depth and
present a unifying yet critical perspective on the solutions in the literature. In addition, we identify major open issues regarding the
existing imbalance problems as well as imbalance problems that have not been discussed before. Moreover, in order to keep our
review up to date, we provide an accompanying webpage which catalogs papers addressing imbalance problems, according to our
problem-based taxonomy. Researchers can track newer studies on this webpage available at:
https://github.com/kemaloksuz/ObjectDetectionImbalance.
1 INTRODUCTION
Object detection is the simultaneous estimation of categories
and locations of object instances in a given image. It is a fun-
damental problem in computer vision with many important
applications in e.g. surveillance [1], [2], autonomous driving
[3], [4], medical decision making [5], [6], and many problems
in robotics [7], [8], [9], [10], [11], [12].
Since object detection (OD) was first cast as a machine learning problem, the first generation of OD methods relied on hand-crafted features and linear, max-margin classifiers. The most successful and representative method
in this generation was the Deformable Parts Model (DPM)
[13]. After the extremely influential work by Krizhevsky et
al. in 2012 [14], deep learning (or deep neural networks) has
started to dominate various problems in computer vision
and OD was no exception. The current generation OD
methods are all based on deep learning where both the
hand-crafted features and linear classifiers of the first gener-
ation methods have been replaced by deep neural networks.
This replacement has brought significant improvements in
performance: On a widely used OD benchmark dataset
(PASCAL VOC), while the DPM [13] achieved 0.34 mean
average-precision (mAP), current deep learning based OD
models achieve around 0.80 mAP [15].
In the last five years, although the major driving force of
progress in OD has been the incorporation of deep neural
networks [16], [17], [18], [19], [20], [21], [22], [23], imbalance
problems in OD at several levels have also received signif-
icant attention [24], [25], [26], [27], [28], [29], [30]. An im-
balance problem with respect to an input property occurs
when the distribution regarding that property affects the
performance. When not addressed, an imbalance problem
has adverse effects on the final detection performance. For
example, the most commonly known imbalance problem
in OD is the foreground-to-background imbalance which
manifests itself in the extreme inequality between the number of positive examples and the number of negatives.
In a given image, while there are typically a few positive
examples, one can extract millions of negative examples.
If not addressed, this imbalance greatly impairs detection
accuracy.
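To make the severity of this imbalance concrete, the following minimal sketch (an illustrative example, not taken from any particular detector; the 256-box budget and the cap of at most half positives merely echo common RPN-style defaults) counts the positive and negative anchors of an image and draws a quota-limited random sample from them, which is the simplest of the sampling heuristics reviewed later in this paper.

```python
import numpy as np

def sample_fixed_ratio(labels, batch_size=256, positive_fraction=0.5, rng=None):
    """Randomly subsample anchor indices so that at most `positive_fraction`
    of the mini-batch is positive; the remainder is filled with negatives.
    `labels`: 1 for positive anchors, 0 for negatives, -1 for ignored ones."""
    rng = np.random.default_rng() if rng is None else rng
    pos = np.flatnonzero(labels == 1)
    neg = np.flatnonzero(labels == 0)
    num_pos = min(len(pos), int(batch_size * positive_fraction))
    num_neg = min(len(neg), batch_size - num_pos)
    keep_pos = rng.choice(pos, size=num_pos, replace=False)
    keep_neg = rng.choice(neg, size=num_neg, replace=False)
    return np.concatenate([keep_pos, keep_neg])

# Toy image: 20 positive anchors vs. 100,000 negatives (a 1:5000 ratio).
labels = np.zeros(100_020, dtype=np.int64)
labels[:20] = 1
batch = sample_fixed_ratio(labels)
print((labels[batch] == 1).sum(), (labels[batch] == 0).sum())  # 20 positives, 236 negatives
```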
In this paper, we review the deep-learning-era object
detection literature and identify eight different imbalance
problems. We group these problems in a taxonomy with
four main types: class imbalance, scale imbalance, spatial
imbalance and objective imbalance (Table 1). Class imbal-
ance occurs when there is significant inequality among
the number of examples pertaining to different classes.
While the classical example of this is the foreground-to-
background imbalance, there is also imbalance among the
foreground (positive) classes. Scale imbalance occurs when the objects have varying scales, with different numbers of examples at different scales. Spatial imbalance
refers to a set of factors related to spatial properties of the
bounding boxes such as regression penalty, location and
IoU. Finally, objective imbalance occurs when there are
multiple loss functions to minimize, as is often the case in
OD (e.g. classification and regression losses).
1.1 Scope and Aim
Imbalance problems in general have a large scope in ma-
chine learning, computer vision and pattern recognition.
We limit the focus of this paper to imbalance problems in
object detection. Since the current state-of-the-art is shaped
by deep learning based approaches, the problems and ap-
proaches that we discuss in this paper are related to deep
object detectors. Although we restrict our attention to object
detection in still images, we provide brief discussions on
similarities and differences of imbalance problems in other
domains. We believe that these discussions would provide
insights on future research directions for object detection
researchers.
Presenting a comprehensive background for object de-
tection is not among the goals of this paper; however, some
TABLE 1: Imbalance problems reviewed in this paper. We state that an imbalance problem with respect to an input property occurs when the distribution regarding that property affects the performance. The first column shows the major imbalance categories. For each imbalance problem given in the middle column, the last column shows the associated input property concerning the definition of the imbalance problem.

| Type | Imbalance Problem | Related Input Property |
| Class | Foreground-Background Class Imbalance (§4.1) | The numbers of input bounding boxes pertaining to different classes |
| Class | Foreground-Foreground Class Imbalance (§4.2) | The numbers of input bounding boxes pertaining to different classes |
| Scale | Object/box-level Scale Imbalance (§5.1) | The scales of input and ground-truth bounding boxes |
| Scale | Feature-level Imbalance (§5.2) | Contribution of the feature layer from different abstraction levels of the backbone network (i.e. high and low level) |
| Spatial | Imbalance in Regression Loss (§6.1) | Contribution of the individual examples to the regression loss |
| Spatial | IoU Distribution Imbalance (§6.2) | IoU distribution of positive input bounding boxes |
| Spatial | Object Location Imbalance (§6.3) | Locations of the objects throughout the image |
| Objective | Objective Imbalance (§7) | Contribution of different tasks (i.e. classification, regression) to the overall loss |
background knowledge on object detection is required to
make the most out of this paper. For a thorough background
on the subject, we refer the readers to the recent, compre-
hensive object detection surveys [31], [32], [33]. We provide
only a brief background on state-of-the-art object detection
in Section 2.1.
Our main aim in this paper is to present and discuss
imbalance problems in object detection comprehensively. In
order to do that,
1) We identify and define imbalance problems and
propose a taxonomy for studying the problems and
their solutions.
2) We present a critical literature review for the exist-
ing studies with a motivation to unify them in a sys-
tematic manner. The general outline of our review
includes a definition of the problems, a summary
of the main approaches, an in-depth coverage of
the specific solutions, and comparative summaries
of the solutions.
3) We present and discuss open issues at the problem-
level and in general.
4) We also reserve a section for imbalance problems found in domains other than object detection. In that section, we meticulously examine such methods, considering their adaptability to the object detection pipeline.
5) Finally, we provide an accompanying webpage (https://github.com/kemaloksuz/ObjectDetectionImbalance) as a living repository of papers addressing imbalance problems, organized based on our problem-based taxonomy. This webpage will be continuously updated with new studies.
1.2 Comparison with Previous Reviews
Recent object detection surveys [31], [32], [33] aim to present
advances in deep learning based generic object detection.
To this end, these surveys propose a taxonomy for object
detection methods, and present a detailed analysis of some
cornerstone methods that have had high impact. They also
provide discussions on popular datasets and evaluation
metrics. From the imbalance point of view, these surveys
only consider the class imbalance problem with a limited
provision. Additionally, Zou et al. [32] provide a review for
methods that handle scale imbalance. Unlike these surveys,
here we focus on a classification of imbalance problems
related to object detection and present a comprehensive
review of methods that handle these imbalance problems.
There are also surveys on category specific object de-
tection (e.g. pedestrian detection, vehicle detection, face
detection) [34], [35], [36], [37]. Although Zehang Sun et
al. [34] and Dollar et al. [35] cover the methods proposed
before the current deep learning era, they are beneficial
from the imbalance point of view since they present a
comprehensive analysis of feature extraction methods that
handle scale imbalance. Zafeiriou et al. [36] and Yin et al.
[38] propose comparative analyses of non-deep and deep
methods. Litjens et al. [39] discuss applications of various
deep neural network based methods i.e. classification, detec-
tion, segmentation to medical image analysis. They present
challenges with their possible solutions which include a
limited exploration of the class imbalance problem. These
category specific object detector reviews focus on a single
class and do not consider the imbalance problems in a
comprehensive manner from the generic object detection
perspective.
Another set of relevant work includes the studies specif-
ically for imbalance problems in machine learning [40], [41],
[42], [43]. These studies are limited to the foreground class
imbalance problem in our context (i.e. there is no back-
ground class). Generally, they cover dataset-level methods
such as undersampling and oversampling, and algorithm-
level methods including feature selection, kernel modifi-
cations and weighted approaches. We identify three main
differences of our work compared to such studies. Firstly,
the main scope of such work is the classification problem,
which is still relevant for object detection; however, object
detection also has a “search” aspect, in addition to the
recognition aspect, which brings in the background (i.e.
negative) class into the picture. Secondly, except Johnson
et al. [43], they consider machine learning approaches in
general without any special focus on deep learning based
methods. Finally, and more importantly, these works only consider the foreground class imbalance problem, which is only one of the eight different imbalance problems that we present and discuss here (Table 1).
1.3 A Guide to Reading This Review
The paper is organized as follows. Section 2 provides a brief
background on object detection, and the list of frequently-
used terms and notation used throughout the paper. Section
3 presents our taxonomy of imbalance problems. Sections 4-
7 then cover each imbalance problem in detail, with a critical
review of the proposed solutions and include open issues for
each imbalance problem. Each section dedicated to a specific
imbalance problem is designed to be self-readable, contain-
ing definitions and a review of the proposed methods. In
order to provide a more general perspective, in Section 8,
we present the solutions addressing imbalance in other but
closely related domains. Section 9 discusses open issues that
are relevant to all imbalance problems. Finally, Section 10
concludes the paper.
Readers who are familiar with the current state-of-the-art
object detection methods can directly jump to Section 3 and
use Figure 1 to navigate both the imbalance problems and
the sections dedicated to the different problems according to
the taxonomy. For readers who lack a background in state-
of-the-art object detection, we recommend starting with
Section 2.1, and if this brief background is not sufficient,
we refer the reader to the more in-depth reviews mentioned
in Section 1.1.
2 BACKGROUND, DEFINITIONS AND NOTATION
In the following, we first provide a brief background on
state-of-the-art object detection methods, and then present
the definitions and notations used throughout the paper.
2.1 State of the Art in Object Detection
Today there are two major approaches to object detection:
top-down and bottom-up. Although both the top-down
and bottom-up approaches were popular prior to the deep
learning era, today the majority of the object detection meth-
ods follow the top-down approach; the bottom-up methods
have been proposed relatively recently. The main difference
between the top-down and bottom-up approaches is that,
in the top-down approach, holistic object hypotheses (i.e.,
anchors, regions-of-interests/proposals) are generated and
evaluated early in the detection pipeline, whereas in the
bottom-up approach, holistic objects emerge by grouping
sub-object entities like keypoints or parts, later in the pro-
cessing pipeline.
Methods following the top-down approach are categorized into two groups: two-stage and one-stage methods. Two-
stage methods [16], [17], [18], [21] aim to decrease the large
number of negative examples resulting from the predefined,
TABLE 2: Frequently used notations in the paper. Each entry lists the symbol, its domain in brackets, and what it denotes.

B [see definition]: A bounding box
C [a set of integers]: Set of class labels in a dataset
|C_i| [|C_i| ∈ Z^+]: Number of examples for the ith class in a dataset
C_i [i ∈ Z^+]: Backbone feature layer at depth i
I(P) [I(P) ∈ {0, 1}]: Indicator function; 1 if predicate P is true, else 0
P_i [i ∈ Z^+]: Pyramidal feature layer corresponding to the ith backbone feature layer
p_i [p_i ∈ [0, 1]]: Confidence score of the ith class (i.e. output of the classifier)
p_s [p_s ∈ [0, 1]]: Confidence score of the ground truth class
p_0 [p_0 ∈ [0, 1]]: Confidence score of the background class
u [u ∈ C]: Class label of a ground truth
x̂ [x̂ ∈ R]: Input of the regression loss
dense sliding windows, called anchors, to a manageable
size by using a proposal mechanism [21], [44], [45] which
determines the regions where the objects most likely appear,
called Regions of Interest (RoIs). These RoIs are further
processed by a detection network which outputs the object
detection results in the form of bounding boxes and associ-
ated object-category probabilities. Finally, the non-maxima
suppression (NMS) method is applied on the object de-
tection results to eliminate duplicate or highly-overlapping
results. NMS is a universal post-processing step used by all
state-of-the-art object detectors.
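As a rough, class-agnostic illustration of this step (real detectors usually apply it per class and rely on optimized library implementations; the 0.5 IoU threshold is only a common default), a greedy NMS can be sketched as follows.

```python
import numpy as np

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy non-maxima suppression.
    boxes: (N, 4) array of [x1, y1, x2, y2]; scores: (N,) confidences.
    Returns indices of the kept boxes, highest score first."""
    x1, y1, x2, y2 = boxes.T
    areas = (x2 - x1) * (y2 - y1)
    order = scores.argsort()[::-1]          # process highest-scoring boxes first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        # IoU of the current box with all remaining lower-scoring boxes
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        iou = inter / (areas[i] + areas[order[1:]] - inter)
        # drop boxes that overlap the current box too much
        order = order[1:][iou <= iou_threshold]
    return keep
```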
One-stage top-down methods, including SSD variants
[19], [46], YOLO variants [15], [20], [47] and RetinaNet [22],
are designed to predict the detection results directly from
anchors – without any proposal elimination stage – after
extracting the features from the input image. We present a
typical one-stage object detection pipeline in Figure 1(a). The
pipeline starts with feeding the input image to the feature
extraction network, which is usually a deep convolutional
neural network. A dense set of object hypotheses (called
anchors) are produced, which are then sampled and labeled
by matching them to ground-truth boxes. Finally, labeled
anchors (whose features are obtained from the output of the
feature extraction network) are fed to the classification and
regression networks for training. In a two-stage method,
object proposals (or regions-of-interest) are first generated
using anchors by a separate network (hence, the two stages).
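The matching and labeling step above can be sketched roughly as follows (an illustrative example only: the 0.5/0.4 IoU thresholds are typical but detector-specific, and the common rule that forces the best anchor of each ground truth to be positive is omitted for brevity).

```python
import numpy as np

def iou_matrix(anchors, gt_boxes):
    """Pairwise IoU between (A, 4) anchors and (G, 4) ground-truth boxes, both [x1, y1, x2, y2]."""
    lt = np.maximum(anchors[:, None, :2], gt_boxes[None, :, :2])   # top-left of intersections
    rb = np.minimum(anchors[:, None, 2:], gt_boxes[None, :, 2:])   # bottom-right of intersections
    wh = np.clip(rb - lt, 0, None)
    inter = wh[..., 0] * wh[..., 1]
    area_a = np.prod(anchors[:, 2:] - anchors[:, :2], axis=1)
    area_g = np.prod(gt_boxes[:, 2:] - gt_boxes[:, :2], axis=1)
    return inter / (area_a[:, None] + area_g[None, :] - inter)

def label_anchors(anchors, gt_boxes, pos_thr=0.5, neg_thr=0.4):
    """Return (labels, matched_gt): 1 = positive, 0 = negative, -1 = ignored."""
    ious = iou_matrix(anchors, gt_boxes)            # (A, G)
    best_iou = ious.max(axis=1)                     # best overlap per anchor
    matched_gt = ious.argmax(axis=1)                # index of the best ground truth per anchor
    labels = np.full(len(anchors), -1, dtype=np.int64)
    labels[best_iou < neg_thr] = 0
    labels[best_iou >= pos_thr] = 1
    return labels, matched_gt
```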
On the other hand, bottom-up object detection methods
[23], [48], [49] first predict important key-points (e.g. cor-
ners, centers, etc.) on objects and then group them to form
whole object instances by using a grouping method such as
associative embedding [50] and brute force search [49].
2.2 Frequently Used Terms and Notation
Table 2 presents the notation used throughout the paper,
and below is a list of frequently used terms.

Fig. 1: (a) The common training pipeline of a generic detection network. The pipeline has 3 phases (i.e. feature extraction; detection; and BB matching, labeling and sampling) represented by different background colors. (b) Illustration of an example imbalance problem from each category for object detection through the training pipeline. Background colors specify at which phase an imbalance problem occurs.
Feature Extraction Network/Backbone: This is the part of
the object detection pipeline from the input image until the
detection network.
Classification Network/Classifier: This is the part of the
object detection pipeline from the features extracted by the
backbone to the classification result, which is indicated by a
confidence score.
Regression Network/Regressor: This is the part of the
object detection pipeline from the features extracted by the
backbone to the regression output, which is indicated by two bounding box coordinates, each consisting of an x and a y value.
Detection Network/Detector: It is the part of the object
detection pipeline including both the classifier and the regressor.
Region Proposal Network (RPN): It is the part of the two
stage object detection pipeline from the features extracted
by the backbone to the generated proposals, which also have
confidence scores and bounding box coordinates.
Bounding Box: A rectangle on the image limiting certain features. Formally, [x_1, y_1, x_2, y_2] determines a bounding box with top-left corner (x_1, y_1) and bottom-right corner (x_2, y_2), satisfying x_2 > x_1 and y_2 > y_1.
Anchor: The set of predefined bounding boxes on which the RPN in two-stage object detectors and the detection network in one-stage detectors are applied.
Region of Interest (RoI)/Proposal: The set of bounding boxes generated by a proposal mechanism such as the RPN, on which the detection network is applied in two-stage object detectors.
Input Bounding Box: A sampled anchor or RoI with which the detection network or RPN is trained.
Ground Truth: It is a tuple (B, u) such that B is the bounding box and u is the class label, where u ∈ C and C is the enumeration of the classes in the dataset.
Detection: It is a tuple (B̄, p) such that B̄ is the bounding box and p is the vector over the confidence scores for each class (we use class and category interchangeably in this paper) and the bounding box.
Intersection Over Union: For a ground truth box B and a detection box B̄, we can formally define Intersection over Union (IoU) [51], [52], denoted by IoU(B, B̄), as

$$\mathrm{IoU}(B, \bar{B}) = \frac{A(B \cap \bar{B})}{A(B \cup \bar{B})}, \qquad (1)$$

such that A(B) is the area of a bounding box B.
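Eq. (1) translates directly into code; the sketch below assumes the [x_1, y_1, x_2, y_2] corner convention defined above.

```python
def iou(b, b_bar):
    """Intersection over Union of two boxes given as [x1, y1, x2, y2] (Eq. 1)."""
    ix1, iy1 = max(b[0], b_bar[0]), max(b[1], b_bar[1])
    ix2, iy2 = min(b[2], b_bar[2]), min(b[3], b_bar[3])
    inter = max(ix2 - ix1, 0) * max(iy2 - iy1, 0)
    area = lambda box: (box[2] - box[0]) * (box[3] - box[1])
    union = area(b) + area(b_bar) - inter
    return inter / union if union > 0 else 0.0

print(iou([0, 0, 10, 10], [5, 5, 15, 15]))  # 25 / 175 ≈ 0.143
```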
Under-represented Class: The class which has fewer samples in a dataset or mini-batch during training in the context of class imbalance.
Over-represented Class: The class which has more samples in a dataset or mini-batch during training in the context of class imbalance.
Backbone Features: The set of features obtained during the
application of the backbone network.
Pyramidal Features/Feature Pyramid: The set of features
obtained by applying some transformations to the backbone
features.
Regression Objective Input: Some methods make predictions in the log domain by applying a transformation, which can also differ from method to method (compare the transformations used with the Smooth L1 loss in Fast R-CNN [17] and in KL Loss [53]), while other methods directly predict the bounding box coordinates [23]. For the sake of clarity, we use x̂ to denote the regression loss input for any method.
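As one concrete example of such a transformation, the widely used R-CNN-style parameterization encodes a target box relative to its matched anchor or proposal; the sketch below follows that standard formulation, but, as noted above, the exact transformation varies from method to method.

```python
import numpy as np

def encode_boxes(proposals, targets):
    """R-CNN-style regression targets x_hat = (tx, ty, tw, th) for [x1, y1, x2, y2] boxes.
    Center offsets are normalized by the proposal size; width/height use a log-scale ratio."""
    pw = proposals[:, 2] - proposals[:, 0]
    ph = proposals[:, 3] - proposals[:, 1]
    px = proposals[:, 0] + 0.5 * pw
    py = proposals[:, 1] + 0.5 * ph

    gw = targets[:, 2] - targets[:, 0]
    gh = targets[:, 3] - targets[:, 1]
    gx = targets[:, 0] + 0.5 * gw
    gy = targets[:, 1] + 0.5 * gh

    tx = (gx - px) / pw
    ty = (gy - py) / ph
    tw = np.log(gw / pw)
    th = np.log(gh / ph)
    return np.stack([tx, ty, tw, th], axis=1)
```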

Fig. 2: Problem based categorization of the methods used for imbalance problems. Note that a work may appear at multiple locations if it addresses multiple imbalance problems – e.g. Libra R-CNN [29].

Class Imbalance (§4)
• Fg-Bg Class Imbalance (§4.1)
  – 1. Hard Sampling Methods
    – A. Random Sampling
    – B. Hard Example Mining: Bootstrapping [55], SSD [19], Online Hard Example Mining [24], IoU-based Sampling [29]
    – C. Limit Search Space: Two-stage Object Detectors, IoU-lower Bound [17], Objectness Prior [56], Negative Anchor Filtering [57], Objectness Module [58]
  – 2. Soft Sampling Methods: Focal Loss [22], Gradient Harmonizing Mechanism [59], Prime Sample Attention [30]
  – 3. Sampling-Free Methods: Residual Objectness [60], No Sampling Heuristics [54], AP Loss [61], DR Loss [62]
  – 4. Generative Methods: Adversarial Faster-RCNN [63], Task Aware Data Synthesis [64], PSIS [65], pRoI Generator [66]
• Fg-Fg Class Imbalance (§4.2): see generative methods for fg-bg class imbalance; Fine-tuning Long Tail Distribution for Obj. Det. [25]; OFB Sampling [66]

Scale Imbalance (§5)
• Object/box-level Imbalance (§5.1)
  – 1. Methods Predicting from the Feature Hierarchy of Backbone Features: Scale-dependent Pooling [69], SSD [19], Multi Scale CNN [70], Scale Aware Fast R-CNN [71]
  – 2. Methods Based on Feature Pyramids: FPN [26]; see also feature-level imbalance methods
  – 3. Methods Based on Image Pyramids: SNIP [27], SNIPER [28]
  – 4. Methods Combining Image and Feature Pyramids: Efficient Featurized Image Pyramids [72], Enriched Feature Guided Refinement Network [58], Super Resolution for Small Objects [73], Scale Aware Trident Network [74]
• Feature-level Imbalance (§5.2)
  – 1. Methods Using Pyramidal Features as a Basis: PANet [75], Libra FPN [29]
  – 2. Methods Using Backbone Features as a Basis: STDN [76], Parallel-FPN [77], Deep Feature Pyramid Reconf. [78], Zoom Out-and-In [79], Multi-level FPN [80], NAS-FPN [81], Auto-FPN [82]

Spatial Imbalance (§6)
• Imbalance in Regression Task (§6.1)
  – 1. Lp norm based: Smooth L1 [17], Balanced L1 [29], KL Loss [53], Gradient Harmonizing Mechanism [59]
  – 2. IoU based: IoU Loss [83], Bounded IoU Loss [84], GIoU Loss [85], Distance IoU Loss [86], Complete IoU Loss [86]
• IoU Distribution Imbalance (§6.2): Cascade R-CNN [87], HSD [88], IoU-uniform R-CNN [89], pRoI Generator [66]
• Object Location Imbalance (§6.3): Guided Anchoring [67], Free Anchor [68]

Objective Imbalance (§7): Task Weighting; Classification Aware Regression Loss [30]; Guided Loss [54]
3 A TAXONOMY OF THE IMBALANCE PROBLEMS
AND THEIR SOLUTIONS IN OBJECT DETECTION
In Section 1, we defined the problem of imbalance as
the occurrence of a distributional bias regarding an input
property in the object detection training pipeline. Several
different types of such imbalance can be observed at various
stages of the common object detection pipeline (Figure 1). To
study these problems in a systematic manner, we propose a
taxonomy based on the related input property.
We identify eight different imbalance problems, which
we group into four main categories: class imbalance, scale
imbalance, spatial imbalance and objective imbalance. Table
1 presents the complete taxonomy along with a brief defi-
nition for each problem. In Figure 2, we present the same
taxonomy along with a list of proposed solutions for each
problem. Finally, in Figure 1, we illustrate a generic object
detection pipeline where each phase is annotated with their
typically observed imbalance problems. In the following,
we elaborate on the brief definitions provided earlier, and
illustrate the typical phases where each imbalance problem
occurs.
Class imbalance (Section 4; blue branch in Figure 2)