[Fig. 4 bar plots: (a) "Number of Anchors for Bg and Fg" — background ≈ 166827.50 vs. foreground ≈ 163.04 anchors; (b) "Number of Anchors for Fg Classes" — per-class anchor counts over the 80 foreground classes.]
Fig. 4: Illustration of the class imbalance problems. The numbers of RetinaNet [22] anchors on MS-COCO [90] are plotted for the foreground-background classes (a) and the foreground classes (b). The values are normalized by the total number of images in the dataset. The figures depict severe imbalance towards some classes.
Solutions. We can group the solutions for foreground-background class imbalance into four categories: (i) hard sampling methods, (ii) soft sampling methods, (iii) sampling-free methods and (iv) generative methods. Each set of methods is explained in detail in the subsections below.
In sampling methods, the contribution ($w_i$) of a bounding box ($BB_i$) to the loss function is adjusted as follows:

$$w_i \, \mathrm{CE}(p_s), \quad (2)$$

where $\mathrm{CE}(\cdot)$ is the cross-entropy loss. Hard and soft sampling approaches differ in the possible values of $w_i$. For hard sampling approaches, $w_i \in \{0, 1\}$, thus a BB is either selected or discarded. For soft sampling approaches, $w_i \in [0, 1]$, i.e. the contribution of a sample is adjusted with a weight and each BB is somehow included in training.
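To make the distinction concrete, the following is a minimal sketch of how Eq. (2) could be applied with hard versus soft weights. The focal-loss-style soft weight and the `selected_idx` set are illustrative assumptions of ours, not the specific choices of any reviewed method:

```python
import numpy as np

def cross_entropy(p_s):
    """CE(p_s) = -log(p_s), where p_s is the predicted
    probability of the ground-truth class."""
    return -np.log(np.clip(p_s, 1e-12, 1.0))

def hard_weights(num_boxes, selected_idx):
    """Hard sampling: w_i is binary -- 1 for selected boxes,
    0 for the rest (they are ignored this iteration)."""
    w = np.zeros(num_boxes)
    w[selected_idx] = 1.0
    return w

def soft_weights(p_s, gamma=2.0):
    """Soft sampling: every box keeps a weight in [0, 1].
    A focal-loss-style weight (1 - p_s)^gamma is used here
    purely as an illustrative choice of soft weighting."""
    return (1.0 - p_s) ** gamma

# Toy example: 5 boxes with predicted ground-truth probabilities.
p_s = np.array([0.9, 0.6, 0.3, 0.8, 0.1])

loss_hard = hard_weights(len(p_s), selected_idx=[1, 2, 4]) * cross_entropy(p_s)
loss_soft = soft_weights(p_s) * cross_entropy(p_s)
print(loss_hard.sum(), loss_soft.sum())
```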
4.1.1 Hard Sampling Methods
Hard sampling is a commonly-used method for addressing imbalance in object detection. It restricts $w_i$ to be binary, i.e., 0 or 1. In other words, it addresses imbalance by selecting a subset of positive and negative examples (with desired quantities) from a given set of labeled BBs. This selection is performed using heuristic methods, and the non-selected examples are ignored for the current iteration. Therefore, each sampled example contributes equally to the loss (i.e. $w_i = 1$) and the non-selected examples ($w_i = 0$) have no contribution to the training for the current iteration. See Table 3 for a summary of the main approaches.
A straightforward hard-sampling method is random sampling. Despite its simplicity, it is employed in the R-CNN family of detectors [16], [21]: for training the RPN, 128 positive anchors are sampled uniformly at random (out of all positive examples) and 128 negative anchors are sampled in a similar fashion; for training the detection network [17], 16 positive and 48 negative RoIs are sampled uniformly at random from each image in the batch, each from within their respective sets. In either case, if the number of positive input bounding boxes is less than the desired value, the mini-batch is padded with randomly sampled negatives. On the other hand, it has been reported that other sampling strategies may perform better when a property of an input box, such as its loss value or IoU, is taken into account [24], [29], [30].
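As a concrete illustration, here is a minimal sketch (ours, not code from [16], [21]) of random RoI sampling with negative padding for a per-image batch of 16 positives and 48 negatives:

```python
import random

def random_sample_rois(pos_rois, neg_rois, num_pos=16, num_neg=48):
    """Randomly sample positive/negative RoIs for one image.
    If there are too few positives, pad the mini-batch with
    extra randomly sampled negatives so its size stays fixed."""
    pos = random.sample(pos_rois, min(num_pos, len(pos_rois)))
    shortfall = num_pos - len(pos)  # positives we could not fill
    neg = random.sample(neg_rois, min(num_neg + shortfall, len(neg_rois)))
    return pos, neg
```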
The first set of approaches to consider a property of the sampled examples, rather than sampling randomly, is the hard-example mining methods⁴. These methods rely on the hypothesis that training a detector more on hard examples (i.e. examples with high losses) leads to better performance. The origins of this hypothesis go back to the bootstrapping idea in early works on face detection [55], [94], [95], human detection [96] and object detection [13]: an initial model is trained on a subset of negative examples, and then a new classifier is trained using the negative examples on which the first classifier fails (i.e. hard examples). Multiple classifiers are obtained by applying the same procedure iteratively. Current deep-learning-based methods also adopt versions of hard-example mining in order to provide more useful examples based on the loss values of the examples. The first deep object detector to use hard examples in training was the Single-Shot Detector [19], which chooses only the negative examples incurring the highest loss values. A more systematic approach, considering the loss values of both positive and negative samples, was proposed in Online Hard Example Mining (OHEM) [24]. However, OHEM needs additional memory and slows down training. Considering the efficiency and memory problems of OHEM, IoU-based sampling [29] was proposed to associate the hardness of the examples with their IoUs and, again, to use a sampling method only for the negative examples rather than computing the loss function over the entire set. In IoU-based sampling, the IoU interval for the negative samples is divided into K bins and an equal number of negative examples is sampled randomly within each bin to promote the samples with higher IoUs, which are expected to have higher loss values.
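A minimal sketch of the K-bin idea follows; it is our interpretation of [29], and the bin boundaries and the `(roi, iou)` input format are assumptions for illustration:

```python
import random

def iou_balanced_sampling(neg_rois, num_samples, K=3, max_iou=0.5):
    """IoU-balanced sampling of negatives: split the negative IoU
    range [0, max_iou) into K equal bins and sample the same number
    of negatives from each bin, which favors high-IoU (likely hard)
    negatives relative to pure random sampling.
    `neg_rois` is a list of (roi, iou) pairs."""
    per_bin = num_samples // K
    sampled = []
    for k in range(K):
        lo = k * max_iou / K
        hi = (k + 1) * max_iou / K
        bin_rois = [r for r, iou in neg_rois if lo <= iou < hi]
        # If a bin has too few candidates, take all of them.
        sampled += random.sample(bin_rois, min(per_bin, len(bin_rois)))
    return sampled
```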
To improve mining performance, several studies proposed to limit the search space in order to make hard examples easier to mine. Two-stage object detectors [18], [21] are among these methods since they aim to find the most probable bounding boxes (i.e. RoIs) given the anchors, and then choose the top N RoIs with the highest objectness scores, to which an additional sampling method is applied. Fast R-CNN [17] sets the lower bound of the IoU of the negative RoIs
4. In this paper, we adopt the boldface font whenever we introduce
an approach involving a set of different methods, and the method
names themselves are in italic.