main (S) and the test data space as the target domain (T).
We assume that an annotated training image dataset in S
is supplied, but that only images in T are given (i.e. there
are no labels in T ). Our framework, visualized in Fig. 1,
consists of three main phases:
1. Object proposal mining: A standard Faster R-CNN,
trained on the source domain, is used to detect objects
in the target domain. The detected objects form a pro-
posal set in T .
2. Image classification training: Given the images ex-
tracted from bounding boxes in S, we train an image
classification model that predicts the class of objects
in each image. The resulting classifier is used to score
the proposed bounding boxes in T . This model aids in
training the robust object detection model in the next
phase. The reason for introducing image classification
is that i) this model may rely on representations differ-
ent than those used by the phase one detection model
(e.g., motion features) or it may use a more sophisti-
cated network architectures, and ii) this model can be
trained in a semi-supervised fashion using labeled im-
ages in S and unlabeled images in T .
3. Robust object detection training: In this phase, a robust object detection model is trained using object bounding boxes in S and object proposals in T (from phase one) that have been rescored using the image classification model (from phase two); a code sketch of the full pipeline follows this list.
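The following Python sketch traces the data flow through the three phases. The helpers `train_faster_rcnn`, `train_classifier`, `train_robust_detector`, and `crop`, as well as the attribute names `y_c`/`y_l`, are hypothetical placeholders for the components described above, not our actual implementation.

```python
# Sketch of the three-phase framework; all helper functions are
# hypothetical placeholders for the components described in the text.

def adapt_detector(source_images, source_boxes, target_images):
    # Phase 1: object proposal mining. Train a standard Faster R-CNN on
    # the labeled source domain and detect objects in the target domain;
    # the detections form the proposal set in T.
    detector = train_faster_rcnn(source_images, source_boxes)
    proposals = [detector.detect(x) for x in target_images]

    # Phase 2: image classification training. Train a classifier on crops
    # taken from source bounding boxes (optionally semi-supervised with
    # unlabeled target crops), then rescore every mined target proposal.
    src_crops = [crop(x, y.y_l) for x, y in zip(source_images, source_boxes)]
    src_labels = [y.y_c for y in source_boxes]
    classifier = train_classifier(src_crops, src_labels)
    scores = [[classifier.predict(crop(x, p.y_l)) for p in props]
              for x, props in zip(target_images, proposals)]

    # Phase 3: robust object detection training. Retrain a detector on
    # clean source labels plus the rescored, noisy target proposals.
    return train_robust_detector(source_images, source_boxes,
                                 target_images, proposals, scores)
```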
We organize the detailed method description as follows.
Firstly, we introduce background notation and provide a de-
scription of Faster R-CNN in Sec. 3.1 to define the model
used in phase one. Secondly, a probabilistic view of Faster
R-CNN in Sec. 3.2 provides a foundation for the robust ob-
ject detection framework presented in Sec. 3.3. This defines
the model used in phase three. Lastly, the image classifica-
tion model used in phase two is discussed in Sec. 3.4.
Notation: We are given training images in $S$ along with their object bounding box labels. This training set is denoted by $\mathcal{D}_S = \{(\boldsymbol{x}^{(s)}, \boldsymbol{y}^{(s)})\}$, where $\boldsymbol{x}^{(s)} \in S$ represents an image, $\boldsymbol{y}^{(s)}$ is the corresponding bounding box label for $\boldsymbol{x}^{(s)}$, and $s$ is an index. Each bounding box $\boldsymbol{y} = (y_c, \boldsymbol{y}_l)$ represents a class label by an integer, $y_c \in \mathcal{Y} = \{1, 2, \dots, C\}$, where $C$ is the number of foreground classes, and a 4-tuple, $\boldsymbol{y}_l \in \mathbb{R}^4$, giving the coordinates of the top left corner, height, and width of the box. To simplify notation, we associate each image with a single bounding box.²
In the target domain, images are given without bounding box annotations. At the end of phase one, we augment this dataset with proposed bounding boxes generated by Faster R-CNN. We denote the resulting set by $\mathcal{D}_T = \{(\boldsymbol{x}^{(t)}, \tilde{\boldsymbol{y}}^{(t)})\}$, where $\boldsymbol{x}^{(t)} \in T$ is an image, $\tilde{\boldsymbol{y}}^{(t)}$ is the corresponding proposed bounding box, and $t$ is an index. Finally, for each instance in $\mathcal{D}_T$, we obtain the image classification score produced at the end of phase two, $p_{\text{img}}(y_c \,|\, \boldsymbol{x}, \tilde{\boldsymbol{y}}_l)$, which represents the probability of assigning the image cropped to the bounding box $\tilde{\boldsymbol{y}}_l$ in $\boldsymbol{x}$ to the class $y_c \in \mathcal{Y} \cup \{0\}$, i.e., one of the foreground categories or background.

²This restriction is for notational convenience only. Our implementation makes no assumptions about the number of objects in each image.
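To make the notation concrete, one possible in-memory representation of elements of $\mathcal{D}_S$ and $\mathcal{D}_T$ is sketched below; the class and field names are our own illustration, not part of the method.

```python
from dataclasses import dataclass
from typing import List, Tuple

import numpy as np

@dataclass
class BoundingBox:
    """A label y = (y_c, y_l): a class index plus a location 4-tuple."""
    y_c: int                                # class label in {1, ..., C}
    y_l: Tuple[float, float, float, float]  # (top, left, height, width)

@dataclass
class SourceExample:
    """An element of D_S: a source image with its ground-truth box."""
    image: np.ndarray
    box: BoundingBox

@dataclass
class TargetExample:
    """An element of D_T after phases one and two: a target image, a
    proposed box mined by Faster R-CNN, and the phase-two scores
    p_img(y_c | x, y_l) over Y ∪ {0} (index 0 is background)."""
    image: np.ndarray
    proposed_box: BoundingBox
    p_img: List[float]  # length C + 1, sums to 1
```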
3.1. Faster R-CNN
Faster R-CNN [45] is a two-stage detector consisting of two main components: a region proposal network (RPN) that proposes regions of interest (ROIs) for object detection, and an ROI classifier that predicts object labels for the proposed bounding boxes. These two components share the first convolutional layers. Given an input image, the shared layers extract a feature map for the image. In the first stage, the RPN predicts the probability that each of a set of predefined anchor boxes is an object or background, along with refinements to their sizes and locations. The anchor boxes are a fixed predefined set of boxes with varying positions, sizes, and aspect ratios across the image. Similar to the RPN, the region classifier predicts object labels for the ROIs proposed by the RPN, as well as refinements to the location and size of the boxes. Features passed to the classifier are obtained with an ROI-pooling layer. Both networks are trained jointly by minimizing a loss function:
$$L = L_{RPN} + L_{ROI}. \qquad (1)$$
$L_{RPN}$ and $L_{ROI}$ represent the losses used for the RPN and the ROI classifier, respectively. Each consists of a cross-entropy cost measuring the misclassification error and a regression loss quantifying the localization error. The RPN is trained to detect and localize objects without regard to their classes, and the ROI classification network is trained to classify the object labels.
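A minimal sketch of Eq. 1 in PyTorch, assuming smooth-L1 for the regression terms (the actual Faster R-CNN loss also samples anchors/ROIs and normalizes each term):

```python
import torch.nn.functional as F

def faster_rcnn_loss(rpn_obj_logits, rpn_obj_labels,
                     rpn_box_pred, rpn_box_target,
                     roi_cls_logits, roi_cls_labels,
                     roi_box_pred, roi_box_target):
    """Simplified form of Eq. 1: L = L_RPN + L_ROI, where each term is a
    cross-entropy classification cost plus a box-regression cost."""
    # L_RPN: binary object-vs-background classification over anchors,
    # plus regression of the anchor refinements.
    l_rpn = (F.cross_entropy(rpn_obj_logits, rpn_obj_labels)
             + F.smooth_l1_loss(rpn_box_pred, rpn_box_target))
    # L_ROI: (C + 1)-way classification of the proposed ROIs, plus
    # regression of their location/size refinements.
    l_roi = (F.cross_entropy(roi_cls_logits, roi_cls_labels)
             + F.smooth_l1_loss(roi_box_pred, roi_box_target))
    return l_rpn + l_roi
```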
3.2. A Probabilistic View of Faster R-CNN
In this section, we provide a probabilistic view of Faster
R-CNN that will be used to define a robust loss function for
noisy detection labels. The ROI classifier in Faster R-CNN
generates an object classification score and object location
for each proposed bounding box generated by the RPN. A classification prediction $p_{\text{cls}}(y_c \,|\, \boldsymbol{x}, \tilde{\boldsymbol{y}}_l)$ represents the probability of a categorical random variable taking one of the $C + 1$ disjoint classes (i.e., the foreground classes plus background). This classification distribution is modeled using a softmax activation. Similarly, we model the location prediction $p_{\text{loc}}(\boldsymbol{y}_l \,|\, \boldsymbol{x}, \tilde{\boldsymbol{y}}_l) = \mathcal{N}(\boldsymbol{y}_l; \bar{\boldsymbol{y}}_l, \sigma \boldsymbol{I})$ with a multivariate Normal distribution³ with mean $\bar{\boldsymbol{y}}_l$ and constant diagonal covariance matrix $\sigma \boldsymbol{I}$. In practice, only $\bar{\boldsymbol{y}}_l$ is generated by the ROI classifier and is used to localize the object.
³This assumption follows naturally if the $L_2$-norm is used for the localization error in Eq. 1. In practice, however, a combination of $L_2$ and $L_1$ norms is used, which does not correspond to a simple probabilistic output.
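The two distributions can be written directly in code; a minimal sketch, assuming $\sigma$ is the constant per-coordinate variance (so the per-coordinate standard deviation is $\sqrt{\sigma}$):

```python
import torch
import torch.nn.functional as F

def roi_distributions(cls_logits, loc_mean, y_l, sigma=1.0):
    """Probabilistic view of the ROI classifier outputs for one proposal.

    cls_logits: (C + 1,) raw class scores; softmax gives p_cls.
    loc_mean:   (4,) predicted box, the mean of the Normal.
    y_l:        (4,) candidate location at which to evaluate p_loc.
    sigma:      constant per-coordinate variance (covariance sigma * I).
    """
    # p_cls(y_c | x, y~_l): categorical distribution over C + 1 classes.
    p_cls = F.softmax(cls_logits, dim=-1)
    # p_loc(y_l | x, y~_l) = N(y_l; y_bar_l, sigma * I); with a diagonal
    # covariance the density factorizes, so the log-density is the sum of
    # per-coordinate Normal log-probs with std sqrt(sigma).
    normal = torch.distributions.Normal(loc_mean, sigma ** 0.5)
    log_p_loc = normal.log_prob(y_l).sum()
    return p_cls, log_p_loc
```

Faster R-CNN itself only produces $\bar{\boldsymbol{y}}_l$ and never evaluates this density; the point of the probabilistic view is to provide the foundation for the robust loss of Sec. 3.3.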