Figure 1. The proposed architecture includes a deep feature extractor (green) and a deep label predictor (blue), which together form a standard feed-forward architecture. Unsupervised domain adaptation is achieved by adding a domain classifier (red) connected to the feature extractor via a gradient reversal layer that multiplies the gradient by a certain negative constant during backpropagation-based training. Otherwise, the training proceeds in a standard way and minimizes the label prediction loss (for source examples) and the domain classification loss (for all samples). Gradient reversal ensures that the feature distributions over the two domains are made similar (as indistinguishable as possible for the domain classifier), thus resulting in domain-invariant features.
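The gradient reversal layer itself is simple to implement. The following is a minimal sketch in PyTorch (the framework, the class name GradReverse, and the helper grad_reverse are assumptions made here for illustration, not the original implementation): the layer acts as the identity in the forward pass and multiplies the incoming gradient by a negative constant -lambd in the backward pass.

```python
import torch

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; scales the gradient by -lambd in the backward pass."""

    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd          # remember the scaling constant for the backward pass
        return x.view_as(x)        # pass the features through unchanged

    @staticmethod
    def backward(ctx, grad_output):
        # Reverse (and scale) the gradient flowing back into the feature extractor.
        # The second return value is the gradient w.r.t. lambd, which is not needed.
        return -ctx.lambd * grad_output, None

def grad_reverse(x, lambd=1.0):
    """Convenience wrapper: place it between the features and the domain classifier."""
    return GradReverse.apply(x, lambd)
```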
forward models can handle. We further assume that there
exist two distributions S(x, y) and T(x, y) on X ⊗ Y, which will be referred to as the source distribution and the target distribution (or the source domain and the target domain). Both distributions are assumed complex and unknown, and furthermore similar but different (in other words, S is “shifted” from T by some domain shift).
Our ultimate goal is to be able to predict labels y given the input x for the target distribution. At training time,
we have access to a large set of training samples {x_1, x_2, ..., x_N} from both the source and the target domains, distributed according to the marginal distributions S(x) and T(x). We denote with d_i the binary variable (domain label) for the i-th example, which indicates whether x_i comes from the source distribution (x_i ∼ S(x) if d_i = 0) or from the target distribution (x_i ∼ T(x) if d_i = 1). For the examples from the source distribution (d_i = 0), the corresponding labels y_i ∈ Y are known at training time. For the examples from the target domain, we do not know the labels at training time, and we want to predict such labels at test time.
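As a concrete illustration of this setup, the following toy sketch (PyTorch, with random tensors; the batch size, input shape, and number of classes are arbitrary assumptions) builds one mixed training batch in which the source half carries class labels and domain label 0, while the target half carries only domain label 1.

```python
import torch

# Toy mixed batch (random data; the 1x28x28 inputs and 10 classes are assumptions).
x_s = torch.randn(32, 1, 28, 28)                   # source inputs x_i (d_i = 0)
y_s = torch.randint(0, 10, (32,))                  # their labels y_i, known at training time
x_t = torch.randn(32, 1, 28, 28)                   # target inputs x_i (d_i = 1), labels unknown

x = torch.cat([x_s, x_t], dim=0)                   # samples drawn from S(x) and T(x)
d = torch.cat([torch.zeros(32, dtype=torch.long),  # domain labels d_i
               torch.ones(32, dtype=torch.long)])
```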
We now define a deep feed-forward architecture that for
each input x predicts its label y ∈ Y and its domain label
d ∈ {0, 1}. We decompose such a mapping into three parts. We assume that the input x is first mapped by a mapping G_f (a feature extractor) to a D-dimensional feature vector f ∈ R^D. The feature mapping may also include several feed-forward layers, and we denote the vector of parameters of all layers in this mapping as θ_f, i.e. f = G_f(x; θ_f). Then, the feature vector f is mapped by a mapping G_y (label predictor) to the label y, and we denote the parameters of this mapping with θ_y. Finally, the same feature vector f is mapped to the domain label d by a mapping G_d (domain classifier) with the parameters θ_d (Figure 1).
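A compact sketch of this three-part decomposition is given below, again in PyTorch and reusing the grad_reverse helper from the earlier sketch; the class name and layer sizes are placeholder assumptions, not the architectures used in the experiments.

```python
import torch.nn as nn

class DomainAdaptationNet(nn.Module):
    """Feature extractor G_f, label predictor G_y, and domain classifier G_d."""

    def __init__(self, feat_dim=256, n_classes=10):
        super().__init__()
        self.G_f = nn.Sequential(nn.Flatten(),
                                 nn.Linear(28 * 28, feat_dim),
                                 nn.ReLU())                    # parameters theta_f
        self.G_y = nn.Linear(feat_dim, n_classes)              # parameters theta_y
        self.G_d = nn.Linear(feat_dim, 2)                      # parameters theta_d

    def forward(self, x, lambd=1.0):
        f = self.G_f(x)                               # f = G_f(x; theta_f)
        y_logits = self.G_y(f)                        # label prediction
        d_logits = self.G_d(grad_reverse(f, lambd))   # domain prediction via gradient reversal
        return y_logits, d_logits
```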
During the learning stage, we aim to minimize the label prediction loss on the annotated part (i.e. the source part) of the training set, and the parameters of both the feature extractor and the label predictor are thus optimized in order to minimize the empirical loss for the source domain samples. This ensures the discriminativeness of the features f and the overall good prediction performance of the combination of the feature extractor and the label predictor on the source domain.
At the same time, we want to make the features f domain-invariant. That is, we want to make the distributions S(f) = {G_f(x; θ_f) | x ∼ S(x)} and T(f) = {G_f(x; θ_f) | x ∼ T(x)} similar. Under the covariate shift assumption, this would make the label prediction accuracy on the target domain the same as on the source domain (Shimodaira, 2000). Measuring the dissimilarity of the distributions S(f) and T(f) is, however, non-trivial, given that f is high-dimensional and that the distributions themselves are constantly changing as learning progresses. One way to estimate the dissimilarity is to look at the loss of the domain classifier G_d, provided that the parameters θ_d of the domain classifier have been trained to discriminate between the two feature distributions in an optimal way.
This observation leads to our idea. At training time, in order to obtain domain-invariant features, we seek the parameters θ_f of the feature mapping that maximize the loss of the domain classifier (by making the two feature distributions as similar as possible), while simultaneously seeking the parameters θ_d of the domain classifier that minimize the loss of the domain classifier. In addition, we seek to minimize the loss of the label predictor.
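One compact way to write this, denoting by λ the positive constant used by the gradient reversal layer (a trade-off hyper-parameter in the notation below) and by L_y, L_d the label prediction and domain classification losses, is to seek a saddle point of

E(\theta_f, \theta_y, \theta_d) =
  \sum_{i:\, d_i = 0} L_y\!\left( G_y(G_f(x_i; \theta_f); \theta_y),\, y_i \right)
  \;-\; \lambda \sum_{i=1}^{N} L_d\!\left( G_d(G_f(x_i; \theta_f); \theta_d),\, d_i \right),

(\hat\theta_f, \hat\theta_y) = \arg\min_{\theta_f,\, \theta_y} E(\theta_f, \theta_y, \hat\theta_d),
\qquad
\hat\theta_d = \arg\max_{\theta_d} E(\hat\theta_f, \hat\theta_y, \theta_d).

Since the gradient reversal layer already negates (and scales by λ) the domain-loss gradient with respect to θ_f, this saddle point can be approached by ordinary stochastic gradient descent on the plain sum of the two losses. A minimal training step built on the sketches above (the function name and the model/optimizer arguments are illustrative assumptions) could look as follows.

```python
import torch
import torch.nn.functional as F

def train_step(model, opt, x_s, y_s, x_t, lambd):
    """One SGD step on a mixed batch: labelled source (x_s, y_s), unlabelled target x_t."""
    y_logits_s, d_logits_s = model(x_s, lambd)
    _, d_logits_t = model(x_t, lambd)

    # Label prediction loss: source samples only (the examples with d_i = 0).
    loss_y = F.cross_entropy(y_logits_s, y_s)

    # Domain classification loss: all samples, with domain labels 0 (source) / 1 (target).
    d_s = torch.zeros(x_s.size(0), dtype=torch.long)
    d_t = torch.ones(x_t.size(0), dtype=torch.long)
    loss_d = F.cross_entropy(d_logits_s, d_s) + F.cross_entropy(d_logits_t, d_t)

    # The reversal layer flips the domain-loss gradient w.r.t. theta_f, so plainly
    # minimising this sum realises the min-max described above.
    loss = loss_y + loss_d
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```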