
Generalization Ultimately, we are not interested in the error of the trained
classifier on the given data set, but in the error on all possible future samples:
e(h) = E_{X,Y}[h(x) ≠ y]. The difference between the true error and the empirical
error is known as the generalization error: e(h) − ˆe(h) [11,151]. Ideally, we would
like to know whether the generalization error will be small, i.e., that our classifier will
be approximately correct. However, because classifiers are functions of data sets,
and data sets are random, we can only describe how probable it is that our
classifier will be approximately correct. We can say that, with probability at least 1 − δ,
where δ > 0, the following inequality holds (Theorem 2.2 from [151]):
e(h) - \hat{e}(h) \;\leq\; \sqrt{\frac{1}{2n}\left(\log |H| + \log\frac{2}{\delta}\right)} , \qquad (2)
where |H| denotes the cardinality of the finite hypothesis space, i.e., the number
of classification functions that are being considered [193,119,151]. This result
is known as a Probably Approximately Correct (PAC) bound. In words, the
difference between the true error e(h) and the empirical error ˆe(h) of a classifier
is at most the square root of the sum of the logarithm of the size of the hypothesis space
|H| and the logarithm of 2/δ, divided by twice the sample size n. In order to
obtain a similar result for the case of an infinite hypothesis space (e.g. linear
classifiers), a measure of the complexity of the hypothesis space is required.
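As a brief numerical sketch (not part of the original text), the right-hand side of Equation 2 can be evaluated directly for concrete values; the function name and the example numbers below are illustrative assumptions:

```python
import numpy as np

def pac_bound(hypothesis_space_size, n, delta):
    # Right-hand side of Equation 2: sqrt((log|H| + log(2/delta)) / (2n)).
    return np.sqrt((np.log(hypothesis_space_size) + np.log(2.0 / delta)) / (2 * n))

# 1000 candidate classifiers, 95% confidence (delta = 0.05):
print(pac_bound(1000, n=500, delta=0.05))   # roughly 0.10
print(pac_bound(1000, n=5000, delta=0.05))  # roughly 0.03
```

Increasing the sample size tenfold tightens the bound by roughly a factor of √10, reflecting the 1/√(2n) dependence in Equation 2.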
Generalization error bounds are interesting because they analyze what a clas-
sifier’s performance depends on. In this case, the bound suggests choosing a smaller or
simpler hypothesis space when the sample size is small. Many variants of such bounds
exist. Some use different measures of complexity, such as Rademacher complexity
[14] or the Vapnik-Chervonenkis dimension [20,197], while others use concepts
from Bayesian inference [147,131,15].
Bounds can incorporate assumptions on the problem setting [12,151,54]. For
example, one can assume that the posterior distributions in each domain are
equal and obtain a bound for a classifier that exploits that assumption (cf.
Equation 6). Assumptions restrict the problem setting, i.e., settings where the
assumption is invalid are disregarded. This often means that the bound is tighter
and gives a more accurate description of the behaviour of the classifier.
Such results have inspired new algorithms in the past, such as AdaBoost and the
Support Vector Machine [69,47].
Regularization Generalization error bounds tell us that the complexity, or
flexibility, of a classifier has to be traded off against the number of available train-
ing samples [61,197,54]. In particular, a flexible model can minimize the error
on a given data set completely, but will be too specific to generalize to new
samples. This is known as overfitting. Figure 2 (left) illustrates an example of a
2-dimensional classification problem with a classifier that has perfectly fitted
the training set. As can be imagined, it will not perform as well for new samples.
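As a small illustrative sketch (the data and the 1-nearest-neighbour rule below are assumptions chosen for illustration, not the setup of Figure 2), a deliberately flexible classifier fitted to overlapping classes reproduces this effect:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)

# Two overlapping Gaussian classes in 2D: because the classes overlap, a rule
# that fits the training set perfectly cannot match the true decision boundary.
def sample(n):
    X = np.vstack([rng.normal(-1.0, 1.0, (n, 2)), rng.normal(+1.0, 1.0, (n, 2))])
    y = np.concatenate([np.zeros(n), np.ones(n)])
    return X, y

X_train, y_train = sample(50)
X_test, y_test = sample(5000)

# A 1-nearest-neighbour classifier is flexible enough to fit the training set exactly.
clf = KNeighborsClassifier(n_neighbors=1).fit(X_train, y_train)
print("training error:", 1 - clf.score(X_train, y_train))  # 0.0
print("test error:    ", 1 - clf.score(X_test, y_test))    # substantially larger
```

The training error is zero, but the test error remains well above it, which is exactly the gap between ˆe(h) and e(h) discussed above.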
In order to combat overfitting, an additional term is introduced in the empirical