multi-source transfer learning, which mainly has the following two steps in each iteration.
1. Candidate Classifier Construction: A group of candidate weak classifiers are respectively trained on the weighted instances in the pairs of each source domain and the target domain, i.e., $D_{S_i} \cup D_T$ ($i = 1, \cdots, m^S$).
2. Instance Weighting: A classifier which has the minimal classification error rate $\bar{\delta}$ on the target-domain instances is selected (denoted by $j$), and is then used for updating the weights of the instances in $D_{S_j}$ and $D_T$.
Finally, the selected classifiers from each iteration are combined to form the final classifier. Another parameter-based algorithm, i.e., TaskTrAdaBoost, is also proposed in that work [26], which is introduced in Section 5.3.
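To make the two-step iteration concrete, the following is a minimal sketch of one boosting round under this strategy. The helper choices (decision stumps as weak learners, the simplified re-weighting rule, and the factor beta_t) are illustrative assumptions rather than the exact formulation in [26].

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def multi_source_boosting_iteration(sources, target, w_sources, w_target, beta_t):
    """One illustrative iteration of multi-source instance-weighted boosting.

    sources   : list of (X_Si, y_Si) pairs, one per source domain
    target    : (X_T, y_T) labeled target-domain data
    w_sources : list of per-instance weight vectors, one per source domain
    w_target  : weight vector for the target-domain instances
    beta_t    : multiplicative factor for the weight update (assumed given)
    """
    X_T, y_T = target
    candidates, errors = [], []

    # Step 1. Candidate classifier construction: train a weak classifier on the
    # weighted union D_Si ∪ D_T for every source domain i.
    for (X_S, y_S), w_S in zip(sources, w_sources):
        X = np.vstack([X_S, X_T])
        y = np.concatenate([y_S, y_T])
        w = np.concatenate([w_S, w_target])
        clf = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w / w.sum())
        # Weighted error rate measured on the target-domain instances only.
        err = np.average(clf.predict(X_T) != y_T, weights=w_target / w_target.sum())
        candidates.append(clf)
        errors.append(err)

    # Step 2. Instance weighting: keep the candidate with the smallest target
    # error and update the weights of the instances in D_Sj and D_T.
    j = int(np.argmin(errors))
    clf_j = candidates[j]
    X_Sj, y_Sj = sources[j]
    # Down-weight misclassified source instances and up-weight hard target
    # instances (a simplified stand-in for the TrAdaBoost-style update rules).
    w_sources[j] *= np.where(clf_j.predict(X_Sj) != y_Sj, beta_t, 1.0)
    w_target *= np.where(clf_j.predict(X_T) != y_T, 1.0 / beta_t, 1.0)
    return clf_j, errors[j], j
```

In a full algorithm, this routine would be called for a fixed number of iterations and the selected classifiers combined into the final ensemble.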
Some approaches realize the instance weighting strategy in a heuristic way. For example, Jiang and Zhai proposed a general weighting framework for the adaptation of instances [27]. According to the paper, three types of instances (i.e., labeled source-domain, labeled target-domain, and unlabeled target-domain instances) are used to construct the target classifier. The objective function contains three terms, designed according to the instances' types, each minimizing a cross-entropy loss.
• Labeled Target-domain Instance: The classifier should minimize the cross-entropy loss on them, which is actually a standard supervised learning task.
• Unlabeled Target-domain Instance: These instances' true conditional distributions $P(y|x_i^{T,U})$ are unknown and should be estimated. A possible solution is to train an auxiliary classifier on the labeled source-domain and target-domain instances to help estimate the conditional distributions or assign pseudo labels to these instances.
• Labeled Source-domain Instance: The authors define the weight of $x_i^{S,L}$ as the product of two parts, i.e., $\alpha_i$ and $\beta_i$. The weight $\beta_i$ is ideally equal to $P_T(x_i)/P_S(x_i)$, which can be estimated by non-parametric methods such as KMM or can be set uniformly in the worst case. The weight $\alpha_i$ is used to filter out the source-domain instances that differ greatly from the target domain.
A heuristic method can be used to produce the value of $\alpha_i$, which consists of the following three steps (a code sketch is given after the list).
1. Auxiliary Classifier Construction: An auxiliary classifier trained on the labeled target-domain instances is used to classify the unlabeled source-domain instances.
2. Instance Ranking: The source-domain instances are
ranked based on the probabilistic prediction results.
3. Heuristic Weighting: The weights $\alpha_i$ of the top-k source-domain instances with wrong predictions are set to zero, and the weights of the others are set to one.
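The three steps above can be sketched roughly as follows; the choice of logistic regression as the auxiliary classifier and the way the ranking is resolved are assumptions made purely for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def heuristic_alpha(X_target_labeled, y_target_labeled, X_source, y_source, k):
    """Illustrative computation of the filtering weights alpha_i.

    1. Train an auxiliary classifier on the labeled target-domain instances.
    2. Rank the source-domain instances by the probability the auxiliary
       classifier assigns to their recorded labels.
    3. Set alpha_i = 0 for the top-k wrongly predicted source instances,
       and alpha_i = 1 for all the others.
    """
    aux = LogisticRegression(max_iter=1000).fit(X_target_labeled, y_target_labeled)

    pred = aux.predict(X_source)
    proba = aux.predict_proba(X_source)
    # Probability assigned to each source instance's own label.
    label_idx = np.searchsorted(aux.classes_, y_source)
    conf_in_label = proba[np.arange(len(y_source)), label_idx]

    alpha = np.ones(len(y_source))
    wrong = np.where(pred != y_source)[0]
    # Among the wrongly predicted instances, zero out the k whose recorded
    # labels receive the lowest probability, treating them as the most
    # "target-unlike" source instances.
    top_k_wrong = wrong[np.argsort(conf_in_label[wrong])[:k]]
    alpha[top_k_wrong] = 0.0
    return alpha
```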
The objective function of this framework consists of four terms, i.e., the above-mentioned three terms, with three tradeoff parameters controlling the balance among the types of instances, and a regularizer controlling the complexity of the model.
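Schematically, and with the notation simplified relative to [27], the objective can be written as

$$\min_{\theta}\;\; \lambda_{T,L}\sum_{i=1}^{n_{T,L}} \ell\!\left(y_i^{T,L}, f_\theta(x_i^{T,L})\right) + \lambda_{T,U}\sum_{i=1}^{n_{T,U}} \sum_{y} \tilde{P}\!\left(y \mid x_i^{T,U}\right) \ell\!\left(y, f_\theta(x_i^{T,U})\right) + \lambda_{S}\sum_{i=1}^{n_{S,L}} \alpha_i \beta_i\, \ell\!\left(y_i^{S,L}, f_\theta(x_i^{S,L})\right) + R(\theta),$$

where $\ell$ denotes the cross-entropy loss, $\tilde{P}(y \mid x_i^{T,U})$ is the estimated conditional distribution (or pseudo-label distribution) for the unlabeled target-domain instances, the $\lambda$'s are the three tradeoff parameters, and $R(\theta)$ is the regularizer.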
4.2 Feature Transformation Strategy
The feature transformation strategy is often adopted in feature-based approaches, which transform each original feature into a new feature representation for transfer learning. The objectives of constructing a new feature representation include minimizing the marginal and the conditional distribution difference, preserving the properties or the potential structures of the data, and finding the correspondence between features. The operations of feature transformation can be divided into three types, i.e., feature augmentation, feature reduction, and feature alignment. Besides, feature reduction can be further divided into several types such as feature mapping, feature clustering, feature selection, and feature encoding. A complete feature transformation process designed in an algorithm may consist of several operations.

TABLE 2
Metrics Adopted in Transfer Learning.

Measurement                            | Related Algorithms
Maximum Mean Discrepancy               | [28], [29], [30], [31], [32], ...
Kullback-Leibler Divergence            | [33], [34], [35], [36], [37], ...
Jensen-Shannon Divergence              | [38], [39], [40], [41], [42], ...
Bregman Divergence                     | [43], [44], [45], [46], [47], ...
Hilbert-Schmidt Independence Criterion | [48], [29], [49], [50], [51], ...
4.2.1 Distribution Difference Metric
One primary objective of feature transformation is to reduce the distribution difference of the source and the target domain instances. Therefore, how to measure the distribution difference or the similarity between domains effectively is an important issue.
The measurement termed Maximum Mean Discrepancy
(MMD) is widely used in the field of transfer learning,
which is formulated as follows [28]:
$$\mathrm{MMD}(X_S, X_T) = \left\| \frac{1}{n_S}\sum_{i=1}^{n_S}\Phi\big(x_i^S\big) - \frac{1}{n_T}\sum_{j=1}^{n_T}\Phi\big(x_j^T\big) \right\|_{\mathcal{H}}^2 .$$
MMD can be easily computed by using the kernel trick. Briefly, MMD quantifies the distribution difference by calculating the distance between the mean values of the instances in a reproducing kernel Hilbert space (RKHS). Note that the above-mentioned KMM actually produces the weights of instances by minimizing the MMD distance between domains.
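As an illustration, a biased empirical estimate of the (squared) MMD can be obtained from kernel evaluations alone, since the squared RKHS norm expands into expectations of kernel values. The RBF kernel and its bandwidth below are assumptions; scikit-learn is used only for the kernel computation.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

def mmd2_rbf(X_s, X_t, gamma=1.0):
    """Biased empirical estimate of the squared MMD between X_s and X_t.

    By the kernel trick, ||mean_phi(X_s) - mean_phi(X_t)||^2_H expands to
    E[k(s, s')] - 2 E[k(s, t)] + E[k(t, t')], so only kernel values are needed.
    """
    K_ss = rbf_kernel(X_s, X_s, gamma=gamma)
    K_tt = rbf_kernel(X_t, X_t, gamma=gamma)
    K_st = rbf_kernel(X_s, X_t, gamma=gamma)
    return K_ss.mean() - 2.0 * K_st.mean() + K_tt.mean()

# Example: two shifted Gaussians yield a clearly positive MMD estimate.
rng = np.random.default_rng(0)
X_source = rng.normal(0.0, 1.0, size=(200, 5))
X_target = rng.normal(1.5, 1.0, size=(150, 5))
print(mmd2_rbf(X_source, X_target, gamma=0.5))
```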
Table 2 lists some commonly used metrics and the related algorithms. In addition to those in Table 2, there are some other measurement criteria adopted in transfer learning, including the Wasserstein distance [52], [53], Central Moment Discrepancy [54], etc. Some studies focus on optimizing and improving the existing measurements. Take MMD as an example. Gretton et al. proposed a multi-kernel version of MMD, i.e., MK-MMD [55], which takes advantage of multiple kernels. Besides, Yan et al. proposed a weighted version of MMD [56], which attempts to address the issue of class weight bias.
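As a rough illustration of the multi-kernel idea, several RBF kernels with different bandwidths can be combined into a single discrepancy; the fixed (uniform) kernel weights below are an assumed simplification, since MK-MMD [55] actually optimizes these weights. The sketch reuses the mmd2_rbf helper defined in the previous example.

```python
def mmd2_multi_kernel(X_s, X_t, gammas=(0.25, 0.5, 1.0, 2.0), weights=None):
    """Squared MMD under a convex combination of RBF kernels.

    MK-MMD learns the kernel weights; here they are simply fixed (uniform by
    default) to illustrate the multi-kernel construction.
    """
    if weights is None:
        weights = [1.0 / len(gammas)] * len(gammas)
    return sum(w * mmd2_rbf(X_s, X_t, gamma=g) for w, g in zip(weights, gammas))
```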
4.2.2 Feature Augmentation
Feature augmentation operations are widely used in feature transformation, especially in symmetric feature-based approaches. To be more specific, there are several ways to realize feature augmentation such as feature replication and