2.4 Learning Curves
We can also extract meta-data about the training process itself, such as how fast model performance improves as more training data is added. If we divide the training into steps $s_t$, usually adding a fixed number of training examples every step, we can measure the performance $P(\theta_i, t_j, s_t) = P_{i,j,t}$ of configuration $\theta_i$ on task $t_j$ after step $s_t$, yielding a learning curve across the time steps $s_t$. Learning curves are used extensively to speed up hyperparameter optimization on a given task (Kohavi and John, 1995; Provost et al., 1999; Swersky et al., 2014; Chandrashekaran and Lane, 2017). In meta-learning, however, learning curve information is transferred across tasks.
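As a minimal sketch of how such a curve could be recorded (the step schedule and the decision-tree configuration are illustrative choices, not prescribed by the text):

```python
# Record a learning curve P_{i,j,t} for one configuration theta_i on one
# task t_j by training on growing subsets of the training data.
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

config = DecisionTreeClassifier(max_depth=10, random_state=0)  # one theta_i
steps = np.linspace(100, len(X_train), num=10, dtype=int)      # steps s_t

curve = []  # P_{i,j,t} for t = 1..10
for n in steps:
    config.fit(X_train[:n], y_train[:n])        # train on the first n examples
    curve.append(config.score(X_test, y_test))  # performance after step s_t
```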
While evaluating a configuration on a new task $t_{new}$, we can halt the training after a certain number of iterations $r < t$, and use the partially observed learning curve to predict how well the configuration will perform on the full dataset based on prior experience with other tasks, and decide whether to continue the training or not. This can significantly speed up the search for good configurations.
One approach is to assume that similar tasks yield similar learning curves. First, define a distance between tasks based on how similar the partial learning curves are: $dist(t_a, t_b) = f(P_{i,a,t}, P_{i,b,t})$ with $t = 1, ..., r$. Next, find the $k$ most similar tasks $t_{1..k}$ and use their complete learning curves to predict how well the configuration will perform on the new complete dataset. Task similarity can be measured by comparing the shapes of the partial curves across all configurations tried, and the prediction is made by adapting the ‘nearest’ complete curve(s) to the new partial curve (Leite and Brazdil, 2005, 2007). This approach was also successful in combination with active testing (Leite and Brazdil, 2010), and can be sped up further by using multi-objective evaluation measures that include training time (van Rijn et al., 2015).
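The sketch below illustrates the nearest-curve idea under simplifying assumptions: the distance $f$ is taken to be Euclidean, and the adaptation step is a simple rescaling of the nearest complete curves to match the observed prefix. This is a simplified illustration of the approach of Leite and Brazdil (2005, 2007), not their exact method.

```python
# Predict final performance on a new task from a partial learning curve,
# using the k prior tasks whose curves are closest on the first r steps.
import numpy as np

def predict_final_performance(partial_new, prior_curves, k=3):
    """partial_new: performance at steps 1..r on the new task, shape (r,).
    prior_curves: complete curves on prior tasks, shape (n_tasks, T)."""
    r = len(partial_new)
    # dist(t_a, t_b) = f(P_{i,a,t}, P_{i,b,t}) with t = 1..r: Euclidean here
    dists = np.linalg.norm(prior_curves[:, :r] - partial_new, axis=1)
    nearest = np.argsort(dists)[:k]
    preds = []
    for j in nearest:
        # Adapt the nearest complete curve to the new partial curve by
        # rescaling it so the observed prefixes match on average.
        scale = np.mean(partial_new) / np.mean(prior_curves[j, :r])
        preds.append(scale * prior_curves[j, -1])
    return np.mean(preds)  # predicted performance on the full dataset
```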
Interestingly, while several methods aim to predict learning curves during neural architecture search (Elsken et al., 2018), as of yet none of this work leverages learning curves previously observed on other tasks.
3. Learning from Task Properties
Another rich source of meta-data is the set of characterizations (meta-features) of the task at hand. Each task $t_j \in T$ is described with a vector $m(t_j) = (m_{j,1}, ..., m_{j,K})$ of $K$ meta-features $m_{j,k} \in M$, the set of all known meta-features. This can be used to define a task similarity measure based on, for instance, the Euclidean distance between $m(t_i)$ and $m(t_j)$, so that we can transfer information from the most similar tasks to the new task $t_{new}$. Moreover, together with prior evaluations $\mathbf{P}$, we can train a meta-learner $L$ to predict the performance $P_{i,new}$ of configurations $\theta_i$ on a new task $t_{new}$.
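A minimal sketch of both uses, on synthetic data (the random-forest meta-learner, the data shapes, and the random meta-data are assumptions made for illustration only):

```python
# Use meta-features m(t_j) plus prior evaluations P to (a) find the most
# similar prior task and (b) train a meta-learner L predicting P_{i,new}.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n_tasks, K, n_configs, D = 50, 10, 20, 4

M = rng.random((n_tasks, K))        # meta-feature vectors m(t_j)
Theta = rng.random((n_configs, D))  # configurations theta_i
P = rng.random((n_tasks, n_configs))  # P[j, i] = prior evaluation P_{i,j}

# Task similarity: Euclidean distance between meta-feature vectors
m_new = rng.random(K)               # m(t_new)
most_similar = np.argmin(np.linalg.norm(M - m_new, axis=1))

# Meta-learner L: input = (meta-features, configuration), target = performance
X_meta = np.array([np.concatenate([M[j], Theta[i]])
                   for j in range(n_tasks) for i in range(n_configs)])
y_meta = P.ravel()  # same (j outer, i inner) ordering as X_meta
L = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_meta, y_meta)

# Predict P_{i,new} for every configuration on the new task
P_new = L.predict(np.array([np.concatenate([m_new, Theta[i]])
                            for i in range(n_configs)]))
```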
3.1 Meta-Features
Table 1 provides a concise overview of the most commonly used meta-features, together with a short rationale for why they are indicative of model performance. Where possible, we also show the formulas to compute them. More complete surveys can be found in the literature (Rivolli et al., 2018; Vanschoren, 2010; Mantovani, 2018; Reif et al., 2014; Castiello et al., 2005).
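For concreteness, a few widely used simple and information-theoretic meta-features can be computed as follows; this selection is illustrative and does not reproduce Table 1 itself:

```python
# Compute a handful of standard meta-features m(t_j) on a toy dataset.
import numpy as np
from scipy.stats import entropy, skew
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)

meta_features = {
    "n_instances": X.shape[0],
    "n_features": X.shape[1],
    "n_classes": len(np.unique(y)),
    # Class entropy H(C): higher values indicate more balanced classes
    "class_entropy": entropy(np.bincount(y) / len(y), base=2),
    # Mean absolute feature skewness: indicates non-normality of features
    "mean_skewness": float(np.mean(np.abs(skew(X, axis=0)))),
}
```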