a source model to be robust to such domain-shift without
further adaptation. Many knowledge transfer methods have
been studied [34], [58] to boost target domain performance.
However, as with TL, vanilla DA and DG do not use a meta-objective to optimize ‘how to learn’ across domains. Meanwhile, meta-learning methods can be used to perform both DA [59] and DG [42] (see Sec. 5.8).
Continual learning (CL) Continual or lifelong learning
[60]–[62] refers to the ability to learn on a sequence of tasks drawn from a potentially non-stationary distribution, and in particular seeks to do so while accelerating the learning of new tasks and without forgetting old ones. Similarly to meta-learning, a task distribution is considered, and the goal is partly to accelerate learning of a target task. However, most continual learning methodologies are not meta-learning methodologies, since this meta-objective is not solved for explicitly. Nevertheless, meta-learning provides a potential
framework to advance continual learning, and a few recent
studies have begun to do so by developing meta-objectives
that encode continual learning performance [63]–[65].
Multi-Task Learning (MTL) aims to jointly learn sev-
eral related tasks, to benefit from regularization due to
parameter sharing and the diversity of the resulting shared
representation [66]–[68], as well as compute/memory sav-
ings. Like TL, DA, and CL, conventional MTL is a single-
level optimization without a meta-objective. Furthermore,
the goal of MTL is to solve a fixed number of known tasks,
whereas the point of meta-learning is often to solve unseen
future tasks. Nonetheless, meta-learning can be brought in
to benefit MTL, e.g. by learning the relatedness between
tasks [69], or how to prioritise among multiple tasks [70].
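To make the contrast with meta-learning concrete, the following sketch (in PyTorch, with illustrative shapes, task count, and random data, all assumptions of ours) shows conventional hard parameter sharing: two task heads on a shared trunk trained with a fixed summed loss, i.e. a single-level optimization with no meta-objective.

```python
import torch
import torch.nn as nn

class HardSharedMTL(nn.Module):
    """Hard parameter sharing: one shared trunk, one head per task."""
    def __init__(self, in_dim=32, hidden=64, n_classes=(10, 5)):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.heads = nn.ModuleList([nn.Linear(hidden, c) for c in n_classes])

    def forward(self, x, task_id):
        return self.heads[task_id](self.trunk(x))

model = HardSharedMTL()
loss_fn = nn.CrossEntropyLoss()
x0, y0 = torch.randn(8, 32), torch.randint(0, 10, (8,))   # task 0 batch
x1, y1 = torch.randn(8, 32), torch.randint(0, 5, (8,))    # task 1 batch
# Single-level objective over a fixed set of known tasks; nothing
# here learns 'how to learn' or how the tasks should share capacity.
loss = loss_fn(model(x0, 0), y0) + loss_fn(model(x1, 1), y1)
loss.backward()
```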
Hyperparameter Optimization (HO) is within the remit
of meta-learning, in that hyperparameters like learning rate
or regularization strength describe ‘how to learn’. Here we
include HO tasks that define a meta-objective that is trained
end-to-end with neural networks, such as gradient-based
hyperparameter learning [69], [71] and neural architecture
search [18]. But we exclude other approaches like random
search [72] and Bayesian Hyperparameter Optimization
[73], which are rarely considered to be meta-learning.
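As a concrete illustration of the former, the sketch below treats a learning rate as a differentiable hyperparameter ω and updates it by backpropagating a validation loss through an inner SGD step. The linear model, data, and step counts are illustrative assumptions, not a reference implementation of [69], [71].

```python
import torch

torch.manual_seed(0)
w = torch.randn(5, requires_grad=True)              # model parameters θ
log_lr = torch.zeros(1, requires_grad=True)         # hyperparameter ω = log α
hyper_opt = torch.optim.Adam([log_lr], lr=1e-2)

x_tr, y_tr = torch.randn(16, 5), torch.randn(16)    # training split
x_val, y_val = torch.randn(16, 5), torch.randn(16)  # validation split

for _ in range(100):
    # Inner step: θ' = θ - α ∇θ L_train(θ), kept in the autograd graph
    train_loss = ((x_tr @ w - y_tr) ** 2).mean()
    g, = torch.autograd.grad(train_loss, w, create_graph=True)
    w_new = w - log_lr.exp() * g
    # Outer (meta-)objective: validation loss of the updated θ'
    val_loss = ((x_val @ w_new - y_val) ** 2).mean()
    hyper_opt.zero_grad()
    val_loss.backward()                              # gradient w.r.t. log α
    hyper_opt.step()
    w = w_new.detach().requires_grad_(True)          # commit the inner step
```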
Hierarchical Bayesian Models (HBM) involve Bayesian
learning of parameters θ under a prior p(θ|ω). The prior
is written as a conditional density on some other variable
ω which has its own prior p(ω). Hierarchical Bayesian
models feature strongly as models for grouped data D =
{D_i | i = 1, 2, . . . , M}, where each group i has its own θ_i. The full model is [∏_{i=1}^{M} p(D_i|θ_i) p(θ_i|ω)] p(ω). The levels of hierarchy can be increased further; in particular ω can itself be parameterized, and hence p(ω) can be learnt.
Learning is usually full-pipeline, but using some form of Bayesian marginalisation to compute the posterior over ω: P(ω|D) ∝ p(ω) ∏_{i=1}^{M} ∫ dθ_i p(D_i|θ_i) p(θ_i|ω). The ease of
doing the marginalisation depends on the model: in some
(e.g. Latent Dirichlet Allocation [74]) the marginalisation is
exact due to the choice of conjugate exponential models,
in others (see e.g. [75]), a stochastic variational approach is
used to calculate an approximate posterior, from which a
lower bound to the marginal likelihood is computed.
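The following toy sketch estimates this posterior by Monte Carlo for an assumed Gaussian model (θ_i ∼ N(ω, 1), observations x ∼ N(θ_i, 1), and a N(0, 1) prior on ω); the data and all distributional choices are illustrative assumptions.

```python
import numpy as np

# Monte Carlo sketch of the hierarchical marginalisation
#   P(ω|D) ∝ p(ω) ∏_{i=1}^M ∫ dθ_i p(D_i|θ_i) p(θ_i|ω)
# for a toy Gaussian model (assumption): θ_i ~ N(ω, 1), x ~ N(θ_i, 1).
rng = np.random.default_rng(0)
groups = [rng.normal(loc=m, scale=1.0, size=20) for m in (1.8, 2.2, 2.0)]

def log_posterior(omega, n_samples=5000):
    """Estimate log p(ω) + Σ_i log ∫ dθ_i p(D_i|θ_i) p(θ_i|ω), up to a constant."""
    total = -0.5 * omega ** 2                        # log p(ω) for a N(0,1) prior
    for D_i in groups:
        theta = rng.normal(omega, 1.0, size=n_samples)   # θ_i ~ p(θ_i|ω)
        # Per-sample log-likelihood log p(D_i|θ), up to an additive constant
        ll = -0.5 * ((D_i[None, :] - theta[:, None]) ** 2).sum(axis=1)
        # log-mean-exp: a numerically stable estimate of the group integral
        total += np.logaddexp.reduce(ll) - np.log(n_samples)
    return total

# Grid search over ω: the posterior concentrates near the group means (~2.0).
grid = np.linspace(0.0, 4.0, 41)
print("argmax_ω ≈", grid[np.argmax([log_posterior(w) for w in grid])])
```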
Bayesian hierarchical models provide a valuable view-
point for meta-learning, by providing a modeling rather
than an algorithmic framework for understanding the meta-
learning process. In practice, prior work in HBMs has typically focused on learning simple tractable models θ, while
most meta-learning work considers complex inner-loop
learning processes, involving many iterations. Nonetheless,
some meta-learning methods like MAML [16] can be under-
stood through the lens of HBMs [76].
AutoML: AutoML [31]–[33] is a rather broad umbrella
for approaches aiming to automate parts of the machine
learning process that are typically manual, such as data
preparation, algorithm selection, hyperparameter tuning,
and architecture search. AutoML often makes use of numer-
ous heuristics outside the scope of meta-learning as defined
here, and focuses on tasks such as data cleaning that are
less central to meta-learning. However, AutoML sometimes
makes use of end-to-end optimization of a meta-objective,
so meta-learning can be seen as a specialization of AutoML.
3 TAXONOMY
3.1 Previous Taxonomies
Previous categorizations [77], [78] of meta-learning methods have tended to produce a three-way taxonomy across
optimization-based methods, model-based (or black box)
methods, and metric-based (or non-parametric) methods.
Optimization Optimization-based methods include those
where the inner-level task (Eq. 6) is literally solved as
an optimization problem, and focus on extracting meta-
knowledge ω required to improve optimization perfor-
mance. A famous example is MAML [16], which aims to
learn the initialization ω = θ_0, such that a small number
of inner steps produces a classifier that performs well on
validation data. This is also performed by gradient descent,
differentiating through the updates of the base model. More
elaborate alternatives also learn step sizes [79], [80] or
train recurrent networks to predict steps from gradients
[19], [39], [81]. Meta-optimization by gradient descent over long inner optimizations leads to several compute and memory challenges, which are discussed in Section 6. A unified view of gradient-based meta-learning, expressing many existing methods as special cases of a generalized inner-loop meta-learning framework, has been proposed [82].
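As a minimal illustration of the optimization-based family, the sketch below meta-learns an initialization in the MAML style on an assumed toy 1-D regression task family, using a single inner gradient step and differentiating through it; the model, task distribution, and step counts are illustrative assumptions, not the setup of [16].

```python
import torch

torch.manual_seed(0)
theta0 = torch.zeros(1, requires_grad=True)          # meta-parameter ω = θ_0
meta_opt = torch.optim.SGD([theta0], lr=1e-2)
inner_lr = 0.1

def sample_task():
    """Toy 1-D regression task y = a*x with a random slope a (assumption)."""
    a = torch.rand(1) * 4.0 + 1.0
    x_tr, x_val = torch.randn(10, 1), torch.randn(10, 1)
    return (x_tr, a * x_tr), (x_val, a * x_val)

for step in range(500):
    meta_opt.zero_grad()
    for _ in range(4):                               # a meta-batch of tasks
        (x_tr, y_tr), (x_val, y_val) = sample_task()
        # Inner loop: one gradient step from the shared initialization,
        # keeping the graph so the meta-gradient can flow through it.
        train_loss = ((x_tr * theta0 - y_tr) ** 2).mean()
        g, = torch.autograd.grad(train_loss, theta0, create_graph=True)
        theta_task = theta0 - inner_lr * g
        # Outer loop: validation loss of the adapted parameters,
        # accumulated into θ_0.grad across the meta-batch.
        val_loss = ((x_val * theta_task - y_val) ** 2).mean()
        val_loss.backward()
    meta_opt.step()
```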
Black Box / Model-based In model-based (or black-box) methods, the inner learning step (Eq. 6, Eq. 4) is wrapped up
in the feed-forward pass of a single model, as illustrated
in Eq. 7. The model embeds the current dataset D into
activation state, with predictions for test data being made
based on this state. Typical architectures include recurrent
networks [39], [51], convolutional networks [38] or hyper-
networks [83], [84] that embed training instances and labels
of a given task to define a predictor for test samples. In this
case all the inner-level learning is contained in the activation
states of the model and is entirely feed-forward. Outer-
level learning is performed with ω containing the CNN,
RNN or hypernetwork parameters. The outer and inner-
level optimizations are tightly coupled as ω and D directly
specify θ. Memory-augmented neural networks [85] use an
explicit storage buffer and can be seen as a model-based