MT-net: Gradient-Based Learning and Subspaces in Meta-Learning

PDF, 939 KB, updated 2024-07-15

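“Gradient-Based Meta-Learning with Learned Layerwise Metric and Subspace” introduces MT-net, a gradient-based meta-learning framework proposed by Yoonho Lee and Seungjin Choi of the Department of Computer Science and Engineering, Pohang University of Science and Technology (POSTECH), Korea. Meta-learning aims to learn from many related tasks so that a model can adapt quickly to a new task, especially when that task has only a few training examples. The paper focuses on improving gradient-based meta-learning, specifically how the task-specific learner adapts at meta-test time.

Earlier gradient-based meta-learning methods fall back on plain gradient descent during meta-testing. MT-net instead lets the meta-learner learn, on each layer's activation space, a subspace in which the task-specific learner performs gradient descent. In addition, the task-specific learner performs gradient descent with respect to a distance metric learned by the meta-learner; this metric warps the activation space so that it is more sensitive to task identity.

The authors show that the dimension of the learned subspace reflects the complexity of the adaptation task faced by the task-specific learner, and that MT-net is less sensitive to the choice of initial learning rate than previous gradient-based meta-learning methods. This matters in practice, since choosing a suitable learning rate is a recurring difficulty when training deep models. In experiments, MT-net achieves state-of-the-art or comparable performance on few-shot classification and regression tasks.

In short, the paper's main contribution is MT-net, which combines learned layerwise subspaces with a meta-learned, task-sensitive metric to strengthen gradient-based meta-learning while reducing its dependence on the initial learning rate.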
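The meta-learned metric described above can be illustrated on a single layer. A minimal sketch, assuming a hypothetical one-cell model whose prediction is $T W x$ (a task-specific matrix $W$ followed by a meta-learned matrix $T$) with a squared-error loss; the shapes, data, and names such as `grad_wrt_A` are illustrative and not taken from the paper. It covers only the metric, not the learned subspace. One gradient step on $W$ with $T$ frozen moves the effective linear map $A = TW$ by $-\alpha\,TT^\top \nabla_A \mathcal{L}$, so plain gradient descent on $W$ acts as gradient descent on $A$ preconditioned by $TT^\top$.

```python
import numpy as np

# Hypothetical single cell: prediction = T @ W @ x (no nonlinearity).
# Shapes and the 0.5 * mean squared error loss are illustrative assumptions.
rng = np.random.default_rng(0)
d, k, n = 4, 3, 8                 # input dim, output dim, number of samples
X = rng.normal(size=(n, d))       # inputs, one row per sample
Y = rng.normal(size=(n, k))       # regression targets
W = rng.normal(size=(k, d))       # task-specific weights (adapted)
T = rng.normal(size=(k, k))       # meta-learned transformation (frozen here)
alpha = 0.1

def grad_wrt_A(A):
    """Gradient of 0.5 * mean squared error w.r.t. the effective matrix A = T @ W."""
    R = X @ A.T - Y               # residuals, shape (n, k)
    return R.T @ X / n            # shape (k, d)

A = T @ W
G_A = grad_wrt_A(A)
G_W = T.T @ G_A                   # chain rule: dL/dW = T^T dL/dA

# One plain gradient step on W, with T held fixed ...
A_after = T @ (W - alpha * G_W)
# ... equals a step on A preconditioned by T @ T.T.
A_precond = A - alpha * (T @ T.T) @ G_A

print(np.allclose(A_after, A_precond))  # True
```

Because $TT^\top$ is learned by the meta-learner, it can stretch or shrink the directions in which a task's adaptation moves the layer.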

Figure 2: A diagram of the adaptation process of a Transformation Network (T-net). Blue values are meta-learned and shared across all tasks. Orange values are different for each task.

cell has the same expressive power as a linear layer. Model parameters $\theta$ are a collection of $W$'s and $T$'s, i.e.,

$$\theta = (\underbrace{W^1, \ldots, W^L}_{\theta_W}, \underbrace{T^1, \ldots, T^L}_{\theta_T}).$$

Parameters $\theta_T$, which are shared across task-specific models, are determined by the meta-learner. All task-specific learners have the same initial $\theta_W$ but update to different values since each uses their corresponding train set $\mathcal{D}_{\mathcal{T},\text{train}}$. Thus we denote such (adjusted) parameters for task $\mathcal{T}$ as $\theta_{W,\mathcal{T}}$. Though they may look similar, $\mathcal{T}$ denotes tasks whereas $T$ denotes a transformation matrix.
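To make the split $\theta = (\theta_W, \theta_T)$ concrete, here is a minimal NumPy sketch of a forward pass. It assumes each cell computes $T^i W^i h$ with a square $T^i$ and places a ReLU between cells; the nonlinearity placement, the layer widths, and the function name `tnet_forward` are assumptions for illustration only. The final check shows why a cell has the expressive power of one linear layer: the product $T^i W^i$ collapses to a single matrix.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def tnet_forward(theta_W, theta_T, x):
    """Forward pass of a hypothetical L-layer T-net.

    Each cell applies W^i and then T^i; the ReLU between cells
    (but not after the last one) is an illustrative assumption.
    """
    h = x
    L = len(theta_W)
    for i, (W, T) in enumerate(zip(theta_W, theta_T)):
        h = T @ (W @ h)
        if i < L - 1:
            h = relu(h)
    return h

rng = np.random.default_rng(1)
sizes = [5, 8, 8, 2]                                   # layer widths (assumed)
theta_W = [rng.normal(size=(o, i)) for i, o in zip(sizes[:-1], sizes[1:])]
theta_T = [rng.normal(size=(o, o)) for o in sizes[1:]]  # one square T^i per layer
x = rng.normal(size=sizes[0])

y = tnet_forward(theta_W, theta_T, x)

# Each cell T^i W^i is a single linear map, so collapsing it gives the same output.
collapsed = [T @ W for W, T in zip(theta_W, theta_T)]
h = x
for i, A in enumerate(collapsed):
    h = A @ h
    if i < len(collapsed) - 1:
        h = relu(h)
print(np.allclose(y, h))  # True
```

The collapse explains why adding $T^i$ does not enlarge the function class; its role is to change how gradient steps on $W^i$ move the layer.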

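Given a task $\mathcal{T}$ sampled from $p(\mathcal{T})$, each $W$ is adjusted with the gradient update

$$W^i_{\mathcal{T}} \leftarrow W^i - \alpha \nabla_{W^i} \mathcal{L}(\theta_W, \theta_T, \mathcal{D}_{\mathcal{T},\text{train}}). \tag{4}$$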
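Eq. (4) for a single-layer T-net ($L = 1$) can be sketched as below, assuming a squared-error loss so the gradient can be written by hand; `loss`, `grad_W`, and `inner_update` are hypothetical names, and the data are random placeholders. Because $T$ is held fixed, the gradient with respect to $W$ picks up a factor $T^\top$. A central finite-difference check confirms the analytic gradient.

```python
import numpy as np

def loss(W, T, X, Y):
    """0.5 * mean squared error of a single-cell T-net, prediction = T @ W @ x."""
    R = X @ (T @ W).T - Y
    return 0.5 * np.sum(R ** 2) / len(X)

def grad_W(W, T, X, Y):
    """Analytic gradient of `loss` w.r.t. W (T is treated as a constant)."""
    R = X @ (T @ W).T - Y
    return T.T @ (R.T @ X) / len(X)

def inner_update(W, T, X_train, Y_train, alpha):
    """Eq. (4): W_task <- W - alpha * grad_W L(theta_W, theta_T, D_train).

    Returns a new array; the meta-learned initial W is left unchanged.
    """
    return W - alpha * grad_W(W, T, X_train, Y_train)

rng = np.random.default_rng(2)
d, k, n = 4, 3, 20
W0 = rng.normal(size=(k, d))   # meta-learned initial theta_W (placeholder values)
T = rng.normal(size=(k, k))    # meta-learned theta_T, frozen in the inner loop
X_tr = rng.normal(size=(n, d))
Y_tr = rng.normal(size=(n, k))

# Check the analytic gradient against central finite differences.
eps = 1e-6
num = np.zeros_like(W0)
for idx in np.ndindex(W0.shape):
    E = np.zeros_like(W0)
    E[idx] = eps
    num[idx] = (loss(W0 + E, T, X_tr, Y_tr) - loss(W0 - E, T, X_tr, Y_tr)) / (2 * eps)
print(np.allclose(num, grad_W(W0, T, X_tr, Y_tr), atol=1e-6))  # True

W_task = inner_update(W0, T, X_tr, Y_tr, alpha=0.01)
print(loss(W0, T, X_tr, Y_tr), loss(W_task, T, X_tr, Y_tr))
```

Each task gets its own `W_task`, while `W0` and `T` are shared, which mirrors the roles of $\theta_{W,\mathcal{T}}$, $\theta_W$, and $\theta_T$ in the text.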

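Again, $\theta_{W,\mathcal{T}}$ is defined as $\{W^1_{\mathcal{T}}, \ldots, W^L_{\mathcal{T}}\}$. Using the task-specific learner $\theta_{W,\mathcal{T}}$, the meta-learner improves itself with the gradient update

$$\theta \leftarrow \theta - \beta \nabla_\theta \sum_{\mathcal{T} \sim p(\mathcal{T})} \mathcal{L}\left(\theta_{W,\mathcal{T}}, \theta_T, \mathcal{D}_{\mathcal{T},\text{test}}\right). \tag{5}$$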

α > 0 and β > 0 are learning rate hyperparameters. We show our full algorithm in Algorithm 1.
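Algorithm 1 is not reproduced in this excerpt, so the following is only a rough sketch of how Eqs. (4) and (5) fit together for a tiny single-cell T-net. It assumes one inner gradient step, a squared-error loss, and a hypothetical family of linear-regression tasks (`sample_tasks`); all function names are illustrative. Central finite differences stand in for the automatic differentiation one would normally use to backpropagate through the inner update, which is only practical here because the model has ten parameters.

```python
import numpy as np

def task_loss(W, T, X, Y):
    """0.5 * mean squared error of a single-cell T-net (prediction = T @ W @ x)."""
    R = X @ (T @ W).T - Y
    return 0.5 * np.sum(R ** 2) / len(X)

def grad_W(W, T, X, Y):
    """Analytic gradient of task_loss w.r.t. W, with T held fixed."""
    R = X @ (T @ W).T - Y
    return T.T @ (R.T @ X) / len(X)

def sample_tasks(rng, num_tasks, d=3, k=2, n=10):
    """Hypothetical task family: linear maps scattered around a shared base map."""
    base = np.arange(k * d, dtype=float).reshape(k, d) / (k * d)
    tasks = []
    for _ in range(num_tasks):
        A = base + 0.3 * rng.normal(size=(k, d))
        X_tr, X_te = rng.normal(size=(n, d)), rng.normal(size=(n, d))
        tasks.append((X_tr, X_tr @ A.T, X_te, X_te @ A.T))
    return tasks

def meta_loss(W, T, tasks, alpha):
    """Sum over tasks of the test loss after one inner step (Eq. 4), as in Eq. (5)."""
    total = 0.0
    for X_tr, Y_tr, X_te, Y_te in tasks:
        W_task = W - alpha * grad_W(W, T, X_tr, Y_tr)  # Eq. (4)
        total += task_loss(W_task, T, X_te, Y_te)
    return total

def meta_step(W, T, tasks, alpha, beta, eps=1e-6):
    """Eq. (5): theta <- theta - beta * grad_theta(meta_loss), theta = (W, T).

    Uses central finite differences instead of autodiff (tiny models only).
    """
    grads = []
    for P in (W, T):                 # P aliases W or T, so edits affect meta_loss
        G = np.zeros_like(P)
        for idx in np.ndindex(P.shape):
            old = P[idx]
            P[idx] = old + eps
            up = meta_loss(W, T, tasks, alpha)
            P[idx] = old - eps
            down = meta_loss(W, T, tasks, alpha)
            P[idx] = old             # restore the parameter
            G[idx] = (up - down) / (2 * eps)
        grads.append(G)
    return W - beta * grads[0], T - beta * grads[1]

rng = np.random.default_rng(0)
W = 0.1 * rng.normal(size=(2, 3))   # initial theta_W
T = np.eye(2)                        # initial theta_T (identity is an assumption)
alpha, beta = 0.05, 0.01
for step in range(50):
    tasks = sample_tasks(rng, num_tasks=4)   # tasks drawn from p(T)
    W, T = meta_step(W, T, tasks, alpha, beta)
```

Note that `meta_step` updates both $\theta_W$ and $\theta_T$, while the inner update in `meta_loss` changes only $W$, matching the roles of the two parameter groups described above.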

Suppose that we are given a new task $\mathcal{T}^*$ with the training set $\mathcal{D}_{\mathcal{T}^*,\text{train}}$. The model parameters $\theta_{W,\mathcal{T}^*}$ are computed as in (4), where the gradient update starts from the initial value $\theta_W$ that was determined by the meta-learner.

