GENERALIZING FROM FEW EXAMPLES
WITH META-LEARNING
Hugo Larochelle
Google Brain

A RESEARCH AGENDA
• Deep learning successes have required a lot of labeled training data
‣ collecting and labeling such data requires significant human labor
‣ practically, is that really how we'll solve AI?
‣ scientifically, this means there is a gap with the ability of humans to learn, which we should try to understand
• Alternative solution: exploit other sources of data that are imperfect but plentiful
‣ unlabeled data (unsupervised learning)
‣ multimodal data (multimodal learning)
‣ multidomain data (transfer learning, domain adaptation)

Under review as a conference paper at ICLR 2017
Figure 1: Example of meta-learning setup. The top represents the meta-training set $D_{\text{meta-train}}$, where inside each gray box is a separate dataset that consists of the training set $D_{\text{train}}$ (left side of dashed line) and the test set $D_{\text{test}}$ (right side of dashed line). In this illustration, we are considering the 1-shot, 5-class classification task where for each dataset, we have one example from each of 5 classes (each given a label 1-5) in the training set and 2 examples for evaluation in the test set. The meta-test set $D_{\text{meta-test}}$ is defined in the same way, but with a different set of datasets that cover classes not present in any of the datasets in $D_{\text{meta-train}}$ (similarly, we additionally have a meta-validation set that is used to determine hyper-parameters).
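
As a concrete illustration of this episodic setup, here is a minimal sketch of how one 1-shot, 5-class episode could be sampled (plain Python; the `class_to_images` mapping and its contents are hypothetical placeholders, not part of the paper):

```python
import random

def sample_episode(class_to_images, n_way=5, k_shot=1, n_query=2):
    """Sample one few-shot episode: a small D_train and D_test over n_way classes.

    class_to_images: dict mapping a class id to a list of examples (hypothetical format).
    """
    classes = random.sample(list(class_to_images), n_way)
    d_train, d_test = [], []
    for label, cls in enumerate(classes, start=1):   # relabel classes 1..n_way within the episode
        examples = random.sample(class_to_images[cls], k_shot + n_query)
        d_train += [(x, label) for x in examples[:k_shot]]   # 1 training example per class
        d_test += [(x, label) for x in examples[k_shot:]]    # 2 held-out examples per class
    return d_train, d_test
```

Meta-train and meta-test episodes would be sampled from disjoint class pools, so every meta-test episode involves classes never seen during meta-training.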
Our key observation that we leverage here is that this update resembles the update for the cell state in an LSTM (Hochreiter & Schmidhuber, 1997)

$$c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t, \qquad (2)$$

if $f_t = 1$, $c_{t-1} = \theta_{t-1}$, $i_t = \alpha_t$, and $\tilde{c}_t = -\nabla_{\theta_{t-1}} \mathcal{L}_t$.
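
Here, "this update" refers to the standard gradient-descent step $\theta_t = \theta_{t-1} - \alpha_t \nabla_{\theta_{t-1}} \mathcal{L}_t$ defined earlier in the paper (outside this excerpt). A tiny NumPy check with made-up numbers illustrates the correspondence; all values below are arbitrary:

```python
import numpy as np

theta_prev = np.array([0.5, -1.2, 2.0])   # theta_{t-1}: current learner parameters
grad = np.array([0.1, -0.3, 0.05])        # gradient of the loss at theta_{t-1}
alpha = 0.01                              # learning rate

# Ordinary gradient descent: theta_t = theta_{t-1} - alpha * grad
theta_sgd = theta_prev - alpha * grad

# The same step written as the LSTM cell-state update of eq. (2),
# with f_t = 1, c_{t-1} = theta_{t-1}, i_t = alpha, c_tilde_t = -grad:
f_t, i_t, c_tilde = 1.0, alpha, -grad
theta_lstm = f_t * theta_prev + i_t * c_tilde

assert np.allclose(theta_sgd, theta_lstm)
```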
Thus, we propose training a meta-learner LSTM to learn an update rule for training a neural network. We set the cell state of the LSTM to be the parameters of the learner, or $c_t = \theta_t$, and the candidate cell state $\tilde{c}_t = -\nabla_{\theta_{t-1}} \mathcal{L}_t$, given how valuable information about the gradient is for optimization. We define parametric forms for $i_t$ and $f_t$ so that the meta-learner can determine optimal values through the course of the updates.
Let us start with $i_t$, which corresponds to the learning rate for the updates. We let

$$i_t = \sigma\left(W_I \cdot \left[\nabla_{\theta_{t-1}} \mathcal{L}_t,\; \mathcal{L}_t,\; \theta_{t-1},\; i_{t-1}\right] + b_I\right),$$

meaning that the learning rate is a function of the current parameter value $\theta_{t-1}$, the current gradient $\nabla_{\theta_{t-1}} \mathcal{L}_t$, the current loss $\mathcal{L}_t$, and the previous learning rate $i_{t-1}$. With this information, the meta-learner should be able to finely control the learning rate so as to train the learner quickly while avoiding divergence.
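
A minimal NumPy sketch of this input gate, assuming a per-parameter gate computed from the four listed quantities with a shared weight vector `W_I` and scalar bias `b_I` (these shapes, and feeding the raw gradient and loss directly, are simplifying assumptions rather than the paper's exact implementation):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def input_gate(grad, loss, theta_prev, i_prev, W_I, b_I):
    """Per-parameter learning rate i_t = sigmoid(W_I . [grad, loss, theta_{t-1}, i_{t-1}] + b_I).

    grad, theta_prev, i_prev: arrays of shape (n_params,); loss: scalar, broadcast to each coordinate.
    """
    feats = np.stack([grad, np.full_like(grad, loss), theta_prev, i_prev], axis=-1)  # (n_params, 4)
    return sigmoid(feats @ W_I + b_I)
```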
As for $f_t$, it seems possible that the optimal choice isn't the constant 1. Intuitively, what would justify shrinking the parameters of the learner and forgetting part of their previous value would be if the learner is currently in a bad local optimum and needs a large change to escape. This would correspond to a situation where the loss is high but the gradient is close to zero. Thus, one proposal for the forget gate is to have it be a function of that information, as well as the previous value of the forget gate:

$$f_t = \sigma\left(W_F \cdot \left[\nabla_{\theta_{t-1}} \mathcal{L}_t,\; \mathcal{L}_t,\; \theta_{t-1},\; f_{t-1}\right] + b_F\right).$$
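
Continuing the sketch above (reusing its `sigmoid` and `input_gate`), the forget gate takes the same parametric form, and the two gates combine into one LSTM-style update of the learner's parameters; shapes and wiring remain illustrative assumptions rather than the published architecture:

```python
import numpy as np

def forget_gate(grad, loss, theta_prev, f_prev, W_F, b_F):
    """f_t = sigmoid(W_F . [grad, loss, theta_{t-1}, f_{t-1}] + b_F), one value per parameter."""
    feats = np.stack([grad, np.full_like(grad, loss), theta_prev, f_prev], axis=-1)
    return sigmoid(feats @ W_F + b_F)

def meta_update(theta_prev, grad, loss, i_prev, f_prev, p):
    """One cell-state update of the learner: theta_t = f_t * theta_{t-1} + i_t * (-grad)."""
    i_t = input_gate(grad, loss, theta_prev, i_prev, p["W_I"], p["b_I"])
    f_t = forget_gate(grad, loss, theta_prev, f_prev, p["W_F"], p["b_F"])
    theta_t = f_t * theta_prev + i_t * (-grad)
    return theta_t, i_t, f_t
```

Iterating `meta_update` over the training batches of one episode plays the role of the inner training loop; the gate parameters themselves are trained across many episodes of $D_{\text{meta-train}}$ by backpropagating the learner's test loss.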
Additionally, notice that we can also learn the initial value of the cell state $c_0$ for the LSTM, treating it as a parameter of the meta-learner. This corresponds to the initial weights of the classifier (that the meta-learner is training). Learning this initial value lets the meta-learner determine the optimal initial weights of the learner so that training begins from a beneficial starting point that allows …
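In the same simplified setting, $c_0$ is just one more entry in the meta-learner's set of trainable quantities, copied in as the learner's starting weights at the beginning of every episode (sizes and initial values below are purely illustrative; the outer meta-training loop that would update them is not shown):

```python
import numpy as np

rng = np.random.default_rng(0)
n_params = 1000   # size of the learner's flattened parameter vector (illustrative)

# Everything the meta-learner itself learns: gate weights/biases plus c_0,
# the shared initial weights handed to the learner at the start of each episode.
meta_params = {
    "W_I": rng.normal(scale=0.1, size=4), "b_I": 0.0,
    "W_F": rng.normal(scale=0.1, size=4), "b_F": 5.0,   # large positive bias keeps f_t near 1 early on
    "c_0": rng.normal(scale=0.01, size=n_params),
}

def init_learner(meta_params):
    """Each new episode starts the learner from the learned initial cell state c_0."""
    return meta_params["c_0"].copy()

# During meta-training, the adapted learner's test loss would be backpropagated
# into every entry of meta_params, including c_0 (outer loop not shown).
```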

RESEARCH ARTICLES
COGNITIVE SCIENCE
Human-level concept learning through probabilistic program induction
Brenden M. Lake,¹* Ruslan Salakhutdinov,² Joshua B. Tenenbaum³
People learning new concepts can often generalize successfully from just a single example, yet machine learning algorithms typically require tens or hundreds of examples to perform with similar accuracy. People can also use learned concepts in richer ways than conventional algorithms—for action, imagination, and explanation. We present a computational model that captures these human learning abilities for a large class of simple visual concepts: handwritten characters from the world's alphabets. The model represents concepts as simple programs that best explain observed examples under a Bayesian criterion. On a challenging one-shot classification task, the model achieves human-level performance while outperforming recent deep learning approaches. We also present several "visual Turing tests" probing the model's creative generalization abilities, which in many cases are indistinguishable from human behavior.
Despite remarkable advances in artificial intelligence and machine learning, two aspects of human conceptual knowledge have eluded machine systems. First, for most interesting kinds of natural and man-made categories, people can learn a new concept from just one or a handful of examples, whereas standard algorithms in machine learning require tens or hundreds of examples to perform similarly. For instance, people may only need to see one example of a novel two-wheeled vehicle (Fig. 1A) in order to grasp the boundaries of the new concept, and even children can make meaningful generalizations via "one-shot learning" (1–3). In contrast, many of the leading approaches in machine learning are also the most data-hungry, especially "deep learning" models that have achieved new levels of performance on object and speech recognition benchmarks (4–9). Second, people learn richer representations than machines do, even for simple concepts (Fig. 1B), using them for a wider range of functions, including (Fig. 1, ii) creating new exemplars (10), (Fig. 1, iii) parsing objects into parts and relations (11), and (Fig. 1, iv) creating new abstract categories of objects based on existing categories (12, 13). In contrast, the best machine classifiers do not perform these additional functions, which are rarely studied and usually require specialized algorithms. A central challenge is to explain these two aspects of human-level concept learning: How do people learn new concepts from just one or a few examples? And how do people learn such abstract, rich, and flexible representations? An even greater challenge arises when putting them together: How can learning succeed from such sparse data yet also produce such rich representations? For any theory of …
Science, 11 December 2015, Vol. 350, Issue 6266, p. 1332
¹Center for Data Science, New York University, 726 Broadway, New York, NY 10003, USA. ²Department of Computer Science and Department of Statistics, University of Toronto, 6 King's College Road, Toronto, ON M5S 3G4, Canada. ³Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, MA 02139, USA.
*Corresponding author. E-mail: brenden@nyu.edu
Fig. 1. People can learn rich concepts from limited data. (A and B) A single example of a new concept (red boxes) can be enough information to support the (i) classification of new examples, (ii) generation of new examples, (iii) parsing an object into parts and relations (parts segmented by color), and (iv) generation of new concepts from related concepts. [Image credit for (A), iv, bottom: With permission from Glenn Roberts and Motorcycle Mojo Magazine]
People are good at it!