Generalizing from a Few Examples: A Survey on Few-Shot
Learning
YAQING WANG, Hong Kong University of Science and Technology and Baidu Research
QUANMING YAO∗, 4Paradigm Inc.
JAMES T. KWOK, Hong Kong University of Science and Technology
LIONEL M. NI, Hong Kong University of Science and Technology
Machine learning has been highly successful in data-intensive applications, but is often hampered when the data set is small. Recently, Few-Shot Learning (FSL) has been proposed to tackle this problem. Using prior knowledge, FSL can rapidly generalize to new tasks containing only a few samples with supervised information. In this paper, we conduct a thorough survey to fully understand FSL. Starting from a formal definition of FSL, we distinguish FSL from several relevant machine learning problems. We then point out that the core issue in FSL is that the empirical risk minimizer is unreliable. Based on how prior knowledge can be used to handle this core issue, we categorize FSL methods from three perspectives: (i) data, which uses prior knowledge to augment the supervised experience; (ii) model, which uses prior knowledge to reduce the size of the hypothesis space; and (iii) algorithm, which uses prior knowledge to alter the search for the best hypothesis in the given hypothesis space. With this taxonomy, we review and discuss the pros and cons of each category. Promising directions, in the aspects of FSL problem setups, techniques, applications and theories, are also proposed to provide insights for future research.¹
CCS Concepts: • Computing methodologies → Artificial intelligence; Machine learning; Learning paradigms.
Additional Key Words and Phrases: Few-Shot Learning, One-Shot Learning, Low-Shot Learning, Small Sample
Learning, Meta-Learning, Prior Knowledge
ACM Reference Format:
Yaqing Wang, Quanming Yao, James T. Kwok, and Lionel M. Ni. 2020. Generalizing from a Few Examples: A
Survey on Few-Shot Learning. ACM Comput. Surv. 1, 1, Article 1 (March 2020), 34 pages. https://doi.org/10.1145/3386252
1 INTRODUCTION
“Can machines think?” This is the question raised in Alan Turing’s seminal paper entitled “Computing Machinery and Intelligence” [134] in 1950. He made the statement that “The idea behind digital
∗Corresponding Author
¹A list of references, which will be updated periodically, can be found at https://github.com/tata1661/FewShotPapers.git.
Authors’ addresses: Yaqing Wang, ywangcy@connect.ust.hk, Department of Computer Science and Engineering, Hong
Kong University of Science and Technology, Business Intelligence Lab and National Engineering Laboratory of Deep
Learning Technology and Application, Baidu Research; Quanming Yao, yaoquanming@4paradigm.com, 4Paradigm Inc.;
James T. Kwok, jamesk@cse.ust.hk, Department of Computer Science and Engineering, Hong Kong University of Science
and Technology; Lionel M. Ni, ni@ust.hk, Department of Computer Science and Engineering, Hong Kong University of
Science and Technology.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
© 2020 Association for Computing Machinery.
0360-0300/2020/3-ART1 $15.00
https://doi.org/10.1145/3386252
ACM Comput. Surv., Vol. 1, No. 1, Article 1. Publication date: March 2020.
arXiv:1904.05046v3 [cs.LG] 29 Mar 2020

1:2 Yaqing Wang, Quanming Yao, James T. Kwok, and Lionel M. Ni
computers may be explained by saying that these machines are intended to carry out any operations which could be done by a human computer”. In other words, the ultimate goal of machines is to be as intelligent as humans. In recent years, due to the emergence of powerful computing devices (e.g., GPU and distributed platforms), large data sets (e.g., ImageNet data with 1000 classes [30]), and advanced models and algorithms (e.g., convolutional neural networks (CNN) [73] and long short-term memory (LSTM) [58]), AI speeds up its pace to be like humans and defeats humans in many fields. To name a few, AlphaGo [120] defeats human champions in the ancient game of Go; and residual network (ResNet) [55] obtains better classification performance than humans on ImageNet. AI also supports the development of intelligent tools in many aspects of daily life, such as voice assistants, search engines, autonomous driving cars, and industrial robots.
Despite its prosperity, current AI techniques cannot rapidly generalize from a few examples. The aforementioned successful AI applications rely on learning from large-scale data. In contrast, humans are capable of learning new tasks rapidly by utilizing what they learned in the past. For example, a child who has learned how to add can rapidly transfer this knowledge to learn multiplication given a few examples (e.g., 2 × 3 = 2 + 2 + 2 and 1 × 3 = 1 + 1 + 1). Another example is that, given a few photos of a stranger, a child can easily identify the same person from a large number of photos.
Bridging this gap between AI and humans is an important direction. It can be tackled by machine learning, which is concerned with the question of how to construct computer programs that automatically improve with experience [92, 94]. In order to learn from a limited number of examples with supervised information, a new machine learning paradigm called Few-Shot Learning (FSL) [35, 36] is proposed. A typical example is character generation [76], in which computer programs are asked to parse and generate new handwritten characters given a few examples. To handle this task, one can decompose the characters into smaller parts transferable across characters, and then aggregate these smaller components into new characters. This is a way of learning like humans [77]. Naturally, FSL can also advance robotics [26], which develops machines that can replicate human actions. Examples include one-shot imitation [147], multi-armed bandits [33], visual navigation [37], and continuous control [156].
Another classic FSL scenario is where examples with supervised information are hard or impossible to acquire due to privacy, safety or ethical issues. A typical example is drug discovery, which tries to discover properties of new molecules so as to identify useful ones as new drugs [4]. Due to possible toxicity, low activity, and low solubility, new molecules do not have many real biological records on clinical candidates. Hence, it is important to learn effectively from a small number of samples. Similar examples where the target tasks do not have many examples include FSL translation [65] and cold-start item recommendation [137]. Through FSL, learning suitable models for these rare cases can become possible.
FSL can also help relieve the burden of collecting large-scale supervised data. For example, although ResNet [55] outperforms humans on ImageNet, each class needs to have sufficient labeled images, which can be laborious to collect. FSL can reduce the data gathering effort for data-intensive applications. Examples include image classification [138], image retrieval [130], object tracking [14], gesture recognition [102], image captioning, visual question answering [31], video event detection [151], language modeling [138], and neural architecture search [19].
Driven by the academic goal for AI to approach humans and the industrial demand for inexpensive learning, FSL has drawn much recent attention and is now a hot topic. Many related machine learning approaches have been proposed, such as meta-learning [37, 106, 114], embedding learning [14, 126, 138] and generative modeling [34, 35, 113]. However, currently, there is no work that provides an organized taxonomy to connect these FSL methods, explains why some methods work while others fail, or discusses the pros and cons of different approaches. Therefore, in this paper,

we conduct a survey on the FSL problem. In contrast, the survey in [118] only focuses on concept learning and experience learning for small samples.
Contributions of this survey can be summarized as follows:
• We give a formal definition of FSL, which naturally connects to the classic machine learning definition in [92, 94]. The definition is not only general enough to include existing FSL works, but also specific enough to clarify what the goal of FSL is and how we can solve it. This definition is helpful for setting future research targets in the FSL area.
• We list the relevant learning problems for FSL with concrete examples, clarifying their relatedness and differences with respect to FSL. These discussions can help better discriminate and position FSL among various learning problems.
• We point out that the core issue of the FSL supervised learning problem is the unreliable empirical risk minimizer, which is analyzed based on error decomposition [17] in machine learning. This provides insights to improve FSL methods in a more organized and systematic way.
• We perform an extensive literature review, and organize the works in a unified taxonomy from the perspectives of data, model and algorithm. We also present a summary of insights and a discussion on the pros and cons of each category. These can help establish a better understanding of FSL methods.
• We propose promising future directions for FSL in the aspects of problem setup, techniques, applications and theories. These insights are based on the weaknesses of the current development of FSL, with possible improvements to make in the future.
1.1 Organization of the Survey
The remainder of this survey is organized as follows. Section 2 provides an overview of FSL, including its formal definition, relevant learning problems, core issue, and a taxonomy of existing works in terms of data, model and algorithm. Section 3 is for methods that augment data to solve the FSL problem. Section 4 is for methods that reduce the size of the hypothesis space so as to make FSL feasible. Section 5 is for methods that alter the search strategy of the algorithm to deal with the FSL problem. In Section 6, we propose future directions for FSL in terms of problem setup, techniques, applications and theories. Finally, the survey closes with a conclusion in Section 7.
1.2 Notation and Terminology
Consider a learning task T. FSL deals with a data set D = {D_train, D_test} consisting of a training set D_train = {(x_i, y_i)}_{i=1}^I where I is small, and a testing set D_test = {x_test}. Let p(x, y) be the ground-truth joint probability distribution of input x and output y, and ĥ be the optimal hypothesis from x to y. FSL learns to discover ĥ by fitting D_train and testing on D_test. To approximate ĥ, the FSL model determines a hypothesis space H of hypotheses h(·; θ), where θ denotes all the parameters used by h. Here, a parametric h is used, as a nonparametric model often requires large data sets, and is thus not suitable for FSL. An FSL algorithm is an optimization strategy that searches H in order to find the θ that parameterizes the best h* ∈ H. The FSL performance is measured by a loss function ℓ(ŷ, y) defined over the prediction ŷ = h(x; θ) and the observed output y.
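As a concrete (and purely illustrative) instance of this notation, the sketch below fits a one-parameter hypothesis h(x; θ) = θx to a tiny training set D_train by gradient descent on the empirical squared loss; the function names are ours, not from the survey.

```python
def h(x, theta):
    """Parametric hypothesis h(.; theta): here a 1-D linear map, for illustration."""
    return theta * x

def empirical_risk(theta, d_train):
    """Mean of the loss l(y_hat, y) = (y_hat - y)^2 over the I training pairs."""
    return sum((h(x, theta) - y) ** 2 for x, y in d_train) / len(d_train)

def fit(d_train, lr=0.05, steps=200):
    """Search the (one-dimensional) hypothesis space for the theta
    minimizing the empirical risk, via plain gradient descent."""
    theta = 0.0
    for _ in range(steps):
        # gradient of the mean squared loss with respect to theta
        grad = sum(2 * (h(x, theta) - y) * x for x, y in d_train) / len(d_train)
        theta -= lr * grad
    return theta

# A "few-shot" training set with only I = 3 pairs, drawn from y = 2x.
D_train = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
theta = fit(D_train)
```

With so few pairs, the minimizer found here happens to recover the underlying function; the survey's point, developed in Section 2.3, is that in general such an empirical risk minimizer is unreliable.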
2 OVERVIEW
In this section, we first provide a formal definition of the FSL problem in Section 2.1 with concrete examples. To differentiate the FSL problem from relevant machine learning problems, we discuss their relatedness and differences in Section 2.2. In Section 2.3, we discuss the core issue that makes FSL difficult. Section 2.4 then presents a unified taxonomy according to how existing works handle the core issue.

2.1 Problem Definition
As FSL is a sub-area in machine learning, before giving the definition of FSL, let us recall how machine learning is defined in the literature.
Definition 2.1 (Machine Learning [92, 94]). A computer program is said to learn from experience E with respect to some classes of task T and performance measure P if its performance can improve with E on T measured by P.
For example, consider an image classification task (T); a machine learning program can improve its classification accuracy (P) through E obtained by training on a large number of labeled images (e.g., the ImageNet data set [73]). Another example is the recent computer program AlphaGo [120], which has defeated the human champion in playing the ancient game of Go (T). It improves its winning rate (P) against opponents by training on a database (E) of around 30 million recorded moves of human experts as well as playing against itself repeatedly. These are summarized in Table 1.
Table 1. Examples of machine learning problems based on Definition 2.1.

task T                        | experience E                                                                                  | performance P
image classification [73]     | large-scale labeled images for each class                                                     | classification accuracy
the ancient game of Go [120]  | a database containing around 30 million recorded moves of human experts and self-play records | winning rate
Typical machine learning applications, as in the examples mentioned above, require a lot of examples with supervised information. However, as mentioned in the introduction, this may be difficult or even not possible. FSL is a special case of machine learning, which targets at obtaining good learning performance given limited supervised information provided in the training set D_train, which consists of examples of inputs x_i's along with their corresponding outputs y_i's [15]. Formally, we define FSL in Definition 2.2.
Definition 2.2. Few-Shot Learning (FSL) is a type of machine learning problems (specified by E, T and P), where E contains only a limited number of examples with supervised information for the target T.
Existing FSL problems are mainly supervised learning problems. Concretely, few-shot classification learns classifiers given only a few labeled examples of each class. Example applications include image classification [138], sentiment classification from short text [157] and object recognition [35]. Formally, using notations from Section 1.2, few-shot classification learns a classifier h which predicts the label y_i for each input x_i. Usually, one considers N-way-K-shot classification [37, 138], in which D_train contains I = KN examples from N classes, each with K examples. Few-shot regression [37, 156] estimates a regression function h given only a few input-output example pairs sampled from that function, where the output y_i is the observed value of the dependent variable y, and x_i is the input which records the observed value of the independent variable x. Apart from few-shot supervised learning, another instantiation of FSL is few-shot reinforcement learning [3, 33], which targets at finding a policy given only a few trajectories consisting of state-action pairs.
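As a toy illustration of the N-way-K-shot setup just described, the sampler below builds a D_train with exactly I = KN labeled pairs, K per class. The function and variable names are our own sketch, not code from the survey.

```python
import random

def sample_episode(data_by_class, n_way, k_shot, seed=None):
    """Sample one N-way-K-shot task.

    data_by_class: dict mapping a class label to its list of examples.
    Returns D_train as a list of (example, label) pairs of length N*K.
    """
    rng = random.Random(seed)
    classes = rng.sample(sorted(data_by_class), n_way)   # pick N classes
    d_train = []
    for c in classes:
        for x in rng.sample(data_by_class[c], k_shot):   # K examples per class
            d_train.append((x, c))
    return d_train

# Toy pool: 4 classes ("A".."D") with 5 examples each; draw a 3-way-2-shot task.
pool = {c: [f"{c}_{i}" for i in range(5)] for c in "ABCD"}
episode = sample_episode(pool, n_way=3, k_shot=2, seed=0)
assert len(episode) == 3 * 2   # I = KN
```

In meta-learning approaches discussed later in the survey, many such episodes are sampled so that training mimics the few-shot condition at test time.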
We now show three typical scenarios of FSL (Table 2):
• Acting as a test bed for learning like humans. To move towards human intelligence, it is vital that computer programs can solve the FSL problem. A popular task (T) is to generate samples of a new character given only a few examples [76]. Inspired by how humans learn, the
computer programs learn with the E consisting of both the given examples with supervised information and pre-trained concepts such as parts and relations as prior knowledge. The generated characters are evaluated through the pass rate of a visual Turing test (P), which discriminates whether the images are generated by humans or machines. With this prior knowledge, computer programs can also learn to classify, parse and generate new handwritten characters with a few examples, like humans.
• Learning for rare cases. When obtaining sufficient examples with supervised information is hard or impossible, FSL can learn models for the rare cases. For example, consider a drug discovery task (T) which tries to predict whether a new molecule has toxic effects [4]. The percentage of molecules correctly assigned as toxic or non-toxic (P) improves with E obtained by both the new molecule's limited assay, and many similar molecules' assays as prior knowledge.
• Reducing data gathering effort and computational cost. FSL can help relieve the burden of collecting a large number of examples with supervised information. Consider the few-shot image classification task (T) [35]. The image classification accuracy (P) improves with the E obtained by a few labeled images for each class of the target T, and prior knowledge extracted from the other classes (such as raw images for co-training). Methods that succeed in this task usually have higher generality. Therefore, they can be easily applied to tasks with many samples.
Table 2. Three FSL examples based on Definition 2.2.

task T                      | experience E: supervised information                 | experience E: prior knowledge                       | performance P
character generation [76]   | a few examples of the new character                  | pre-learned knowledge of parts and relations        | pass rate of visual Turing test
drug toxicity discovery [4] | new molecule's limited assay                         | similar molecules' assays                           | classification accuracy
image classification [70]   | a few labeled images for each class of the target T  | raw images of other classes, or pre-trained models  | classification accuracy
In comparison to Table 1, Table 2 has one extra column under “experience E” which is marked as “prior knowledge”. As E only contains a few examples with supervised information directly related to T, it is natural that common supervised learning approaches often fail on FSL problems. Therefore, FSL methods make the learning of target T feasible by combining the available supervised information in E with some prior knowledge, which is “any information the learner has about the unknown function before seeing the examples” [86]. One typical type of FSL methods is Bayesian learning [35, 76]. It combines the provided training set D_train with some prior probability distribution which is available before D_train is given [15].
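To make this combination concrete, here is a minimal sketch (ours, not from the survey) of a conjugate Beta-Bernoulli update: a prior distribution, fixed before D_train is seen, is merged with only three supervised observations to yield a posterior.

```python
def posterior(a_prior, b_prior, observations):
    """Conjugate Beta-Bernoulli update.

    Prior knowledge is Beta(a_prior, b_prior); each observation in D_train
    is a 0/1 outcome. Returns the posterior parameters (a, b).
    """
    successes = sum(observations)
    failures = len(observations) - successes
    return a_prior + successes, b_prior + failures

# Prior knowledge: Beta(2, 2), a weak belief that the rate is around 0.5.
# D_train: only three labeled outcomes.
a, b = posterior(2, 2, [1, 1, 0])
posterior_mean = a / (a + b)   # = 4/7, pulled between prior mean and data mean
```

The three examples alone would estimate a rate of 2/3; the prior regularizes this toward 1/2, which is exactly the role prior knowledge plays when supervised information is scarce.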
Remark 1. When there is only one example with supervised information in E, FSL is called one-shot learning [14, 35, 138]. When E does not contain any example with supervised information for the target T, FSL becomes a zero-shot learning (ZSL) problem [78]. As the target class does not contain examples with supervised information, ZSL requires E to contain information from other modalities (such as attributes, WordNet, and word embeddings used in rare object recognition tasks), so as to transfer some supervised information and make learning possible.
2.2 Relevant Learning Problems
In this section, we discuss some relevant machine learning problems. The relatedness and differences with respect to FSL are clarified.