Large-Scale Few-Shot Learning: Knowledge Transfer with Class Hierarchy
Aoxue Li¹  Tiange Luo¹  Zhiwu Lu²∗  Tao Xiang³  Liwei Wang¹
¹Peking University  ²Renmin University of China  ³Queen Mary University of London
zhiwu.lu@gmail.com  t.xiang@qmul.ac.uk
Abstract
Recently, large-scale few-shot learning (FSL) has become
topical. It has been discovered that, for a large-scale FSL problem
with 1,000 classes in the source domain, a strong baseline
emerges: simply training a deep feature embedding
model on the aggregated source classes and performing
nearest neighbor (NN) search with the learned features
on the target classes. The state-of-the-art large-scale FSL
methods struggle to beat this baseline, indicating intrinsic
limitations on scalability. In this paper, we thus propose a
novel large-scale FSL model that exploits a class hierarchy
encoding the semantic relationships between the source and
target classes. Specifically, a deep feature embedding model
is learned to predict class labels for each training sample
at different layers of the hierarchy. Since the target classes
share some of the labels at the top layers of the hierarchy,
more transferable features are obtained even with only the
source class samples for model training. Extensive experiments
show that the proposed model significantly outperforms
not only the NN baseline but also the state-of-the-art
alternatives. Further, we show that the proposed model can
be easily extended to the large-scale zero-shot learning (ZSL)
problem and also achieves state-of-the-art results.
1. Introduction
In the past five years, object recognition research
has focused on large-scale recognition problems such as the
ImageNet ILSVRC challenges [34]. Deep neural network
(DNN) based models [37, 42, 12, 41] have achieved super-
human performance on the ILSVRC 1K recognition task.
However, most existing object recognition models, particularly
the DNN based ones, require hundreds of image
samples to be collected for each object class; many
object classes are rare, and it is very hard to collect sufficient
training samples for them, even with social media. Therefore, it is
highly desirable to develop object recognition models that
require only a few training samples/shots per object class.
∗Corresponding author.
[Figure 1: plot of top-5 accuracy (%) vs. K-shot (K = 1–5) for NN, PPA, LSD, and SGM.]
Figure 1. Comparative results for large-scale FSL on the ImNet
dataset [17]. The top-5 accuracy over target class samples is used
as the evaluation metric. Notations: NN – nearest neighbor (NN)
search performed in a learned feature space using K samples per
target class as the references; SGM – FSL with the squared gradient
magnitude (SGM) loss [11]; PPA – parameter prediction from
activations (PPA) [31]; LSD – large-scale diffusion (LSD) [3].
To overcome this challenge, meta-learning based few-shot
learning (FSL) [4, 19, 35, 10, 31, 30, 44, 5, 40] has
become a hot topic. FSL is inspired by the fact that humans
can recognize target visual objects almost effortlessly
from only a few samples, thanks to the ability to learn to learn
and to transfer knowledge. Similarly, in the FSL problem, we
are provided with a set of source classes and a set of target
classes under the setting that: (1) the target classes have no
overlap with the source classes in the label space; (2) each
source class has sufficient labelled samples, whereas each
target class has only a few labelled samples. FSL thus aims
to transfer knowledge from the source to the target classes.
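To make the NN baseline from Figure 1 concrete, the following is a minimal sketch (not the paper's code): given features already extracted by an embedding model trained on the source classes, each query image is assigned to the target class whose K-shot support features are nearest under cosine similarity. All function and variable names here are illustrative.

```python
import numpy as np

def nn_baseline_predict(support_feats, support_labels, query_feats):
    """NN baseline for FSL: classify each query feature by cosine
    similarity to the per-class mean of the K support features per
    target class. Names are illustrative, not from the paper."""
    # L2-normalize so that dot products equal cosine similarities.
    def l2n(x):
        return x / np.linalg.norm(x, axis=1, keepdims=True)

    classes = np.unique(support_labels)
    # One prototype per class: mean of its K support features.
    protos = np.stack([support_feats[support_labels == c].mean(axis=0)
                       for c in classes])
    sims = l2n(query_feats) @ l2n(protos).T   # (num_query, num_class)
    return classes[sims.argmax(axis=1)]

# Toy usage: 2 target classes, K=2 shots, 3-d features.
support = np.array([[1.0, 0.0, 0.0], [0.9, 0.1, 0.0],
                    [0.0, 1.0, 0.0], [0.1, 0.9, 0.0]])
labels = np.array([0, 0, 1, 1])
queries = np.array([[0.95, 0.05, 0.0], [0.05, 0.95, 0.0]])
print(nn_baseline_predict(support, labels, queries))  # -> [0 1]
```

Note that the only learning happens when training the feature embedding on the source classes; the target classes are handled purely by this parameter-free search, which is what makes the baseline so hard to beat at scale.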
The focus of this work is on the large-scale FSL setting,
where a large number of source classes is provided. This
is very different from the most widely used meta-learning
evaluation benchmarks such as miniImageNet [45], which
contains 64 source classes with 600 samples per class.
Yet it is more realistic: after all, there are thousands of classes
in ImageNet that we can use, so why not include more
source classes when it comes to FSL? It is noted that a