COMPACT CONVOLUTIONAL NEURAL NETWORK TRANSFER LEARNING FOR
SMALL-SCALE IMAGE CLASSIFICATION

Zengxi Li*, Yan Song*, Ian McLoughlin†, Lirong Dai*

* National Engineering Laboratory of Speech and Language Information Processing, USTC
† School of Computing, University of Kent
ABSTRACT
Transfer learning methods have demonstrated state-of-the-art performance on various small-scale image classification tasks. This is generally achieved by exploiting information from a convolutional neural network pre-trained on ImageNet (ImageNet CNN). However, the transferred CNN model generally has high computational complexity and storage requirements, which raises issues for real-world applications, especially on portable devices such as phones and tablets without high-performance GPUs. Several approximation methods have been proposed to reduce the complexity by reconstructing the linear or non-linear filters (responses) in the convolutional layers with a series of smaller ones.
In this paper, we present a compact CNN transfer learning method for small-scale image classification. Specifically, it can be decomposed into fine-tuning and joint learning stages. In the fine-tuning stage, a high-performance target CNN is trained by transferring information from the ImageNet CNN. In the joint learning stage, a compact target CNN is optimized on ground-truth labels, jointly with the predictions of the high-performance target CNN. Experimental results on CIFAR-10 and MIT Indoor Scene demonstrate the effectiveness and efficiency of the proposed method.
Index Terms— CNN, Transfer Learning, Image Classifi-
cation
1. INTRODUCTION
Recently, deep convolutional neural networks (CNNs) have achieved outstanding performance in large-scale visual recognition competitions [1]. Generally, a deep CNN structure can be decomposed into (1) convolutional layers, which perform non-linear feature extraction via convolution, rectified linear unit (ReLU), and max-pooling operations, and (2) fully connected layers, which map the extracted features into
posterior probabilities. It is known that the powerful modeling capability of a deep CNN comes mainly from its complex structure, with millions of parameters tuned on a large-scale labeled dataset such as ImageNet [2].

We acknowledge the support of the following organizations for research funding: National Natural Science Foundation of China (Grant Nos. 61273264 and 61172158), Science and Technology Department of Anhui Province (Grant No. 15CZZ02007), and Chinese Academy of Sciences (Grant No. XDB02070006).
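To make the CNN structure described above concrete, the following is a minimal PyTorch sketch of a network of this form, assuming 32x32 RGB inputs (as in CIFAR-10); the layer widths are illustrative assumptions, not the architecture used in this paper:

    import torch
    import torch.nn as nn

    class SimpleCNN(nn.Module):
        """Toy CNN: conv/ReLU/max-pool feature extractor + FC classifier."""
        def __init__(self, num_classes=10):
            super().__init__()
            # (1) Convolutional layers: non-linear feature extraction.
            self.features = nn.Sequential(
                nn.Conv2d(3, 32, kernel_size=3, padding=1),
                nn.ReLU(inplace=True),
                nn.MaxPool2d(2),                      # 32x32 -> 16x16
                nn.Conv2d(32, 64, kernel_size=3, padding=1),
                nn.ReLU(inplace=True),
                nn.MaxPool2d(2),                      # 16x16 -> 8x8
            )
            # (2) Fully connected layers: features -> class posteriors
            #     (softmax is applied inside the cross-entropy loss).
            self.classifier = nn.Sequential(
                nn.Linear(64 * 8 * 8, 256),
                nn.ReLU(inplace=True),
                nn.Linear(256, num_classes),
            )

        def forward(self, x):
            x = self.features(x)
            x = torch.flatten(x, 1)
            return self.classifier(x)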
However, for small-scale datasets, e.g. MIT Indoor Scene [3], such a complex CNN may be prone to over-fitting, leading to reduced performance. In such cases, several recent works indicate that it is preferable to transfer a previously well-trained CNN rather than to train a new CNN on the limited labeled data. For example, Razavian et al. conducted a series of experiments on various recognition tasks using CNN features as a generic image representation [4]. Chatfield et al. compared the results of using CNNs with various structures, e.g. CNN-F, CNN-M and CNN-S [5]. In [6], Girshick et al. showed that a CNN fine-tuning scheme can yield a significant performance boost. In [7], the transferability of features from different layers was comprehensively evaluated. The effectiveness of CNN fine-tuning schemes has thus been validated on similar tasks.
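As a concrete illustration of such a fine-tuning scheme, the sketch below adapts an ImageNet-pretrained model from torchvision to a target task; AlexNet and the hyper-parameters are illustrative stand-ins, not the exact model or settings used in this paper:

    import torch
    import torch.nn as nn
    from torchvision import models

    num_target_classes = 67  # e.g. MIT Indoor Scene has 67 categories

    # Transfer the internal layers from the pre-trained source CNN.
    net = models.alexnet(weights=models.AlexNet_Weights.IMAGENET1K_V1)

    # Replace the 1000-way ImageNet output layer with a new one
    # sized for the target task.
    net.classifier[6] = nn.Linear(net.classifier[6].in_features,
                                  num_target_classes)

    # Fine-tune all layers on the target data with a small learning
    # rate, so the transferred parameters are only gently adjusted.
    optimizer = torch.optim.SGD(net.parameters(), lr=1e-3, momentum=0.9)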
Despite the superior performance of transferred CNNs, their high computational complexity and storage requirements make them difficult to apply in real-world systems, especially on portable devices, such as mobile phones and tablets, that lack high-performance GPUs. It is therefore of practical importance to improve CNN efficiency without reducing performance. Several approximation methods have been developed to reconstruct linear filters or responses with a series of smaller ones [8, 9]. In [10], Zhang et al. proposed to minimize the reconstruction error of the non-linear responses, subject to a low-rank constraint. These methods mostly focus on the convolutional layers of CNNs.
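As a sketch of the linear filter-reconstruction idea behind these methods, the helper below replaces one convolutional layer with two smaller ones via a truncated SVD of its flattened weights; the rank r is a hypothetical accuracy/efficiency knob, and the cited methods use more elaborate (in [10], non-linear and data-driven) criteria:

    import torch
    import torch.nn as nn

    def low_rank_conv(conv: nn.Conv2d, r: int) -> nn.Sequential:
        """Approximate a KxK conv layer by r basis filters + a 1x1 conv."""
        C_out, C_in, K, _ = conv.weight.shape
        W = conv.weight.data.reshape(C_out, C_in * K * K)  # flatten filters
        U, S, Vh = torch.linalg.svd(W, full_matrices=False)
        # First factor: r "basis" filters of the original spatial size.
        first = nn.Conv2d(C_in, r, K, stride=conv.stride,
                          padding=conv.padding, bias=False)
        first.weight.data = Vh[:r].reshape(r, C_in, K, K)
        # Second factor: 1x1 conv recombining the r basis responses.
        second = nn.Conv2d(r, C_out, 1, bias=conv.bias is not None)
        second.weight.data = (U[:, :r] * S[:r]).reshape(C_out, r, 1, 1)
        if conv.bias is not None:
            second.bias.data = conv.bias.data.clone()
        return nn.Sequential(first, second)

The multiply count per output position drops roughly from C_out*C_in*K^2 to r*(C_in*K^2 + C_out), so a small rank r yields large savings.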
In this paper, we propose a compact transfer learning scheme for small-scale recognition tasks, as shown in Fig. 1. Given a CNN pre-trained on a source task (i.e. ImageNet), the transfer process is decomposed into fine-tuning and joint learning stages. In the fine-tuning stage, a high-performance CNN model for the target dataset, such as MIT Indoor Scene or CIFAR-10, is fine-tuned by transferring the parameters of internal layers from the pre-trained CNN. In the joint learning stage, a compact CNN model that satisfies the complexity and storage requirements is first designed, and then optimized with an objective function which exploits the information in the output probabilities of the high-performance CNN. This may enforce the compact