Learning Efficient Convolutional Networks through Network Slimming

Zhuang Liu¹*  Jianguo Li²  Zhiqiang Shen³  Gao Huang⁴  Shoumeng Yan²  Changshui Zhang¹
¹Tsinghua University  ²Intel Labs China  ³Fudan University  ⁴Cornell University
{liuzhuangthu, zhiqiangshen0214}@gmail.com, {jianguo.li, shoumeng.yan}@intel.com,
gh349@cornell.edu, zcs@mail.tsinghua.edu.cn
Abstract
The deployment of deep convolutional neural networks
(CNNs) in many real world applications is largely hindered
by their high computational cost. In this paper, we propose
a novel learning scheme for CNNs to simultaneously 1) re-
duce the model size; 2) decrease the run-time memory foot-
print; and 3) lower the number of computing operations,
without compromising accuracy. This is achieved by en-
forcing channel-level sparsity in the network in a simple but
effective way. Different from many existing approaches, the
proposed method directly applies to modern CNN architectures, introduces minimal overhead to the training process, and requires no special software/hardware accelerators for the resulting models. We call our approach network slimming: it takes wide and large networks as input models, automatically identifies insignificant channels during training, and prunes them afterwards, yielding thin and compact models with comparable accuracy. We empirically
demonstrate the effectiveness of our approach with several
state-of-the-art CNN models, including VGGNet, ResNet
and DenseNet, on various image classification datasets. For
VGGNet, a multi-pass version of network slimming gives a
20× reduction in model size and a 5× reduction in comput-
ing operations.
1. Introduction
In recent years, convolutional neural networks (CNNs)
have become the dominant approach for a variety of com-
puter vision tasks, e.g., image classification [22], object
detection [8], semantic segmentation [26]. Large-scale
datasets, high-end modern GPUs and new network architec-
tures allow the development of unprecedented large CNN
models. For instance, from AlexNet [22], VGGNet [31] and
GoogLeNet [34] to ResNets [14], the ImageNet Classifica-
tion Challenge winner models have evolved from 8 layers
to more than 100 layers.
∗This work was done when Zhuang Liu and Zhiqiang Shen were interns at Intel Labs China. Jianguo Li is the corresponding author.
However, larger CNNs, although with stronger representation power, are more resource-hungry. For instance, a 152-layer ResNet [14] has more than 60 million parameters and requires more than 20 giga floating-point operations (FLOPs) to perform inference on a single image of resolution 224×224. This is unlikely to be affordable on resource-constrained platforms such as mobile devices, wearables or Internet of Things (IoT) devices.
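To make such costs concrete, the arithmetic cost of a single convolutional layer can be estimated from its output resolution, channel counts, and kernel size. The sketch below is a back-of-the-envelope illustration, not part of the paper's method; the example layer shape (a 7×7, 64-filter, stride-2 stem convolution on a 224×224 RGB image) is a typical ResNet-style stem chosen here for illustration:

```python
def conv_macs(h_out, w_out, c_in, c_out, k):
    """Multiply-accumulate (MAC) count of one k x k convolution
    producing an h_out x w_out x c_out output from c_in input
    channels (bias terms omitted for simplicity)."""
    return h_out * w_out * c_in * c_out * k * k

# Illustrative example: a 7x7, 64-filter, stride-2 stem conv
# applied to a 224x224 RGB image produces a 112x112 output.
stem = conv_macs(112, 112, 3, 64, 7)
print(f"stem conv: {stem / 1e6:.1f}M MACs")  # roughly 118M MACs
```

Conventions differ between papers (one MAC is often counted as two FLOPs). Since the cost scales with the product of input and output channel counts, pruning channels, as network slimming does, reduces this product directly.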
The deployment of CNNs in real-world applications is mostly constrained by: 1) Model size: CNNs' strong representation power comes from their millions of trainable parameters. Those parameters, along with network structure
information, need to be stored on disk and loaded into mem-
ory during inference time. As an example, storing a typi-
cal CNN trained on ImageNet consumes more than 300MB
space, which is a big resource burden to embedded devices.
2) Run-time memory: During inference time, the interme-
diate activations/responses of CNNs could even take more
memory space than storing the model parameters, even with
batch size 1. This is not a problem for high-end GPUs, but
unaffordable for many applications with low computational
power. 3) Number of computing operations: The convolu-
tion operations are computationally intensive on high reso-
lution images. A large CNN may take several minutes to
process a single image on a mobile device, making it impractical to adopt in real applications.
Many works have been proposed to compress large
CNNs or directly learn more efficient CNN models for fast
inference. These include low-rank approximation [7], net-
work quantization [3, 12] and binarization [28, 6], weight
pruning [12], dynamic inference [16], etc. However, most
of these methods can only address one or two challenges
mentioned above. Moreover, some of the techniques require
specially designed software/hardware accelerators for exe-
cution speedup [28, 6, 12].
Another direction to reduce the resource consumption of large CNNs is to sparsify the network. Sparsity can be imposed at different levels of structure [2, 37, 35, 29, 25], which yields considerable model-size compression and inference speedup. However, these approaches generally re-
arXiv:1708.06519v1 [cs.CV] 22 Aug 2017