
EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks
Mingxing Tan 1    Quoc V. Le 1
Abstract
Convolutional Neural Networks (ConvNets) are commonly developed at a fixed resource budget, and then scaled up for better accuracy if more resources are available. In this paper, we systematically study model scaling and identify that carefully balancing network depth, width, and resolution can lead to better performance. Based on this observation, we propose a new scaling method that uniformly scales all dimensions of depth/width/resolution using a simple yet highly effective compound coefficient. We demonstrate the effectiveness of this method on scaling up MobileNets and ResNet.

To go even further, we use neural architecture search to design a new baseline network and scale it up to obtain a family of models, called EfficientNets, which achieve much better accuracy and efficiency than previous ConvNets. In particular, our EfficientNet-B7 achieves state-of-the-art 84.4% top-1 / 97.1% top-5 accuracy on ImageNet, while being 8.4x smaller and 6.1x faster on inference than the best existing ConvNet. Our EfficientNets also transfer well and achieve state-of-the-art accuracy on CIFAR-100 (91.7%), Flowers (98.8%), and 3 other transfer learning datasets, with an order of magnitude fewer parameters. Source code is at https://github.com/tensorflow/tpu/tree/master/models/official/efficientnet.
1. Introduction
Scaling up ConvNets is widely used to achieve better accuracy. For example, ResNet (He et al., 2016) can be scaled up from ResNet-18 to ResNet-200 by using more layers.
Recently, GPipe (Huang et al., 2018) achieved 84.3% ImageNet top-1 accuracy by scaling up a baseline model four times larger.

1 Google Research, Brain Team, Mountain View, CA. Correspondence to: Mingxing Tan <tanmingxing@google.com>.

Proceedings of the 36th International Conference on Machine Learning, Long Beach, California, PMLR 97, 2019.

[Figure 1: scatter plot of ImageNet top-1 accuracy (%) against number of parameters (millions), comparing ResNet-34/50/152, DenseNet-201, Inception-v2, Inception-ResNet-v2, NASNet-A, ResNeXt-101, Xception, AmoebaNet-A/C, SENet, and EfficientNet-B0 through B7.]

Model                             Top-1 Acc.   #Params
ResNet-152 (He et al., 2016)      77.8%        60M
EfficientNet-B1                   79.2%        7.8M
ResNeXt-101 (Xie et al., 2017)    80.9%        84M
EfficientNet-B3                   81.7%        12M
SENet (Hu et al., 2018)           82.7%        146M
NASNet-A (Zoph et al., 2018)      82.7%        89M
EfficientNet-B4                   83.0%        19M
GPipe (Huang et al., 2018) †      84.3%        556M
EfficientNet-B7                   84.4%        66M
† Not plotted.

Figure 1. Model Size vs. ImageNet Accuracy. All numbers are for single-crop, single-model. Our EfficientNets significantly outperform other ConvNets. In particular, EfficientNet-B7 achieves new state-of-the-art 84.4% top-1 accuracy while being 8.4x smaller and 6.1x faster than GPipe. EfficientNet-B1 is 7.6x smaller and 5.7x faster than ResNet-152. Details are in Tables 2 and 4.

However, the process of scaling up ConvNets
has never been well understood and there are currently many ways to do it. The most common way is to scale up ConvNets by their depth (He et al., 2016) or width (Zagoruyko & Komodakis, 2016). Another less common, but increasingly popular, method is to scale up models by image resolution (Huang et al., 2018). In previous work, it is common to scale only one of the three dimensions: depth, width, and image size. Though it is possible to scale two or three dimensions arbitrarily, arbitrary scaling requires tedious manual tuning and still often yields sub-optimal accuracy and efficiency.
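To make single-dimension scaling concrete, here is a minimal sketch of width scaling in the MobileNet style, where each layer's channel count is multiplied by a width coefficient and rounded to a hardware-friendly multiple. The function name and the round-to-8 rule are illustrative choices, not a prescription from this paper:

```python
def scale_width(channels, multiplier, divisor=8):
    """Scale a layer's channel count by a width multiplier,
    rounding to the nearest multiple of `divisor` (illustrative
    MobileNet-style rounding, not this paper's exact rule)."""
    c = max(divisor, int(channels * multiplier + divisor / 2) // divisor * divisor)
    # Guard against rounding down by more than 10%.
    if c < 0.9 * channels * multiplier:
        c += divisor
    return c
```

For example, widening a 32-channel layer by 1.5x yields 48 channels, while shrinking a 16-channel layer by 0.5x yields 8; applying such a multiplier to every stage scales only the width dimension, leaving depth and resolution untouched.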
In this paper, we want to study and rethink the process of scaling up ConvNets. In particular, we investigate the central question: is there a principled method to scale up ConvNets that can achieve better accuracy and efficiency? Our empirical study shows that it is critical to balance all dimensions of network width/depth/resolution, and surprisingly such balance can be achieved by simply scaling each of them with a constant ratio. Based on this observation, we propose a simple yet effective compound scaling method.
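As a sketch of the compound idea: a single coefficient phi scales depth, width, and resolution together by fixed per-dimension ratios. The ratios used below (alpha=1.2, beta=1.1, gamma=1.15) are the ones later reported for EfficientNet; the helper itself is illustrative:

```python
def compound_scale(phi, alpha=1.2, beta=1.1, gamma=1.15):
    """Return (depth, width, resolution) multipliers for compound
    coefficient phi. The base ratios satisfy alpha * beta**2 * gamma**2
    ~= 2, so total FLOPs grow roughly as 2**phi."""
    return alpha ** phi, beta ** phi, gamma ** phi

# Example: phi = 2 deepens the network by 1.2**2 = 1.44x, widens it by
# 1.1**2 = 1.21x, and raises input resolution by about 1.32x.
d, w, r = compound_scale(2)
```

The key contrast with single-dimension scaling is that all three multipliers are tied to one knob, so growing the budget never over-invests in depth alone or width alone.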
Unlike conventional practice that arbitrarily scales these factors, our method uniformly scales network width, depth,