MobileStyleGAN: A Lightweight Convolutional Neural Network for
High-Fidelity Image Synthesis
Sergei Belousov
sergei.o.belousov@gmail.com
Abstract
In recent years, the use of Generative Adversarial Networks (GANs) has become very popular in generative image modeling. While style-based GAN architectures yield state-of-the-art results in high-fidelity image synthesis, they are computationally very complex. In our work, we focus on the performance optimization of style-based generative models. We analyze the most computationally expensive parts of StyleGAN2 and propose changes to the generator network that make it possible to deploy style-based generative networks on edge devices. We introduce the MobileStyleGAN architecture, which has 3.5x fewer parameters and is 9.5x less computationally complex than StyleGAN2, while providing comparable quality.
1. Introduction
In recent years, high-fidelity image synthesis has significantly improved through the use of Generative Adversarial Networks (GANs) [9]. Whereas early work such as DCGAN [27] could generate images with resolutions of up to 64x64 pixels, modern networks such as BigGAN [3] and StyleGAN [20, 21, 19] allow the generation of photorealistic images with up to 512x512 and even 1024x1024 pixels. Although the quality of generative models has significantly improved, image generation still requires substantial computational resources. This high computational complexity makes it difficult to deploy state-of-the-art generative models to edge devices.
For example, the StyleGAN2 [21] network can synthesize realistic face images 1024x1024 pixels in size with FID=2.84 on the FFHQ dataset. However, it contains 28.27M parameters and has a computational complexity of 143.15 GMAC.
We propose MobileStyleGAN, a new lightweight architecture for high-resolution, high-quality image generation. Taking the original StyleGAN2 architecture as a baseline, we revisit the computationally expensive parts of the network to create our own lightweight model that provides comparable quality (Figure 1). The whole network contains 8.01M parameters, has a computational complexity of 15.09 GMAC, and achieves FID=12.38 on the FFHQ dataset.
Our main contributions are:
• We introduce an end-to-end wavelet-based convolu-
tional neural network for high-fidelity image synthesis.
• We introduce Depthwise Separable Modulated Convo-
lution as a lightweight version of Modulated Convolu-
tion to decrease computational complexity.
• We introduce a revisited version of the demodulation mechanism that is compatible with graph optimizations such as operation fusion.
• We propose a pipeline based on knowledge distillation
to train our network.
2. Related Work
2.1. StyleGAN
StyleGAN [20] is a modern generative model for high-
resolution image generation. The key aspects of the Style-
GAN network are:
• It uses progressive growing to increase the resolution
gradually.
• It generates images from a fixed (learned constant) input tensor, rather than from stochastically sampled latent variables as in conventional GANs.
• The stochastically sampled latent variables are nonlinearly transformed by an 8-layer mapping network and then used as style vectors through AdaIN [16] at each resolution (a minimal sketch of this style modulation is given after this list).
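To make the style-injection step concrete, the following is a minimal PyTorch-style sketch of AdaIN-based modulation as used in the original StyleGAN. The class name AdaIN and the arguments channels and style_dim are illustrative assumptions, not identifiers from any official implementation.

```python
import torch
import torch.nn as nn

class AdaIN(nn.Module):
    """Adaptive instance normalization: normalize content features per channel,
    then rescale/shift them with a style-dependent affine transform."""
    def __init__(self, channels, style_dim):
        super().__init__()
        self.norm = nn.InstanceNorm2d(channels, affine=False)
        # Affine projection from the style vector w to per-channel scale and bias.
        self.affine = nn.Linear(style_dim, channels * 2)
        # Initialize scales to 1 so the layer starts as plain instance normalization.
        self.affine.bias.data[:channels] = 1.0

    def forward(self, x, w):
        # x: (B, C, H, W) feature map, w: (B, style_dim) style vector.
        scale, bias = self.affine(w).chunk(2, dim=1)
        x = self.norm(x)
        return scale[:, :, None, None] * x + bias[:, :, None, None]
```

In StyleGAN, one such modulation is applied per convolution, so the style vector controls the statistics of the feature maps at every resolution.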
StyleGAN2 [21] improves upon StyleGAN by:
• Eliminating droplet artifacts by normalizing with estimated statistics (weight demodulation) instead of normalizing with the actual feature statistics as AdaIN does (see the sketch after this list).
• Reducing the stagnation of details such as eyes and teeth (their tendency to stay fixed in place as other features move) by using a hierarchical generator with skip connections instead of progressive growing.
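The weight demodulation mentioned above can be sketched as follows. This is a minimal PyTorch-style illustration of a StyleGAN2-like modulated convolution with demodulation, not the revisited variant proposed in this work (described later); the class name ModulatedConv2d and its arguments are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ModulatedConv2d(nn.Module):
    """Modulated convolution with weight demodulation (StyleGAN2-style).
    The style scales the input channels of the weight; demodulation then rescales
    each output filter to unit expected norm, replacing explicit feature-map
    normalization such as AdaIN."""
    def __init__(self, in_ch, out_ch, kernel_size, style_dim, eps=1e-8):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_ch, in_ch, kernel_size, kernel_size))
        self.affine = nn.Linear(style_dim, in_ch)  # style -> per-input-channel scale
        self.eps = eps
        self.padding = kernel_size // 2

    def forward(self, x, w):
        b, in_ch, h, width = x.shape
        s = self.affine(w).view(b, 1, in_ch, 1, 1)            # modulation scales s_i
        weight = self.weight[None] * s                         # w'_ijk = s_i * w_ijk
        demod = torch.rsqrt(weight.pow(2).sum(dim=[2, 3, 4]) + self.eps)
        weight = weight * demod[:, :, None, None, None]        # demodulated w''_ijk
        # Grouped-convolution trick: fold the batch into the group dimension
        # so each sample is convolved with its own modulated weights.
        weight = weight.view(-1, in_ch, *self.weight.shape[2:])
        x = x.view(1, -1, h, width)
        out = F.conv2d(x, weight, padding=self.padding, groups=b)
        return out.view(b, -1, h, width)
```

Because the normalization is folded into the convolution weights rather than applied to the feature maps, it relies on estimated rather than actual statistics, which is what removes the droplet artifacts observed with AdaIN.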