EVERYONE IS A CARTOONIST: SELFIE CARTOONIZATION
WITH ATTENTIVE ADVERSARIAL NETWORKS
Xinyu Li, Wei Zhang, Tong Shen, Tao Mei
JD AI Research, Beijing, China
lixinyu6@jd.com, wzhang.cu@gmail.com, shentong7@jd.com, tmei@live.com
ABSTRACT
Selfies and cartoons are two popular artistic forms that are
widely present in our daily life. Despite the great progress
in image translation/stylization, few techniques focus specif-
ically on selfie cartoonization, since cartoon images usually
contain artistic abstraction (e.g., large smoothing areas) and
exaggeration (e.g., large/delicate eyebrows). In this paper,
we address this problem by proposing a selfie cartooniza-
tion Generative Adversarial Network (scGAN), which mainly
uses an attentive adversarial network (AAN) to emphasize
specific facial regions and ignore low-level details. More
specifically, we first design a cycle-like architecture to enable
training with unpaired data. Then we design three losses from
different aspects. A total variation loss is used to highlight im-
portant edges and contents in cartoon portraits. An attentive
cycle loss is added to lay more emphasis on delicate facial
areas such as eyes. In addition, a perceptual loss is included
to eliminate artifacts and improve robustness of our method.
Experimental results show that our method is capable of gen-
erating different cartoon styles and outperforms a number of
state-of-the-art methods.
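As a rough illustration of two of the loss terms summarized above, the sketch below gives a minimal NumPy version of an anisotropic total variation term and an attention-weighted L1 cycle-consistency term. This is not the authors' exact formulation: the attention mask over delicate facial regions and the weight `lam` are hypothetical placeholders.

```python
import numpy as np

def total_variation(img):
    # Anisotropic total variation: sum of absolute differences
    # between horizontally and vertically adjacent pixels.
    # Encourages large smooth areas while preserving strong edges.
    dh = np.abs(img[:, 1:] - img[:, :-1]).sum()
    dv = np.abs(img[1:, :] - img[:-1, :]).sum()
    return dh + dv

def attentive_cycle_loss(x, x_rec, mask, lam=2.0):
    # L1 cycle-consistency loss between an input x and its
    # reconstruction x_rec, up-weighted (by the hypothetical
    # factor lam) inside an attention mask in [0, 1] that marks
    # delicate regions such as the eyes.
    diff = np.abs(x - x_rec)
    weight = 1.0 + lam * mask
    return (weight * diff).mean()
```

A constant image has zero total variation, and reconstruction errors inside the masked region contribute more to the attentive cycle loss than errors elsewhere.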
Index Terms— Selfie cartoonization, generative adversar-
ial network, attention mechanism, image translation
1. INTRODUCTION
Selfie cartoonization as an artistic form is in great demand
in our daily life. Its most common use is as a profile picture in
social networks, where it catches the viewer's attention in a
humorous way while protecting individual privacy. Cartoon
portraits are also widely used in online role-playing games,
artistic poster designs, and so on. However, as shown in Fig. 1,
manually drawing a cartoon portrait is laborious and requires
substantial artistic skill, even with photo editing software.
Thus, making selfie cartoonization both efficient and high-quality
is an important problem.
Existing methods cover various painting styles for car-
toon portrait generation. Traditional image processing
methods based on sketch extraction [1], with little post-
processing on colors or shapes, have been widely applied in
Fig. 1. Cartoon portrait artworks in different styles drawn
by painters. Such a complex manual creative process usually
takes 2 to 3 days.
smartphone software. These approaches often require ded-
icated algorithms for specific styles, and the quality of their
synthetic results is far from satisfactory at a fine-grained
level. The recent emergence of deep convolutional neural
networks [2] has provided attractive solutions for domain
transfer. Neural Style Transfer (NST) [3] is one of them: it
transfers the artistic style of an image to a target image while
keeping the content of the target image. However, since NST
is designed for general cases, it lacks the ability to focus on
specific regions for tasks such as cartoonization.
There is another family of methods, based on Generative Ad-
versarial Networks (GAN) [4], that performs domain transfer
in an adversarial manner. Several image-to-image translation
methods (e.g., pix2pix [5], BicycleGAN [6]) have been pro-
posed to map images from one domain to another. However,
these methods require paired images, which are difficult to
obtain for many tasks. Thanks to a series of unsupervised
domain transfer frameworks (e.g., CycleGAN [7], UNIT [8]),
models can now be trained with unpaired data. Some existing
methods (e.g., CartoonGAN [9], DA-GAN [10]) perform
cartoonization based on unsupervised GAN frameworks, but
they usually fail to capture delicate facial parts or to generate
pleasing results.
Based on our observation and analysis, there are three
challenges in producing cartoon portraits of acceptable
quality. First, there are no public paired datasets of
human selfies and cartoon portraits. Second, we do not know
arXiv:1904.12615v1 [cs.CV] 20 Apr 2019