HiFaceGAN: Face Renovation via Collaborative Suppression and Replenishment

Lingbo Yang∗, Shanshe Wang, Siwei Ma†, Wen Gao
Institute of Digital Media, Peking University, Beijing, China
{lingbo,sswang,swma,wgao}@pku.edu.cn

Chang Liu∗
University of Chinese Academy of Sciences, Beijing, China
liuchang615@mails.ucas.ac.cn

Pan Wang, Peiran Ren
DAMO Academy, Alibaba Group, Hangzhou, China
{dixian.wp,peiran.rpr}@alibaba-inc.com
ABSTRACT
Existing face restoration research typically relies on either an image degradation prior or explicit guidance labels for training, which often leads to limited generalization over real-world images with heterogeneous degradation and rich background contents. In this paper, we investigate a more challenging and practical "dual-blind" version of the problem, termed "Face Renovation" (FR), by lifting the requirements on both types of prior. Specifically, we formulate FR as a semantic-guided generation problem and tackle it with a collaborative suppression and replenishment (CSR) approach. This leads to HiFaceGAN, a multi-stage framework containing several nested CSR units that progressively replenish facial details based on the hierarchical semantic guidance extracted from the front-end content-adaptive suppression modules. Extensive experiments on both synthetic and real face images have verified the superior performance of our HiFaceGAN over a wide range of challenging restoration subtasks, demonstrating its versatility, robustness and generalization ability towards real-world face processing applications. Code is available at https://github.com/Lotayou/Face-Renovation.
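The nested suppression-and-replenishment pipeline summarized above can be sketched roughly as follows. This is a minimal illustrative mock-up, not the paper's actual implementation: the box-blur "suppression" and the residual-blending "replenishment" below are placeholder operations standing in for the learned content-adaptive modules, and all function names are our own.

```python
import numpy as np

def suppress(feat, k=3):
    """Stand-in for a content-adaptive suppression module: a box blur
    that attenuates high-frequency artifacts while keeping coarse semantics."""
    pad = k // 2
    padded = np.pad(feat, pad, mode="edge")
    out = np.zeros_like(feat)
    h, w = feat.shape
    for i in range(h):
        for j in range(w):
            out[i, j] = padded[i:i + k, j:j + k].mean()
    return out

def replenish(base, guidance, weight=0.5):
    """Stand-in for a replenishment unit: blend detail back into the
    coarse result under guidance from a shallower suppression level."""
    return base + weight * (guidance - base)

def csr_pipeline_sketch(img, num_stages=3):
    """Multi-stage CSR idea: a front-end stack of suppression steps yields
    a hierarchy of guidance maps; a back-end stack progressively
    replenishes detail from the deepest level back toward the shallowest."""
    guidance = []
    feat = img
    for _ in range(num_stages):          # front-end: nested suppression
        feat = suppress(feat)
        guidance.append(feat)
    out = guidance[-1]
    for g in reversed(guidance[:-1]):    # back-end: progressive replenishment
        out = replenish(out, g)
    return out

degraded = np.random.rand(8, 8)
renovated = csr_pipeline_sketch(degraded)
assert renovated.shape == degraded.shape
```

The point of the sketch is the data flow: coarse semantics are distilled first, then details are reintroduced level by level, which is the collaborative suppression/replenishment structure the abstract describes.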
CCS CONCEPTS
• Computing methodologies → Computer vision.
KEYWORDS
Face Renovation, image synthesis, collaborative learning
ACM Reference Format:
Lingbo Yang, Shanshe Wang, Siwei Ma, Wen Gao, Chang Liu, Pan Wang, and Peiran Ren. 2020. HiFaceGAN: Face Renovation via Collaborative Suppression and Replenishment. In 28th ACM International Conference on Multimedia (MM '20), October 12–16, 2020, Seattle, WA, USA. ACM, New York, NY, USA, 10 pages. https://doi.org/10.1145/3394171.3413965
∗ Equal contribution; authors are ordered horizontally.
† Corresponding author.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
MM '20, October 12–16, 2020, Seattle, WA, USA.
© 2020 Association for Computing Machinery.
ACM ISBN 978-1-4503-7988-5/20/10. . . $15.00
https://doi.org/10.1145/3394171.3413965
1 INTRODUCTION
Face photographs record long-lasting precious memories of individuals and historical moments of human civilization. Yet the limited conditions in the acquisition, storage, and transmission of images inevitably introduce complex, heterogeneous degradations in real-world scenarios, including discrete sampling, additive noise, lossy compression, and beyond. With great application and research value, face restoration has drawn wide attention from both industry and academia, with a plethora of works [41][48][37] devoted to addressing specific types of image degradation. Yet it remains a challenge in more generalized, unconstrained application scenarios, where few works can report satisfactory restoration results.
For face restoration, most existing methods typically work in a "non-blind" fashion with specific degradation of prescribed type and intensity, leading to a variety of sub-tasks including super-resolution [58][8][34][55], hallucination [47][29], denoising [1][60], deblurring [48][25][26] and compression artifact removal [37][7][39]. However, task-specific methods typically exhibit poor generalization over real-world images with complex and heterogeneous degradations. A case in point, shown in Fig. 1, is a historic group photograph taken at the Solvay Conference, 1927: the super-resolution methods ESRGAN [55] and Super-FAN [4] tend to introduce additional artifacts, while three other task-specific restoration methods barely make any difference in suppressing degradation artifacts or replenishing fine details of hair textures, wrinkles, etc., revealing the impracticality of task-specific restoration methods for such images.
When it comes to blind image restoration [43], researchers aim to recover high-quality images from degraded observations in a "single-blind" manner, without a priori knowledge about the type and intensity of the degradation. It is often challenging to reconstruct image contents from artifacts without a degradation prior, necessitating additional guidance information such as categorical [2] or structural priors [5] to facilitate the replenishment of faithful and photo-realistic details. For blind face restoration [35][6], facial landmarks [4], parsing maps [53], and component heatmaps [59] are typically utilized as external guidance labels. In particular, Li et al. explored the guided face restoration problem [31][30], where an additional high-quality face is utilized to promote fine-grained detail replenishment. However, this often leads to limited feasibility for restoring photographs without ground-truth annotations. Furthermore, for real-world images with complex backgrounds, introducing unnecessary guidance could lead to inconsistency between the quality of renovated faces and unattended background contents.