can be implemented in two ways: i) passing the input image
through an encoder neural network (e.g., the Variational
Auto-Encoder [21]); ii) optimizing a random initial latent code
to match the input image [41, 7]. Of the two, the first
approach dominated for a long time. Although it has inherent
difficulty generalizing beyond the training dataset, it
produces higher-quality results than naive latent code
optimization methods [41, 7]. Recently, however, Abdal et al.
[1] obtained excellent embedding results by optimizing the
latent codes in an enhanced W+ latent space instead of the
initial Z latent space. Their method suggests a new direction
for various image editing applications and makes the second
approach interesting again.
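The second approach, latent code optimization, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the "generator" is a toy linear map rather than a GAN, the loss is a plain squared error, and the names `generate` and `embed_by_optimization` are hypothetical. A real generator is a deep nonlinear network, so the gradient would come from automatic differentiation rather than the closed form used below.

```python
import random


def generate(weights, latent):
    """Toy linear 'generator': maps a latent code to a flat list of pixels."""
    return [sum(a * z for a, z in zip(row, latent)) for row in weights]


def embed_by_optimization(weights, target, steps=500, lr=0.1, seed=0):
    """Gradient descent on 0.5 * ||G(w) - x||^2 from a random initial latent code."""
    rng = random.Random(seed)
    latent = [rng.gauss(0.0, 1.0) for _ in weights[0]]
    for _ in range(steps):
        residual = [g - t for g, t in zip(generate(weights, latent), target)]
        # For a linear generator the gradient w.r.t. the latent code is A^T r.
        grad = [
            sum(weights[i][j] * residual[i] for i in range(len(weights)))
            for j in range(len(latent))
        ]
        latent = [z - lr * g for z, g in zip(latent, grad)]
    return latent


# Hypothetical 3-pixel "image" produced by a known latent code.
WEIGHTS = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
TARGET = generate(WEIGHTS, [0.5, -1.0])
recovered = embed_by_optimization(WEIGHTS, TARGET)
```

In this toy setting the optimization recovers the latent code that produced the target. With a real generator the problem is non-convex, and the result depends on the choice of latent space, which is the point made above about W+ versus Z.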
Activation Tensor Manipulation. With fixed neural network
weights, the expressive power of a generator can be fully
utilized by manipulating its activation tensors. Based on this
observation, Bau et al. [4] investigated what a GAN can and
cannot generate by locating and manipulating relevant neurons
in the activation tensors [4, 5]. Building on the
understanding of how an object is "drawn" by the generator,
they further designed a semantic image editing system that
can add, remove, or change the appearance of an object in an
input image [3]. Concurrently, Frühstück et al. [11]
investigated the potential of activation tensor manipulation in
image blending. Observing that boundary artifacts can be
eliminated by cropping and combining activation tensors at
early layers of a generator, they proposed an algorithm to
create large-scale texture maps of hundreds of megapixels by
combining outputs of GANs trained at a lower resolution.
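The crop-and-combine idea can be sketched on a toy example. This is not the algorithm of [11]: the single-channel activation maps, the function names, and the 3-tap horizontal average standing in for the later generator layers are all assumptions chosen to make the effect visible. The sketch compares combining two activation maps before the later layers run with combining the two finished outputs.

```python
def crop_and_combine(act_left, act_right, split_col):
    """Take columns [0, split_col) from act_left and the remaining ones from act_right."""
    if len(act_left) != len(act_right):
        raise ValueError("activation maps must have the same number of rows")
    return [
        row_l[:split_col] + row_r[split_col:]
        for row_l, row_r in zip(act_left, act_right)
    ]


def later_layers(act):
    """Stand-in for the remaining generator layers: 3-tap horizontal average, edges clamped."""
    out = []
    for row in act:
        n = len(row)
        out.append(
            [(row[max(c - 1, 0)] + row[c] + row[min(c + 1, n - 1)]) / 3 for c in range(n)]
        )
    return out


def max_jump(row):
    """Largest difference between horizontally adjacent values (a crude seam measure)."""
    return max(abs(b - a) for a, b in zip(row, row[1:]))


ACT_A = [[0.0, 0.0, 0.0, 0.0]]
ACT_B = [[3.0, 3.0, 3.0, 3.0]]

# Combine early, then let the later layers process the joined activation.
blended_early = later_layers(crop_and_combine(ACT_A, ACT_B, 2))
# Run the later layers on each map separately, then combine the outputs.
blended_late = crop_and_combine(later_layers(ACT_A), later_layers(ACT_B), 2)
```

In this toy case the early combination yields a gradual transition across the seam, while combining the finished outputs leaves a hard edge. This mirrors, in a very simplified way, the observation that combining at early layers lets subsequent layers resolve the boundary.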
3. Overview
Our paper is structured as follows. First, we describe an
extended version of the Image2StyleGAN [1] embedding
algorithm (see Sec. 4). We propose two novel modifications:
1) to enable local edits, we integrate various spatial masks
into the optimization framework. Spatial masks enable
embeddings of incomplete images with missing values and
embeddings of images with approximate color values, such as
user scribbles. In addition to spatial masks, we explore layer
masks that restrict the embedding to a set of selected layers.
The early layers of StyleGAN [19] encode content and the
later layers control the style of the image. By restricting
embeddings to a subset of layers, we can better control which
attributes of a given image are extracted.
2) to further improve the embedding quality, we optimize
for an additional group of variables n that control additive
noise maps. These noise maps encode high-frequency details
and enable embedding with very high reconstruction quality.
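The two modifications can be illustrated together in a minimal sketch. It assumes a toy linear generator with one additive noise value per pixel, a binary spatial mask, and plain gradient descent on both variable groups at once. None of this is the paper's actual objective or optimization schedule, and `generate`, `masked_loss`, and `embed_masked` are hypothetical names. The mask removes a "missing" pixel from the loss, and the noise variables absorb detail that the latent code alone cannot reproduce.

```python
def generate(weights, latent, noise):
    """Toy generator: linear map of the latent code plus additive per-pixel noise."""
    return [sum(a * z for a, z in zip(row, latent)) + n for row, n in zip(weights, noise)]


def masked_loss(image, target, mask):
    """Squared error counted only where mask == 1 (masked-out pixels are ignored)."""
    return sum(m * (g - t) ** 2 for g, t, m in zip(image, target, mask))


def embed_masked(weights, target, mask, steps=300, lr=0.1, optimize_noise=False):
    """Gradient descent on half the masked squared error; optionally also updates the noise."""
    latent = [0.0] * len(weights[0])
    noise = [0.0] * len(weights)
    for _ in range(steps):
        image = generate(weights, latent, noise)
        residual = [m * (g - t) for g, t, m in zip(image, target, mask)]
        grad_latent = [
            sum(weights[i][j] * residual[i] for i in range(len(weights)))
            for j in range(len(latent))
        ]
        latent = [z - lr * g for z, g in zip(latent, grad_latent)]
        if optimize_noise:
            # The gradient w.r.t. each noise value is that pixel's masked residual.
            noise = [n - lr * r for n, r in zip(noise, residual)]
    return latent, noise


WEIGHTS = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0]]
TARGET = [1.0, 2.0, 3.5, 99.0]  # hypothetical image; the last pixel is "missing"
MASK = [1.0, 1.0, 1.0, 0.0]     # exclude the missing pixel from the loss

latent_only, zero_noise = embed_masked(WEIGHTS, TARGET, MASK)
latent_joint, noise = embed_masked(WEIGHTS, TARGET, MASK, optimize_noise=True)
loss_latent_only = masked_loss(generate(WEIGHTS, latent_only, zero_noise), TARGET, MASK)
loss_joint = masked_loss(generate(WEIGHTS, latent_joint, noise), TARGET, MASK)
```

In this toy setup the latent-only embedding leaves a residual error inside the mask, because the target is not exactly in the generator's range, while adding the noise variables drives the masked error to essentially zero. The noise value of the masked-out pixel never receives a gradient and stays at its initial value.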
Second, we explore multiple operations to directly
manipulate activation tensors (see Sec. 5). We mainly explore