Image2StyleGAN++: A Flexible Image Editing Framework

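"Image2StyleGAN++ is a flexible image editing framework with many applications. It extends Image2StyleGAN in three ways: noise optimization, a combination of the global W+ space with a local W* space, and multi-modal editing. Noise optimization restores high-frequency features in images and significantly improves the quality of reconstructed images. In addition, the global and local W* spaces let users control the editing result more precisely."

In this paper, the authors propose Image2StyleGAN++, an image editing tool designed to offer more flexibility and variety across a range of editing tasks. The framework improves on the earlier Image2StyleGAN model in three main ways.

First, it introduces noise optimization as a complement to the W+ latent space embedding. High-frequency features in an image carry fine detail such as texture and edges. Embedding into the W+ latent space alone may not fully recover these features, which lowers the quality of the reconstruction. Noise optimization recovers the lost high-frequency information and markedly improves reconstruction fidelity, for example raising PSNR (peak signal-to-noise ratio) from 20 dB to 45 dB.

Second, Image2StyleGAN++ extends the use of the global W+ space with a local W* space. In the StyleGAN family of models, the W+ space controls the overall style of the generated image, while the W* space targets more specific local properties. Combining the two lets users edit global and local features with fine control, for instance changing the shape, color, or texture of a specific region without affecting the rest of the image.

Third, the framework supports multi-modal editing: it can produce edits with different styles and content, which gives users a wider range of results to choose from. This is valuable for creative editing, artistic work, and the restoration of damaged images.

Figure 1 compares editing results. Panels (a) and (b) are the input images. Panel (c) is a "two-face" image produced by naively combining the left half of (a) with the right half of (b), which does not preserve the original detail and quality. Panel (d) is the "two-face" image generated with the Image2StyleGAN++ framework; it merges the features of both inputs while keeping high image quality.

Beyond its editing functions, Image2StyleGAN++ substantially improves the quality of edited images through noise optimization and the joint use of the W+ and W* spaces, and gives users more creative freedom.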

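To put the PSNR figures quoted above in perspective, the following is a minimal sketch of the standard definition, PSNR = 10 log10(MAX^2 / MSE). The function name `psnr`, the [0, 1] image range, and the constant-offset test images are assumptions made for illustration; they do not come from the paper.

```python
import numpy as np

def psnr(reference, reconstruction, max_value=1.0):
    """Peak signal-to-noise ratio in dB for images scaled to [0, max_value]."""
    mse = float(np.mean((reference - reconstruction) ** 2))
    if mse == 0.0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_value ** 2 / mse)

# Hypothetical test images: a constant offset gives a known MSE.
reference = np.zeros((8, 8))
coarse = reference + 0.1           # MSE = 1e-2
fine = reference + 10 ** -2.25     # MSE = 10 ** -4.5

psnr_coarse = psnr(reference, coarse)
psnr_fine = psnr(reference, fine)
```

Under these assumptions the two values come out at roughly 20 dB and 45 dB, so a 25 dB gain corresponds to the mean squared error shrinking by a factor of 10^2.5, about 316.
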
can be implemented in two ways: i) passing the input image through an encoder neural network (e.g. the Variational Auto-Encoder [21]); ii) optimizing a random initial latent code to match the input image [41, 7]. Of the two, the first approach dominated for a long time. Although it inherently struggles to generalize beyond the training dataset, it produces higher-quality results than naive latent code optimization methods [41, 7]. Recently, however, Abdal et al. [1] obtained excellent embedding results by optimizing the latent codes in an enhanced W+ latent space instead of the initial Z latent space. Their method suggests a new direction for various image editing applications and makes the second approach interesting again.
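The second approach, optimizing a latent code until the generator output matches the input image, can be illustrated with a toy sketch. Everything here is an assumption for illustration: the "generator" is a fixed random linear map rather than StyleGAN, the loss is a plain mean squared error, and the names `embed_by_optimization`, `A`, and `w_true` are hypothetical.

```python
import numpy as np

def embed_by_optimization(generator_matrix, target, steps=500, lr=0.1, seed=0):
    """Gradient descent on a latent code w so that generator_matrix @ w matches target."""
    rng = np.random.default_rng(seed)
    w = rng.normal(size=generator_matrix.shape[1])   # random initial latent code
    for _ in range(steps):
        residual = generator_matrix @ w - target
        # Gradient of the mean squared error with respect to w.
        grad = 2.0 * (generator_matrix.T @ residual) / target.size
        w -= lr * grad
    return w

rng = np.random.default_rng(1)
A = rng.normal(size=(64, 8))      # toy generator: 8-D latent code -> 64 "pixels"
w_true = rng.normal(size=8)
target = A @ w_true               # an "image" the toy generator can reproduce

w_hat = embed_by_optimization(A, target)
reconstruction_error = float(np.mean((A @ w_hat - target) ** 2))
```

With a real generator the loss is non-convex and the result depends on the latent space being optimized, which is the point made above about W+ versus Z.
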

Activation Tensor Manipulation. With fixed neural network weights, the expressive power of a generator can be fully utilized by manipulating its activation tensors. Based on this observation, Bau et al. [4] investigated what a GAN can and cannot generate by locating and manipulating relevant neurons in the activation tensors [4, 5]. Building on the understanding of how an object is "drawn" by the generator, they further designed a semantic image editing system that can add, remove or change the appearance of an object in an input image [3]. Concurrently, Frühstück et al. [11] investigated the potential of activation tensor manipulation in image blending. Observing that boundary artifacts can be eliminated by cropping and combining activation tensors at early layers of a generator, they proposed an algorithm to create large-scale texture maps of hundreds of megapixels by combining outputs of GANs trained at a lower resolution.
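Combining activation tensors can be sketched as a masked blend of two feature maps. This is only an illustration under stated assumptions: the tensors are small NumPy arrays rather than real generator activations, the blend is a simple per-pixel mix, and `blend_activations` is a hypothetical helper, not the method of any cited work.

```python
import numpy as np

def blend_activations(act_a, act_b, mask):
    """Mix two activation tensors of shape (C, H, W) using a spatial mask of shape (H, W).

    Where mask is 1 the result takes act_a, where it is 0 it takes act_b.
    """
    mask = mask[np.newaxis, :, :]          # broadcast the mask over channels
    return mask * act_a + (1.0 - mask) * act_b

# Hypothetical activations from two images at an early generator layer.
act_a = np.ones((4, 8, 8))
act_b = np.zeros((4, 8, 8))

mask = np.zeros((8, 8))
mask[:, :4] = 1.0                          # left half comes from act_a

blended = blend_activations(act_a, act_b, mask)
```

In an actual generator the blended tensor would be fed through the remaining layers, so the later convolutions get a chance to smooth the seam.
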

3. Overview


Our paper is structured as follows. First, we describe an extended version of the Image2StyleGAN [1] embedding algorithm (see Sec. 4). We propose two novel modifications: 1) to enable local edits, we integrate various spatial masks into the optimization framework. Spatial masks enable embeddings of incomplete images with missing values and embeddings of images with approximate color values such as user scribbles. In addition to spatial masks, we explore layer masks that restrict the embedding to a set of selected layers. The early layers of StyleGAN [19] encode content and the later layers control the style of the image. By restricting embeddings to a subset of layers we can better control which attributes of a given image are extracted.

2) to further improve the embedding quality, we optimize for an additional group of variables n that control additive noise maps. These noise maps encode high-frequency details and enable embedding with very high reconstruction quality.
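The two modifications can be sketched on toy data. All of this is an illustrative assumption rather than the paper's implementation: the generator is a random linear map, the loss is a plain squared error, the two optimization stages are simply run one after the other, the 18 x 512 code shape assumes the W+ layout of a 1024 x 1024 StyleGAN, and the helpers `masked_mse`, `apply_layer_mask`, and `fit_latent_then_noise` are hypothetical names.

```python
import numpy as np

def masked_mse(generated, target, spatial_mask):
    """Mean squared error over the pixels selected by a binary (H, W) mask."""
    m = spatial_mask[..., np.newaxis]                 # broadcast over color channels
    count = max(float(m.sum()) * generated.shape[-1], 1.0)
    return float((((generated - target) * m) ** 2).sum() / count)

def apply_layer_mask(w_base, w_embedded, layer_mask):
    """Use the embedded code for selected layers and the base code elsewhere."""
    return np.where(layer_mask[:, np.newaxis], w_embedded, w_base)

def fit_latent_then_noise(A, target, steps=300, lr_w=0.1):
    """Toy two-stage fit of target ~ A @ w + n: first the latent w, then the noise n."""
    w = np.zeros(A.shape[1])
    n = np.zeros_like(target)
    for _ in range(steps):                            # stage 1: update w, n fixed
        residual = A @ w + n - target
        w -= lr_w * 2.0 * (A.T @ residual) / target.size
    error_after_w = float(np.mean((A @ w + n - target) ** 2))
    for _ in range(steps):                            # stage 2: update n, w fixed
        residual = A @ w + n - target
        n -= 0.25 * 2.0 * residual                    # step on the summed squared error
    error_after_n = float(np.mean((A @ w + n - target) ** 2))
    return w, n, error_after_w, error_after_n

# Modification 1a: a spatial mask makes only the left half constrain the loss.
target_img = np.zeros((4, 4, 3))
generated_img = np.zeros((4, 4, 3))
generated_img[:, 2:, :] = 1.0                         # differs only in the right half
left_half = np.zeros((4, 4))
left_half[:, :2] = 1.0
loss_left = masked_mse(generated_img, target_img, left_half)
loss_full = masked_mse(generated_img, target_img, np.ones((4, 4)))

# Modification 1b: a layer mask keeps only the later (style) layers of an embedding.
w_base = np.zeros((18, 512))
w_embedded = np.ones((18, 512))
layer_mask = np.arange(18) >= 9
w_mixed = apply_layer_mask(w_base, w_embedded, layer_mask)

# Modification 2: additive noise absorbs detail the latent code cannot express.
rng = np.random.default_rng(0)
A = rng.normal(size=(64, 8))
target = A @ rng.normal(size=8) + 0.1 * rng.normal(size=64)
w, n, error_after_w, error_after_n = fit_latent_then_noise(A, target)
```

In this toy setting the masked loss ignores the right half entirely, and the residual left after fitting `w` is driven to essentially zero once `n` is optimized, which mirrors the role the noise maps play for high-frequency detail.
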

Second, we explore multiple operations to directly manipulate activation tensors (see Sec. 5). We mainly explore
