SinGAN: Learning a Generative Model from a Single Image

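SinGAN is a deep learning model designed to learn a generative model from a single natural image. Proposed by Tamar Rott Shaham et al., it captures the internal patch distribution of the image and generates high-quality, diverse samples that carry the same visual content as the original. SinGAN is trained adversarially at multiple scales: it is a pyramid of fully convolutional GANs, each responsible for learning the patch distribution at a different scale of the image, which allows new samples of arbitrary size to be generated.

SinGAN (Single Image GAN) removes the usual requirement that a generative adversarial network (GAN) be trained on a large dataset. Its core idea is to learn the internal distribution of one training image and then generate new samples that are consistent with that image. The paper "SinGAN: Learning a Generative Model from a Single Natural Image" describes the architecture and the training procedure in detail. Each level of the pyramid is a fully convolutional GAN that learns the textures and patterns of the image at one scale. Training and generation both proceed coarse-to-fine, from the coarsest scale to the finest, so the model captures features ranging from global layout to fine detail. An adversarial loss is applied at every scale, so that the patches of a generated image match the patch statistics of the training image at that scale. Noise is injected at every scale, so the samples keep the look of the original while exhibiting new object configurations and structures.

SinGAN can be used in a range of image manipulation tasks, for example super-resolution (turning a low-resolution image into a higher-resolution one), image editing, and paint-to-image, where a rough painting is turned into a realistic image.

SinGAN's main contribution is a way to learn and generate realistic images when training data is extremely limited. This is valuable where large datasets are hard to obtain, such as simulating rare events or recreating a specific environment.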

[Figure 4 diagram. Panel labels: Multi-scale Patch Generator; Multi-scale Patch Discriminator; Real / Fake; Effective Patch Size; Training Progression.]

Figure 4: SinGAN’s multi-scale pipeline. Our model consists of a pyramid of GANs, where both training and inference are done in a coarse-to-fine fashion. At each scale, $G_n$ learns to generate image samples in which all the overlapping patches cannot be distinguished from the patches in the down-sampled training image, $x_n$, by the discriminator $D_n$; the effective patch size decreases as we go up the pyramid (marked in yellow on the original image for illustration). The input to $G_n$ is a random noise image $z_n$, and the generated image from the previous scale $\tilde{x}_{n+1}$, upsampled to the current resolution (except for the coarsest level which is purely generative). The generation process at level $n$ involves all generators $\{G_N, \ldots, G_n\}$ and all noise maps $\{z_N, \ldots, z_n\}$ up to this level. See more details at Sec. 2.

…

Figure 5: Single scale generation. At each scale $n$, the image from the previous scale, $\tilde{x}_{n+1}$, is upsampled and added to the input noise map, $z_n$. The result is fed into 5 conv layers, whose output is a residual image that is added back to $(\tilde{x}_{n+1})\uparrow^r$. This is the output $\tilde{x}_n$ of $G_n$.
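The single-scale step described in the Figure 5 caption can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: `upsample_nearest`, `single_scale_step`, and `toy_conv_stack` are hypothetical names, nearest-neighbour resizing is an assumed stand-in for the upsampling operator, and the toy function merely takes the place of the five trained conv layers.

```python
import random
from typing import Callable, List

Image = List[List[float]]  # grayscale image stored as rows of pixel values


def upsample_nearest(image: Image, new_h: int, new_w: int) -> Image:
    """Nearest-neighbour resize (assumed stand-in for the upsampling operator)."""
    old_h, old_w = len(image), len(image[0])
    return [[image[i * old_h // new_h][j * old_w // new_w] for j in range(new_w)]
            for i in range(new_h)]


def add(a: Image, b: Image) -> Image:
    """Pixel-wise sum of two images of equal size."""
    return [[p + q for p, q in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def single_scale_step(prev: Image, noise: Image,
                      conv_stack: Callable[[Image], Image]) -> Image:
    """One generator step: upsample the previous output, add noise,
    run the conv stack, and add its residual output back to the upsampled image."""
    h, w = len(noise), len(noise[0])
    up = upsample_nearest(prev, h, w)
    residual = conv_stack(add(noise, up))
    return add(up, residual)


def toy_conv_stack(image: Image) -> Image:
    """Hypothetical placeholder for the trained conv layers: scales its input."""
    return [[0.1 * p for p in row] for row in image]


rng = random.Random(0)
prev = [[1.0, 2.0], [3.0, 4.0]]                                  # 2x2 coarser output
noise = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(4)]  # 4x4 noise map
out = single_scale_step(prev, noise, toy_conv_stack)
print(len(out), len(out[0]))  # 4 4
```

Because the conv stack only predicts a residual, a stack that outputs zeros would return the upsampled coarse image unchanged; each scale only has to add the detail missing from the coarser one.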

[…] consider a different source of training data: all the overlapping patches at multiple scales of a single natural image. We show that a powerful generative model can be learned from this data, and can be used in a number of image manipulation tasks.

2. Method

Our goal is to learn an unconditional generative model that captures the internal statistics of a single training image x. This task is conceptually similar to the conventional GAN setting, except that here the training samples are patches of a single image, rather than whole image samples from a database.
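To make "training samples are patches of a single image" concrete, the sketch below enumerates every overlapping patch of a toy image. It is only an illustration: `extract_patches` is a hypothetical helper, and a patch-based discriminator never materialises the patches like this but sees them implicitly through its receptive field.

```python
from typing import List

Image = List[List[float]]  # grayscale image stored as rows of pixel values


def extract_patches(image: Image, size: int) -> List[Image]:
    """Return every overlapping size x size patch, sliding one pixel at a time."""
    height, width = len(image), len(image[0])
    patches = []
    for top in range(height - size + 1):
        for left in range(width - size + 1):
            patches.append([row[left:left + size] for row in image[top:top + size]])
    return patches


# A 4x5 toy image whose pixel values encode their (row, column) position.
toy = [[float(10 * r + c) for c in range(5)] for r in range(4)]
patches = extract_patches(toy, size=3)
print(len(patches))  # (4 - 3 + 1) * (5 - 3 + 1) = 6
```

An H x W image yields (H - p + 1)(W - p + 1) patches of size p, so even one moderately sized image provides many thousands of training samples per scale.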

We opt to go beyond texture generation, and to deal with more general natural images. This requires capturing the statistics of complex image structures at many different scales. For example, we want to capture global properties such as the arrangement and shape of large objects in the image (e.g. sky at the top, ground at the bottom), as well as fine details and texture information. To achieve that, our generative framework, illustrated in Fig. 4, consists of a hierarchy of patch-GANs (Markovian discriminator) [31, 26], where each is responsible for capturing the patch distribution at a different scale of x. The GANs have small receptive fields and limited capacity, preventing them from memorizing the single image. While similar multi-scale architectures have been explored in conventional GAN settings (e.g. [28, 52, 29, 52, 13, 24]), we are the first to explore it for internal learning from a single image.
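How small such a receptive field is can be worked out with simple arithmetic. The sketch below assumes five stride-1 convolution layers (the layer count comes from the Figure 5 caption) with 3x3 kernels; the kernel size, the scale factor `r = 2.0`, and both function names are assumptions made for illustration and do not appear in this excerpt.

```python
def receptive_field(kernel_sizes, strides=None):
    """Receptive field, in pixels, of a stack of convolution layers."""
    strides = strides or [1] * len(kernel_sizes)
    field, jump = 1, 1
    for kernel, stride in zip(kernel_sizes, strides):
        field += (kernel - 1) * jump  # each layer widens the field
        jump *= stride                # spacing between neighbouring outputs
    return field


def effective_patch_size(field, r, n):
    """Approximate extent, in full-resolution pixels, of the patch seen at scale n,
    where the image at scale n is downsampled by r**n."""
    return field * r ** n


rf = receptive_field([3] * 5)  # assumed: five 3x3 layers, stride 1
print(rf)  # 11
for n in range(4):
    print(n, effective_patch_size(rf, r=2.0, n=n))
```

The same fixed field covers a larger share of the image at coarse scales and a smaller share at fine scales, which matches the caption's remark that the effective patch size decreases going up the pyramid.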

2.1. Multi-scale architecture

Our model consists of a pyramid of generators, $\{G_0, \ldots, G_N\}$, trained against an image pyramid of $x$: $\{x_0, \ldots, x_N\}$, where $x_n$ is a downsampled version of $x$ by a factor $r^n$, for some $r > 1$. Each generator $G_n$ is responsible for producing realistic image samples w.r.t. the patch distribution in the corresponding image $x_n$. This is achieved through adversarial training, where $G_n$ learns to fool an associated discriminator $D_n$, which attempts to distinguish patches in the generated samples from patches in $x_n$.

The generation of an image sample starts at the coarsest scale and sequentially passes through all generators up to the finest scale, with noise injected at every scale. All the generators and discriminators have the same receptive field and thus capture structures of decreasing size as we go up the generation process. At the coarsest scale, the generation is purely generative, i.e. $G_N$ maps spatial white Gaussian noise $z_N$ to an image sample $\tilde{x}_N$.
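The coarse-to-fine sampling order can be sketched as a loop from the coarsest scale N down to scale 0. This is a toy illustration under assumptions: `sample`, `gaussian_noise`, `upsample_nearest`, and `toy_generator` are hypothetical names, the coarsest generator is fed an all-zero "previous image" so that it depends on noise only, and the toy generator stands in for trained networks.

```python
import random
from typing import Callable, List, Sequence, Tuple

Image = List[List[float]]
Generator = Callable[[Image, Image], Image]  # (noise, upsampled previous) -> sample


def gaussian_noise(h: int, w: int, rng: random.Random) -> Image:
    """Spatial white Gaussian noise map of size h x w."""
    return [[rng.gauss(0.0, 1.0) for _ in range(w)] for _ in range(h)]


def upsample_nearest(image: Image, new_h: int, new_w: int) -> Image:
    """Nearest-neighbour resize (assumed stand-in for the upsampling operator)."""
    old_h, old_w = len(image), len(image[0])
    return [[image[i * old_h // new_h][j * old_w // new_w] for j in range(new_w)]
            for i in range(new_h)]


def sample(generators: Sequence[Generator], shapes: Sequence[Tuple[int, int]],
           rng: random.Random) -> Image:
    """generators[n] and shapes[n] belong to scale n; the last index is the coarsest."""
    coarsest = len(generators) - 1
    h, w = shapes[coarsest]
    current = [[0.0] * w for _ in range(h)]  # nothing to condition on at scale N
    for n in range(coarsest, -1, -1):        # N, N-1, ..., 0
        h, w = shapes[n]
        prev_up = upsample_nearest(current, h, w)
        current = generators[n](gaussian_noise(h, w, rng), prev_up)  # fresh noise
    return current


def toy_generator(noise: Image, prev_up: Image) -> Image:
    """Hypothetical generator: adds a small amount of noise to the coarser image."""
    return [[u + 0.1 * z for z, u in zip(z_row, u_row)]
            for z_row, u_row in zip(noise, prev_up)]


shapes = [(8, 8), (4, 4), (2, 2)]            # scale 0 (finest) to scale 2 (coarsest)
result = sample([toy_generator] * 3, shapes, random.Random(0))
print(len(result), len(result[0]))  # 8 8
```

Drawing a fresh noise map at every scale is what lets two samples differ both in coarse layout and in fine texture.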
