frequency crispness, in many cases they nonetheless accu-
rately capture the low frequencies. For problems where this
is the case, we do not need an entirely new framework to
enforce correctness at the low frequencies. L1 will already
do.
This motivates restricting the GAN discriminator to only
model high-frequency structure, relying on an L1 term to
force low-frequency correctness (Eqn. 4). In order to model
high frequencies, it is sufficient to restrict our attention to
the structure in local image patches. Therefore, we design
a discriminator architecture – which we term a PatchGAN
– that only penalizes structure at the scale of patches. This
discriminator tries to classify if each N ×N patch in an im-
age is real or fake. We run this discriminator convolution-
ally across the image, averaging all responses to provide the
ultimate output of D.
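The patch-scoring-and-averaging scheme above can be sketched as follows. This is an illustrative numpy sketch, not the paper's implementation: the real D is a small convolutional network whose receptive field is the N×N patch, whereas here `score_fn`, the patch size N=70, and the stride are placeholder assumptions standing in for the learned network.

```python
import numpy as np

def patch_scores(img, score_fn, N=70, stride=16):
    """Score every N x N patch of a single-channel image.

    score_fn maps an N x N patch to a scalar in [0, 1]
    (1 = "real"); in a learned PatchGAN this role is played
    by a convolutional network run over the whole image.
    """
    H, W = img.shape
    scores = []
    for i in range(0, H - N + 1, stride):
        for j in range(0, W - N + 1, stride):
            scores.append(score_fn(img[i:i + N, j:j + N]))
    return np.array(scores)

def discriminator_output(img, score_fn, N=70, stride=16):
    # The ultimate output of D is the average of all patch responses.
    return patch_scores(img, score_fn, N, stride).mean()
```

Running the scorer convolutionally (with stride) rather than on every pixel offset is what makes the same discriminator applicable to arbitrarily large images.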
In Section 4.4, we demonstrate that N can be much
smaller than the full size of the image and still produce
high quality results. This is advantageous because a smaller
PatchGAN has fewer parameters, runs faster, and can be
applied to arbitrarily large images.
Such a discriminator effectively models the image as a
Markov random field, assuming independence between pix-
els separated by more than a patch diameter. This connec-
tion was previously explored in [38], and is also the com-
mon assumption in models of texture [17, 21] and style
[16, 25, 22, 37]. Therefore, our PatchGAN can be under-
stood as a form of texture/style loss.
3.3. Optimization and inference
To optimize our networks, we follow the standard ap-
proach from [24]: we alternate between one gradient de-
scent step on D, then one step on G. As suggested in
the original GAN paper, rather than training G to minimize
log(1 − D(x, G(x, z))), we instead train to maximize
log D(x, G(x, z)) [24]. In addition, we divide the objec-
tive by 2 while optimizing D, which slows down the rate at
which D learns relative to G. We use minibatch SGD and
apply the Adam solver [32], with a learning rate of 0.0002,
and momentum parameters β1 = 0.5 and β2 = 0.999.
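The per-example objectives used in each alternating step can be written down directly; the following is a minimal numpy sketch of just the GAN loss terms (the networks, the L1 term of Eqn. 4, and the Adam update itself are omitted, and `d_real`/`d_fake` stand for the scalar outputs D(x, y) and D(x, G(x, z))).

```python
import numpy as np

# Solver settings from the text: minibatch SGD with the Adam solver.
ADAM = dict(lr=2e-4, beta1=0.5, beta2=0.999)

def d_loss(d_real, d_fake):
    """Discriminator loss for one alternating step. The factor 1/2
    divides the objective while optimizing D, slowing the rate at
    which D learns relative to G."""
    return -0.5 * (np.log(d_real) + np.log(1.0 - d_fake))

def g_loss(d_fake):
    """Non-saturating generator loss: maximize log D(x, G(x, z))
    by minimizing its negative, rather than minimizing
    log(1 - D(x, G(x, z)))."""
    return -np.log(d_fake)
```

The non-saturating form gives G stronger gradients early in training, when D easily rejects its samples and log(1 − D) is nearly flat.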
At inference time, we run the generator net in exactly
the same manner as during the training phase. This differs
from the usual protocol in that we apply dropout at test time,
and we apply batch normalization [29] using the statistics of
the test batch, rather than aggregated statistics of the train-
ing batch. This approach to batch normalization, when the
batch size is set to 1, has been termed “instance normal-
ization” and has been demonstrated to be effective at im-
age generation tasks [54]. In our experiments, we use batch
sizes between 1 and 10 depending on the experiment.
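At batch size 1, normalizing with test-batch statistics reduces to normalizing each feature map by its own mean and variance, i.e. instance normalization. A minimal numpy sketch (the epsilon value and the absence of learned affine scale/shift parameters are simplifying assumptions):

```python
import numpy as np

def instance_norm(x, eps=1e-5):
    """Normalize each (sample, channel) feature map by its own
    statistics -- equivalent to batch normalization with
    test-batch statistics when the batch size is 1.

    x: array of shape (batch, channels, H, W).
    """
    mean = x.mean(axis=(2, 3), keepdims=True)
    var = x.var(axis=(2, 3), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)
```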
4. Experiments
To explore the generality of conditional GANs, we test
the method on a variety of tasks and datasets, including both
graphics tasks, like photo generation, and vision tasks, like
semantic segmentation:
• Semantic labels↔photo, trained on the Cityscapes
dataset [12].
• Architectural labels→photo, trained on CMP Facades
[45].
• Map↔aerial photo, trained on data scraped from
Google Maps.
• BW→color photos, trained on [51].
• Edges→photo, trained on data from [65] and [60]; bi-
nary edges generated using the HED edge detector [58]
plus postprocessing.
• Sketch→photo: tests edges→photo models on human-
drawn sketches from [19].
• Day→night, trained on [33].
• Thermal→color photos, trained on data from [27].
• Photo with missing pixels→inpainted photo, trained
on Paris StreetView from [14].
Details of training on each of these datasets are provided
in the supplemental materials online. In all cases, the in-
put and output are simply 1-3 channel images. Qualita-
tive results are shown in Figures 8, 9, 11, 10, 13, 14, 15,
16, 17, 18, 19, 20. Several failure cases are highlighted
in Figure 21. More comprehensive results are available at
https://phillipi.github.io/pix2pix/.
Data requirements and speed We note that decent re-
sults can often be obtained even on small datasets. Our fa-
cade training set consists of just 400 images (see results in
Figure 14), and the day to night training set consists of only
91 unique webcams (see results in Figure 15). On datasets
of this size, training can be very fast: for example, the re-
sults shown in Figure 14 took less than two hours of training
on a single Pascal Titan X GPU. At test time, all models run
in well under a second on this GPU.
4.1. Evaluation metrics
Evaluating the quality of synthesized images is an open
and difficult problem [52]. Traditional metrics such as per-
pixel mean-squared error do not assess joint statistics of the
result, and therefore do not measure the very structure that
structured losses aim to capture.
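A toy numpy illustration of this failure mode (our own construction, not from the experiments): two sharp textures drawn from the same distribution are equally plausible outputs, yet under per-pixel MSE the structureless per-pixel mean scores better than either sharp sample.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two equally plausible sharp binary textures from one distribution,
# plus the "blurry" per-pixel mean of that distribution (all zeros).
target = rng.choice([-1.0, 1.0], size=(64, 64))
sample = rng.choice([-1.0, 1.0], size=(64, 64))
blur = np.zeros((64, 64))

def mse(a, b):
    return np.mean((a - b) ** 2)

# Per-pixel MSE prefers the structureless blur to the sharp sample,
# even though the blur matches the target's joint statistics worst.
blur_err = mse(blur, target)      # 1.0 exactly
sample_err = mse(sample, target)  # about 2.0 in expectation
```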
To more holistically evaluate the visual quality of our re-
sults, we employ two tactics. First, we run “real vs. fake”
perceptual studies on Amazon Mechanical Turk (AMT).
For graphics problems like colorization and photo gener-
ation, plausibility to a human observer is often the ultimate
goal. Therefore, we test our map generation, aerial photo
generation, and image colorization using this approach.