The training set is given as a set of pairs of corresponding images $\{(s_i, x_i)\}$, where $x_i$ is a natural photo and $s_i$ is a corresponding semantic label map. The $i$th-layer feature extractor of discriminator $D_k$ is denoted $D_k^{(i)}$ (from the input to the $i$th layer of $D_k$). The feature matching loss $L_{FM}(G, D_k)$ is:
\[
L_{FM}(G, D_k) = \mathbb{E}_{(s,x)} \sum_{i=1}^{T} \frac{1}{N_i} \Big[ \big\| D_k^{(i)}(s, x) - D_k^{(i)}(s, G(s)) \big\|_1 \Big], \tag{23}
\]
where $N_i$ is the number of elements in each layer and $T$ denotes the total number of layers. The final objective function of [157] is
\[
\min_G \max_{D_1, D_2, D_3} \sum_{k=1,2,3} \big( L_{GAN}(G, D_k) + \lambda L_{FM}(G, D_k) \big). \tag{24}
\]
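As an illustration of how (23) and (24) combine, the sketch below computes the feature matching term for one discriminator scale and then sums the three scales; the feature lists, the weight lam, and the surrounding multi-scale discriminators are illustrative assumptions rather than the implementation of [157].

\begin{verbatim}
import torch
import torch.nn.functional as F

def feature_matching_loss(feats_real, feats_fake):
    """Eq. (23) for one D_k: mean L1 distance between intermediate
    discriminator features of (s, x) and (s, G(s)).
    feats_real / feats_fake are lists of feature maps, one per layer
    i = 1..T; averaging over all elements of a layer plays the role
    of the 1/N_i factor."""
    loss = 0.0
    for f_real, f_fake in zip(feats_real, feats_fake):
        loss = loss + F.l1_loss(f_fake, f_real.detach())
    return loss

def full_generator_objective(gan_losses, fm_losses, lam=10.0):
    """Eq. (24): sum over the three discriminator scales k = 1, 2, 3."""
    return sum(g + lam * fm for g, fm in zip(gan_losses, fm_losses))
\end{verbatim}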
3.2.3 CycleGAN
Image-to-image translation is a class of graphics and vision problems whose goal is to learn the mapping between an input image and an output image using a training set of aligned image pairs. When paired training data are available, reference [156] can be used for these image-to-image translation tasks. However, reference [156] cannot be used for unpaired data (no input/output pairs), a setting that is well addressed by Cycle-consistent GANs (CycleGAN) [53]. CycleGAN is an important advance for unpaired data: it learns two mappings jointly and constrains them with a cycle-consistency loss. It has been proved that cycle-consistency is an upper bound of the conditional entropy [158]. CycleGAN can also be derived as a special case within the variational inference (VI) framework proposed in [159], which naturally establishes its relationship with approximate Bayesian inference methods.
The basic ideas of DiscoGAN [54] and CycleGAN [53] are nearly the same; the two models were proposed independently at almost the same time. The only difference between CycleGAN [53] and DualGAN [55] is that DualGAN uses the loss format advocated by Wasserstein GAN (WGAN) rather than the sigmoid cross-entropy loss used in CycleGAN.
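As a minimal sketch of the cycle-consistency term shared by CycleGAN, DiscoGAN, and DualGAN, the snippet below penalizes the forward and backward reconstruction errors; the generator modules G (mapping X to Y) and F (mapping Y to X) and the weight lambda_cyc are illustrative assumptions, not the authors' released code.

\begin{verbatim}
import torch
import torch.nn.functional as F_nn

def cycle_consistency_loss(G, F, real_x, real_y, lambda_cyc=10.0):
    """Forward cycle x -> G(x) -> F(G(x)) should reconstruct x;
    backward cycle y -> F(y) -> G(F(y)) should reconstruct y.
    Both reconstruction errors are measured with the L1 norm."""
    forward_term = F_nn.l1_loss(F(G(real_x)), real_x)
    backward_term = F_nn.l1_loss(G(F(real_y)), real_y)
    return lambda_cyc * (forward_term + backward_term)
\end{verbatim}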
3.2.4 f-GAN
The Kullback-Leibler (KL) divergence measures the difference between two given probability distributions. A large class of assorted divergences are the so-called Ali-Silvey distances, also known as the f-divergences [160]. Given two probability distributions $P$ and $Q$ that have, respectively, absolutely continuous density functions $p$ and $q$ with regard to a base measure $dx$ defined on the domain $X$, the f-divergence is defined as
\[
D_f(P \,\|\, Q) = \int_X q(x)\, f\!\left( \frac{p(x)}{q(x)} \right) dx. \tag{25}
\]
Different choices of $f$ recover popular divergences as special cases of the f-divergence. For example, if $f(a) = a \log a$, the f-divergence becomes the KL divergence. The original GAN [3] is a special case of f-GAN [17], which is based on the f-divergence. Reference [17] shows that any f-divergence can be used for training GANs. Furthermore, reference [17] discusses the advantages of different choices of divergence function for both the quality of the produced generative models and the training complexity. Im et al. [161] quantitatively evaluated GANs with the divergences proposed for training. Uehara et al. [162] extend f-GAN further: the f-divergence is directly minimized in the generator step, and the ratio of the distributions of real and generated data is predicted in the discriminator step.
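As a numerical illustration of (25) and the $f(a) = a \log a$ special case, the sketch below evaluates a discrete f-divergence; the two example distributions are assumptions chosen only for illustration.

\begin{verbatim}
import numpy as np

def f_divergence(p, q, f):
    """Discrete analogue of Eq. (25): sum_x q(x) * f(p(x) / q(x))."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return float(np.sum(q * f(p / q)))

p = [0.5, 0.3, 0.2]
q = [0.4, 0.4, 0.2]

# f(a) = a * log(a) recovers the KL divergence KL(P || Q)
kl = f_divergence(p, q, lambda a: a * np.log(a))
print(kl)  # equals sum_x p(x) * log(p(x) / q(x))
\end{verbatim}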
3.2.5 Integral Probability Metrics (IPMs)
Denote by $\mathcal{P}$ the set of all Borel probability measures on a topological space $(M, \mathcal{A})$. The integral probability metric (IPM) [163] between two probability distributions $P \in \mathcal{P}$ and $Q \in \mathcal{P}$ is defined as
\[
\gamma_{\mathcal{F}}(P, Q) = \sup_{f \in \mathcal{F}} \left| \int_M f \, dP - \int_M f \, dQ \right|, \tag{26}
\]
where $\mathcal{F}$ is a class of real-valued bounded measurable functions on $M$. Nonparametric density estimation and convergence rates for GANs under Besov IPM losses are discussed in [164]. IPMs include the RKHS-induced maximum mean discrepancy (MMD) as well as the Wasserstein distance used in Wasserstein GANs (WGAN).
3.2.5.1 Maximum Mean Discrepancy (MMD):
The maximum mean discrepancy (MMD) [165] is a measure of the difference between two distributions $P$ and $Q$, given by the supremum, over a function space $\mathcal{F}$, of the differences between expectations with regard to the two distributions. The MMD is defined by:
\[
\mathrm{MMD}(\mathcal{F}, P, Q) = \sup_{f \in \mathcal{F}} \big( \mathbb{E}_{X \sim P}[f(X)] - \mathbb{E}_{Y \sim Q}[f(Y)] \big). \tag{27}
\]
MMD has been used for deep generative models [166]–[168]
and model criticism [169].
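A common concrete instance of (27) takes $\mathcal{F}$ to be the unit ball of an RKHS, which gives a closed-form kernel estimate of the (squared) MMD. The sketch below is the standard biased empirical estimator with a Gaussian kernel; the bandwidth and the toy samples are illustrative assumptions.

\begin{verbatim}
import numpy as np

def gaussian_kernel(a, b, sigma=1.0):
    """k(a, b) = exp(-||a - b||^2 / (2 sigma^2)) for all pairs of rows."""
    sq_dists = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dists / (2.0 * sigma ** 2))

def mmd2_biased(x, y, sigma=1.0):
    """Biased empirical estimate of squared MMD between samples x ~ P, y ~ Q."""
    return (gaussian_kernel(x, x, sigma).mean()
            + gaussian_kernel(y, y, sigma).mean()
            - 2.0 * gaussian_kernel(x, y, sigma).mean())

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=(500, 2))  # samples from P
y = rng.normal(0.5, 1.0, size=(500, 2))  # samples from a shifted Q
print(mmd2_biased(x, y))                 # grows as P and Q move apart
\end{verbatim}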
3.2.5.2 Wasserstein GAN (WGAN):
WGAN [18] conducted a comprehensive theoretical analysis
of how the Earth Mover (EM) distance behaves in com-
parison with popular probability distances and divergences
such as the total variation (TV) distance, the Kullback-
Leibler (KL) divergence, and the Jensen-Shannon (JS) diver-
gence utilized in the context of learning distributions. The
definition of the EM distance is
\[
W(p_{data}, p_g) = \inf_{\gamma \in \Pi(p_{data}, p_g)} \mathbb{E}_{(x,y) \sim \gamma} \big[ \|x - y\| \big], \tag{28}
\]
where $\Pi(p_{data}, p_g)$ denotes the set of all joint distributions $\gamma(x, y)$ whose marginals are $p_{data}$ and $p_g$, respectively. However, the infimum in (28) is highly intractable. Reference [18] uses the following equation to approximate the EM distance:
\[
\max_{w \in \mathcal{W}} \; \mathbb{E}_{x \sim p_{data}(x)}[f_w(x)] - \mathbb{E}_{z \sim p_z(z)}[f_w(G(z))], \tag{29}
\]
where $\{f_w\}_{w \in \mathcal{W}}$ is a parameterized family of functions that are all $K$-Lipschitz for some $K$, and $f_w$ can be realized by the discriminator $D$. When $D$ is optimized, (29) approximates the EM distance. The aim of $G$ is then to minimize (29) so as to make the generated distribution as close to the real distribution as possible. Therefore, the overall objective function of WGAN is
\[
\begin{aligned}
\min_G \max_{w \in \mathcal{W}} \; & \mathbb{E}_{x \sim p_{data}(x)}[f_w(x)] - \mathbb{E}_{z \sim p_z(z)}[f_w(G(z))] \\
= \min_G \max_{D} \; & \mathbb{E}_{x \sim p_{data}(x)}[D(x)] - \mathbb{E}_{z \sim p_z(z)}[D(G(z))].
\end{aligned} \tag{30}
\]
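As a minimal sketch of how (29) and (30) are used in practice, the snippet below forms the critic and generator losses and enforces the $K$-Lipschitz constraint by weight clipping as in [18]; the modules critic and gen, the clipping range, and the surrounding training loop are illustrative assumptions.

\begin{verbatim}
import torch

def critic_loss(critic, gen, real_x, z):
    """Negation of the inner maximization in Eq. (30): the critic ascends
    E_x[f_w(x)] - E_z[f_w(G(z))], so its minimized loss is the negative."""
    return -(critic(real_x).mean() - critic(gen(z).detach()).mean())

def generator_loss(critic, gen, z):
    """The generator minimizes -E_z[f_w(G(z))], the only part of (30) it affects."""
    return -critic(gen(z)).mean()

def clip_critic_weights(critic, c=0.01):
    """Keep f_w inside a K-Lipschitz family by clipping each weight to [-c, c]."""
    with torch.no_grad():
        for p in critic.parameters():
            p.clamp_(-c, c)
\end{verbatim}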