Algorithm 1 WGAN with gradient penalty. We use default values of $\lambda = 10$, $n_{\text{critic}} = 5$, $\alpha = 0.0001$, $\beta_1 = 0$, $\beta_2 = 0.9$.
Require: The gradient penalty coefficient $\lambda$, the number of critic iterations per generator iteration $n_{\text{critic}}$, the batch size $m$, Adam hyperparameters $\alpha$, $\beta_1$, $\beta_2$.
Require: initial critic parameters $w_0$, initial generator parameters $\theta_0$.
1: while $\theta$ has not converged do
2:   for $t = 1, \ldots, n_{\text{critic}}$ do
3:     for $i = 1, \ldots, m$ do
4:       Sample real data $x \sim \mathbb{P}_r$, latent variable $z \sim p(z)$, a random number $\epsilon \sim U[0, 1]$.
5:       $\tilde{x} \leftarrow G_\theta(z)$
6:       $\hat{x} \leftarrow \epsilon x + (1 - \epsilon)\tilde{x}$
7:       $L^{(i)} \leftarrow D_w(\tilde{x}) - D_w(x) + \lambda\big(\|\nabla_{\hat{x}} D_w(\hat{x})\|_2 - 1\big)^2$
8:     end for
9:     $w \leftarrow \mathrm{Adam}\big(\nabla_w \tfrac{1}{m}\sum_{i=1}^{m} L^{(i)},\, w,\, \alpha,\, \beta_1,\, \beta_2\big)$
10:   end for
11:   Sample a batch of latent variables $\{z^{(i)}\}_{i=1}^{m} \sim p(z)$.
12:   $\theta \leftarrow \mathrm{Adam}\big(\nabla_\theta \tfrac{1}{m}\sum_{i=1}^{m} -D_w(G_\theta(z^{(i)})),\, \theta,\, \alpha,\, \beta_1,\, \beta_2\big)$
13: end while
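In a modern autodiff framework the procedure above is short to express. The following PyTorch sketch follows Algorithm 1 step for step; the two small MLPs, the toy dimensionalities, and the placeholder sample_real loader are illustrative assumptions, not the architectures or data pipelines used in our experiments.

```python
# Minimal PyTorch sketch of Algorithm 1 (illustrative, not the experimental setup).
import torch

LAMBDA, N_CRITIC, BATCH = 10.0, 5, 64
Z_DIM, X_DIM = 128, 2  # toy sizes chosen for the sketch

G = torch.nn.Sequential(torch.nn.Linear(Z_DIM, 512), torch.nn.ReLU(),
                        torch.nn.Linear(512, X_DIM))
D = torch.nn.Sequential(torch.nn.Linear(X_DIM, 512), torch.nn.ReLU(),
                        torch.nn.Linear(512, 1))

# Adam with alpha = 1e-4, beta1 = 0, beta2 = 0.9 (the defaults in Algorithm 1).
opt_D = torch.optim.Adam(D.parameters(), lr=1e-4, betas=(0.0, 0.9))
opt_G = torch.optim.Adam(G.parameters(), lr=1e-4, betas=(0.0, 0.9))

def sample_real(m):
    # Placeholder for "x ~ P_r"; replace with a real data loader.
    return torch.randn(m, X_DIM)

for step in range(10000):                      # "while theta has not converged"
    for _ in range(N_CRITIC):                  # n_critic critic updates per generator update
        x = sample_real(BATCH)
        z = torch.randn(BATCH, Z_DIM)          # z ~ p(z)
        x_tilde = G(z).detach()                # x_tilde <- G_theta(z)

        # x_hat <- eps * x + (1 - eps) * x_tilde, with eps ~ U[0, 1] per example
        eps = torch.rand(BATCH, 1)
        x_hat = (eps * x + (1 - eps) * x_tilde).requires_grad_(True)

        # Gradient penalty term: (||grad_xhat D_w(x_hat)||_2 - 1)^2
        d_hat = D(x_hat)
        grads = torch.autograd.grad(d_hat.sum(), x_hat, create_graph=True)[0]
        penalty = ((grads.norm(2, dim=1) - 1) ** 2).mean()

        # Critic loss averaged over the minibatch (line 7 of Algorithm 1)
        loss_D = D(x_tilde).mean() - D(x).mean() + LAMBDA * penalty
        opt_D.zero_grad()
        loss_D.backward()
        opt_D.step()

    # Generator update: minimize -D_w(G_theta(z)) (lines 11-12 of Algorithm 1)
    z = torch.randn(BATCH, Z_DIM)
    loss_G = -D(G(z)).mean()
    opt_G.zero_grad()
    loss_G.backward()
    opt_G.step()
```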
critic. In each case, the critic trained with weight clipping ignores higher moments of the data dis-
tribution and instead models very simple approximations to the optimal functions. In contrast, our
approach does not suffer from this behavior.
3.2 Exploding and vanishing gradients
We observe that the WGAN optimization process is difficult because of interactions between the
weight constraint and the cost function, which result in either vanishing or exploding gradients
without careful tuning of the clipping threshold c.
To demonstrate this, we train WGAN on the Swiss Roll toy dataset, varying the clipping threshold c
in $[10^{-1}, 10^{-2}, 10^{-3}]$, and plot the norm of the gradient of the critic loss with respect to successive
layers of activations. Both generator and critic are 12-layer ReLU MLPs without batch normaliza-
tion. Figure 1b shows that for each of these values, the gradient either grows or decays exponentially
as we move farther back in the network. We find that our method results in more stable gradients that
neither vanish nor explode, allowing training of more complicated networks.
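The quantity plotted in Figure 1b is simply the norm of the critic-loss gradient at each hidden activation. A hypothetical PyTorch sketch of that measurement step is given below; the layer widths, the stand-in data, the untrained critic, and the omission of clipping or a penalty are our simplifications for illustration.

```python
# Sketch: record the gradient norm of the critic loss at each hidden activation
# of a 12-layer ReLU MLP (the measurement behind Figure 1b; illustrative only).
import torch

layers = torch.nn.ModuleList(
    [torch.nn.Linear(2, 512)]
    + [torch.nn.Linear(512, 512) for _ in range(10)]
    + [torch.nn.Linear(512, 1)]                # 12 layers in total
)

x_real = torch.randn(64, 2)                    # stand-in for Swiss Roll samples
x_fake = torch.randn(64, 2)                    # stand-in for generator samples

def critic_with_taps(x):
    taps = []
    h = x
    for layer in layers[:-1]:
        h = torch.relu(layer(h))
        h.retain_grad()                        # keep .grad on this intermediate activation
        taps.append(h)
    return layers[-1](h), taps

d_real, taps_real = critic_with_taps(x_real)
d_fake, _ = critic_with_taps(x_fake)
loss = d_fake.mean() - d_real.mean()           # WGAN critic loss (no penalty here)
loss.backward()

# Gradient norm at each successive layer of activations
for i, h in enumerate(taps_real):
    print(f"layer {i + 1}: grad norm = {h.grad.norm():.3e}")
```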
4 Gradient penalty
We now propose an alternative way to enforce the Lipschitz constraint. A differentiable function
is 1-Lipschitz if and only if it has gradients with norm at most 1 everywhere, so we consider directly
constraining the gradient norm of the critic's output with respect to its input. To circumvent
tractability issues, we enforce a soft version of the constraint with a penalty on the gradient norm
for random samples $\hat{x} \sim \mathbb{P}_{\hat{x}}$. Our new objective is
$$
L = \underbrace{\mathbb{E}_{\tilde{x} \sim \mathbb{P}_g}\big[D(\tilde{x})\big] - \mathbb{E}_{x \sim \mathbb{P}_r}\big[D(x)\big]}_{\text{Original critic loss}}
\;+\; \underbrace{\lambda\, \mathbb{E}_{\hat{x} \sim \mathbb{P}_{\hat{x}}}\big[(\|\nabla_{\hat{x}} D(\hat{x})\|_2 - 1)^2\big]}_{\text{Our gradient penalty}}
\qquad (3)
$$
Sampling distribution We implicitly define $\mathbb{P}_{\hat{x}}$ as sampling uniformly along straight lines between
pairs of points sampled from the data distribution $\mathbb{P}_r$ and the generator distribution $\mathbb{P}_g$. This is
motivated by the fact that the optimal critic contains straight lines with gradient norm 1 connecting
coupled points from $\mathbb{P}_r$ and $\mathbb{P}_g$ (see Proposition 1). Given that enforcing the unit gradient norm
constraint everywhere is intractable, enforcing it only along these straight lines seems sufficient and
experimentally results in good performance.
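Written as a reusable helper, the penalty term of Eq. (3), with $\hat{x}$ drawn along straight lines between real and generated points as just described, might look as follows in PyTorch. The function name, signature, and the assumption that the critic returns one scalar per example are ours; the computation is the same as the inline version in the Algorithm 1 sketch above, but handles arbitrarily shaped (e.g. image) inputs.

```python
# Possible sketch of the gradient penalty in Eq. (3) for one minibatch.
import torch

def gradient_penalty(critic, x_real, x_fake):
    """Estimate E_xhat[(||grad_xhat D(xhat)||_2 - 1)^2] over a minibatch."""
    m = x_real.size(0)
    # eps ~ U[0, 1] per example; singleton dims broadcast over feature,
    # channel, and spatial dimensions, so image tensors work as well.
    eps = torch.rand(m, *([1] * (x_real.dim() - 1)), device=x_real.device)
    x_hat = (eps * x_real + (1 - eps) * x_fake).requires_grad_(True)

    d_hat = critic(x_hat)
    # create_graph=True keeps the gradient differentiable so the penalty
    # itself can be minimized when the full critic loss is backpropagated.
    grads = torch.autograd.grad(d_hat.sum(), x_hat, create_graph=True)[0]
    grad_norm = grads.flatten(start_dim=1).norm(2, dim=1)
    return ((grad_norm - 1) ** 2).mean()
```

One would then form the critic loss of Eq. (3) as, e.g., D(x_fake).mean() - D(x_real).mean() + 10.0 * gradient_penalty(D, x_real, x_fake).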
Penalty coefficient All experiments in this paper use λ = 10, which we found to work well across
a variety of architectures and datasets ranging from toy tasks to large ImageNet CNNs.