Improving Wasserstein GAN Training: Gradient Penalty for Model Stability and Sample Quality
Title: "Improved Training of Wasserstein GANs"

Description: The paper notes that although generative adversarial networks (GANs) demonstrate powerful generative ability and are widely used in deep learning, they are difficult to train stably. The original Wasserstein GAN (WGAN) was proposed to address this instability, yet it can still produce low-quality samples or fail to converge. The authors identify the weight clipping used in WGAN to enforce a Lipschitz constraint on the critic as a likely cause of these problems.

Observing that this hard weight constraint can make the critic behave pathologically, they propose an alternative: penalizing the norm of the critic's gradient with respect to its input, rather than clipping the weights. Avoiding clipping in this way makes WGAN training considerably more robust. Experiments show that the proposed penalty markedly improves WGAN performance, enabling stable training across a wide range of architectures, including 101-layer ResNets and language models with continuous generators.

A highlight of the paper is that, with almost no hyperparameter tuning, the improved WGAN generates high-quality images on datasets such as CIFAR-10 and LSUN Bedrooms. This suggests a more effective and broadly applicable training strategy, of real significance for the practical adoption and development of GANs.

In short, the core contribution is a new training technique that resolves the difficulties of training Wasserstein GANs in high-dimensional spaces and offers a new route to stable, efficient generation of high-quality samples, improving the practicality and reliability of generative models for researchers and practitioners alike.
Algorithm 1 WGAN with gradient penalty. We use default values of λ = 10, n_critic = 5, α = 0.0001, β_1 = 0, β_2 = 0.9.

Require: The gradient penalty coefficient λ, the number of critic iterations per generator iteration n_critic, the batch size m, Adam hyperparameters α, β_1, β_2.
Require: initial critic parameters w_0, initial generator parameters θ_0.
1: while θ has not converged do
2:     for t = 1, ..., n_critic do
3:         for i = 1, ..., m do
4:             Sample real data x ∼ P_r, latent variable z ∼ p(z), a random number ε ∼ U[0, 1].
5:             x̃ ← G_θ(z)
6:             x̂ ← εx + (1 − ε)x̃
7:             L^(i) ← D_w(x̃) − D_w(x) + λ(‖∇_x̂ D_w(x̂)‖_2 − 1)^2
8:         end for
9:         w ← Adam(∇_w (1/m) Σ_{i=1}^m L^(i), w, α, β_1, β_2)
10:     end for
11:     Sample a batch of latent variables {z^(i)}_{i=1}^m ∼ p(z).
12:     θ ← Adam(∇_θ (1/m) Σ_{i=1}^m −D_w(G_θ(z^(i))), θ, α, β_1, β_2)
13: end while
critic. In each case, the critic trained with weight clipping ignores higher moments of the data dis-
tribution and instead models very simple approximations to the optimal functions. In contrast, our
approach does not suffer from this behavior.
3.2 Exploding and vanishing gradients
We observe that the WGAN optimization process is difficult because of interactions between the
weight constraint and the cost function, which result in either vanishing or exploding gradients
without careful tuning of the clipping threshold c.
To demonstrate this, we train WGAN on the Swiss Roll toy dataset, varying the clipping threshold c in [10^-1, 10^-2, 10^-3], and plot the norm of the gradient of the critic loss with respect to successive
layers of activations. Both generator and critic are 12-layer ReLU MLPs without batch normaliza-
tion. Figure 1b shows that for each of these values, the gradient either grows or decays exponentially
as we move farther back in the network. We find our method results in more stable gradients that
neither vanish nor explode, allowing training of more complicated networks.
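The exponential growth or decay can be reproduced in miniature. The following is an illustrative sketch, not the paper's Swiss Roll experiment: it backpropagates a unit gradient through a deep linear chain whose weights are clipped to [-c, c], where the gradient norm scales roughly like the product of per-layer factors, so a tight threshold shrinks it exponentially while a loose one lets it blow up.

```python
import numpy as np

# Illustrative sketch (hypothetical setup, not the paper's experiment):
# push a unit-norm gradient backward through a 12-layer linear chain with
# weights clipped to [-c, c], recording the gradient norm after each layer.

def backprop_norms(c, depth=12, width=64, seed=0):
    rng = np.random.default_rng(seed)
    g = np.ones(width) / np.sqrt(width)   # unit-norm gradient at the output
    norms = []
    for _ in range(depth):
        W = rng.normal(0.0, 1.0, (width, width))
        W = np.clip(W, -c, c)             # weight clipping, as in the original WGAN
        g = W.T @ g                       # one backward step through a linear layer
        norms.append(np.linalg.norm(g))
    return norms

small = backprop_norms(c=0.01)            # norm collapses toward zero layer by layer
large = backprop_norms(c=1.0)             # norm grows by a large factor per layer
```

With c = 0.01 almost every clipped entry saturates at ±0.01 and the norm decays by roughly the same factor each layer; with c = 1.0 the per-layer factor exceeds one and the norm explodes, mirroring the two failure modes shown in Figure 1b.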
4 Gradient penalty
We now propose an alternative way to enforce the Lipschitz constraint. A differentiable function is 1-Lipschitz if and only if it has gradients with norm at most 1 everywhere, so we consider directly constraining the gradient norm of the critic's output with respect to its input. To circumvent tractability issues, we enforce a soft version of the constraint with a penalty on the gradient norm for random samples x̂ ∼ P_x̂. Our new objective is
L = E_{x̃∼P_g}[D(x̃)] − E_{x∼P_r}[D(x)] + λ E_{x̂∼P_x̂}[(‖∇_x̂ D(x̂)‖_2 − 1)^2].   (3)

The first two terms are the original critic loss; the λ-weighted term is our gradient penalty.
Sampling distribution. We implicitly define P_x̂ by sampling uniformly along straight lines between pairs of points sampled from the data distribution P_r and the generator distribution P_g. This is motivated by the fact that the optimal critic contains straight lines with gradient norm 1 connecting coupled points from P_r and P_g (see Proposition 1). Given that enforcing the unit gradient norm constraint everywhere is intractable, enforcing it only along these straight lines seems sufficient and experimentally results in good performance.
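The sampling procedure behind P_x̂ amounts to a convex combination per pair. A small numpy sketch, with placeholder Gaussians standing in for P_r and P_g (the real distributions would come from data and the generator):

```python
import numpy as np

# Sketch of sampling from the implicit distribution P_xhat: draw paired
# batches from stand-ins for P_r and P_g, then pick a uniform point on the
# straight line between each pair. The Gaussians are placeholders.

rng = np.random.default_rng(0)
m, d = 8, 3
x_real = rng.normal(2.0, 1.0, (m, d))      # stand-in for a batch from P_r
x_fake = rng.normal(-2.0, 1.0, (m, d))     # stand-in for a batch from P_g
eps = rng.uniform(size=(m, 1))             # one epsilon per pair, broadcast over d
x_hat = eps * x_real + (1 - eps) * x_fake  # points on the connecting segments
```

Each row of x_hat lies on the segment between its paired real and fake points; the penalty in Eq. 3 is then evaluated at exactly these interpolates.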
Penalty coefficient All experiments in this paper use λ = 10, which we found to work well across
a variety of architectures and datasets ranging from toy tasks to large ImageNet CNNs.