Two Time-Scale Update Rule Drives GAN Training to Convergence: Empirical Evidence for Convergence to a Local Nash Equilibrium


considerably reduced the cases for which we observed mode collapsing. Although TTUR ensures that the discriminator converges during learning, practical learning rates must be found for each experiment. We face a trade-off, since the learning rates should be small enough (e.g. for the generator) to ensure convergence but at the same time large enough to allow fast learning. For each of the experiments, the learning rates have been optimized to be large while still ensuring stable training, which is indicated by a decreasing FID or Jensen-Shannon divergence (JSD). We further fixed the time point for stopping training to the update step at which the FID or JSD of the best models was no longer decreasing. For some models, we observed that the FID diverges or starts to increase at a certain time point. An example of this behaviour is shown in Fig. 5. The performance of generative models is evaluated via the Fréchet Inception Distance (FID) introduced above. For the One Billion Word experiment, the normalized JSD served as the performance measure.
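The stopping rule described above can be sketched as a small helper. This is a minimal illustration under our own assumptions: the function name `stopping_step` and the `patience` parameter are hypothetical and do not come from the paper, which only states that training was stopped at the update step where the FID (or JSD) of the best models no longer decreased.

```python
from typing import Sequence


def stopping_step(steps: Sequence[int], fids: Sequence[float], patience: int = 3) -> int:
    """Return the step with the best FID once the FID has stopped decreasing.

    Hypothetical rule: scan the evaluations in order and stop after `patience`
    consecutive evaluations fail to improve on the best FID seen so far.
    """
    if not fids or len(steps) != len(fids):
        raise ValueError("steps and fids must be non-empty and of equal length")
    best = 0   # index of the lowest FID seen so far
    stale = 0  # evaluations since the last improvement
    for i in range(1, len(fids)):
        if fids[i] < fids[best]:
            best, stale = i, 0
        else:
            stale += 1
            if stale >= patience:
                break
    return steps[best]


if __name__ == "__main__":
    evals = [1000 * k for k in range(1, 9)]
    curve = [300.0, 200.0, 150.0, 140.0, 145.0, 160.0, 180.0, 130.0]
    print(stopping_step(evals, curve))  # prints 4000
```

With `patience=3`, the late improvement to 130 is never reached, which mirrors the situation where an FID that starts to increase ends a run; a larger patience trades compute for the chance of a later minimum.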

For computing the FID, we propagated all images from the training dataset through the pretrained Inception-v3 model, following the computation of the Inception Score [ ]; however, we use the last pooling layer as the coding layer. For this coding layer, we calculated the mean $m_w$ and the covariance matrix $C_w$. Thus, we approximate the first and second central moments of the function given by the Inception coding layer under the real-world distribution. To approximate these moments for the model distribution, we generate 50,000 images, propagate them through the Inception-v3 model, and then compute the mean $m$ and the covariance matrix $C$. For computational efficiency, we evaluate the FID every 1,000 DCGAN mini-batch updates, every 5,000 WGAN-GP outer iterations for the image experiments, and every 100 outer iterations for the WGAN-GP language model. For the one time-scale updates, a WGAN-GP outer iteration consists of five discriminator mini-batches for the image model and ten discriminator mini-batches for the language model, following the original implementation. For TTUR, however, the discriminator is updated only once per iteration. We repeat the training for each single time-scale (orig) and TTUR learning rate eight times for the image datasets and ten times for the language benchmark. In addition to the mean FID training progress, we show the minimum and maximum FID over all runs at each evaluation time step. For more details, implementations, and further results, see Appendix Sections A4 and A6.
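Given the two sets of moments, the FID is the Fréchet distance between the Gaussians $(m, C)$ and $(m_w, C_w)$: $d^2 = \|m - m_w\|_2^2 + \mathrm{Tr}\big(C + C_w - 2 (C C_w)^{1/2}\big)$. The NumPy sketch below assumes you already have coding-layer activations as an `(n_samples, dim)` array; extracting them from Inception-v3 is not shown, and the helper names are our own. Many implementations use a general matrix square root such as `scipy.linalg.sqrtm`; this sketch instead uses the identity $\mathrm{Tr}\,(C C_w)^{1/2} = \mathrm{Tr}\,(C_w^{1/2} C\, C_w^{1/2})^{1/2}$, so only symmetric eigendecompositions are needed.

```python
import numpy as np


def gaussian_stats(features: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Mean vector and covariance matrix of an (n_samples, dim) feature array."""
    mu = features.mean(axis=0)
    cov = np.cov(features, rowvar=False)
    return mu, cov


def _sqrtm_psd(mat: np.ndarray) -> np.ndarray:
    """Square root of a symmetric positive semi-definite matrix via eigh."""
    vals, vecs = np.linalg.eigh(mat)
    vals = np.clip(vals, 0.0, None)  # clip tiny negative eigenvalues from round-off
    return (vecs * np.sqrt(vals)) @ vecs.T


def frechet_distance(m: np.ndarray, C: np.ndarray,
                     m_w: np.ndarray, C_w: np.ndarray) -> float:
    """d^2 = ||m - m_w||^2 + Tr(C + C_w - 2 (C C_w)^{1/2})."""
    diff = m - m_w
    # C C_w has the same eigenvalues as the symmetric PSD matrix
    # C_w^{1/2} C C_w^{1/2}, so their square roots have the same trace.
    s = _sqrtm_psd(C_w)
    tr_covmean = np.trace(_sqrtm_psd(s @ C @ s))
    return float(diff @ diff + np.trace(C) + np.trace(C_w) - 2.0 * tr_covmean)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    real = rng.normal(size=(2000, 8))            # stand-in for real-image features
    fake = rng.normal(loc=0.5, size=(2000, 8))   # stand-in for generated features
    print(frechet_distance(*gaussian_stats(fake), *gaussian_stats(real)))
```

The random features only stand in for real activations; with 2048-dimensional pooling features, enough samples (such as the 50,000 generated images mentioned above) are needed for a well-conditioned covariance estimate.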

Simple Toy Data.

We first want to demonstrate the difference between a single time-scale update rule and TTUR on a simple toy min/max problem where a saddle point should be found. The objective $f(x, y) = (1 + x^2)(100 - y^2)$ in Fig. 4 (left) has a saddle point at $(x, y) = (0, 0)$ and fulfills assumption A4. The norm $\|(x, y)\|$ measures the distance of the parameter vector $(x, y)$ to the saddle point. We update $(x, y)$ by gradient descent in $x$ and gradient ascent in $y$, using additive Gaussian noise to simulate a stochastic update. The updates should converge to the saddle point $(x, y) = (0, 0)$ with objective value $f(0, 0) = 100$ and norm $0$. In Fig. 4 (right), the first two rows show one time-scale update rules. The large learning rate in the first row diverges and has large fluctuations. The smaller learning rate in the second row converges, but more slowly than TTUR in the third row, which has slow $x$-updates. TTUR with slow $y$-updates in the fourth row also converges, but more slowly.
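The toy experiment can be sketched in a few lines of Python. The start point $(0.5, 0.5)$, the noise scale, the 4,000 steps, and the simultaneous update of both players are our assumptions, not details given in the text, so this is an illustration of the setup rather than a reproduction of Fig. 4. The gradients are $\partial f/\partial x = 2x(100 - y^2)$ and $\partial f/\partial y = -2y(1 + x^2)$.

```python
import math
import random


def grad(x: float, y: float) -> tuple[float, float]:
    """Partial derivatives (df/dx, df/dy) of f(x, y) = (1 + x^2)(100 - y^2)."""
    return 2.0 * x * (100.0 - y * y), -2.0 * y * (1.0 + x * x)


def simulate(lr_x: float, lr_y: float, steps: int = 4000,
             x0: float = 0.5, y0: float = 0.5,
             noise_std: float = 0.0, seed: int = 0) -> tuple[float, float, float, float]:
    """Run simultaneous gradient descent in x and gradient ascent in y.

    Assumed setup (not from the paper): start point (x0, y0), noise scale
    noise_std, and number of steps. Returns (x, y, f(x, y), ||(x, y)||).
    """
    rng = random.Random(seed)
    x, y = x0, y0
    for _ in range(steps):
        gx, gy = grad(x, y)
        # Additive Gaussian noise on the gradients simulates a stochastic update.
        gx += rng.gauss(0.0, noise_std)
        gy += rng.gauss(0.0, noise_std)
        # Both players use the gradient at the current point:
        # x minimizes f (descent), y maximizes f (ascent).
        x, y = x - lr_x * gx, y + lr_y * gy
    f = (1.0 + x * x) * (100.0 - y * y)
    return x, y, f, math.hypot(x, y)


if __name__ == "__main__":
    # (lr_x, lr_y) pairs for the four rows of Fig. 4.
    for lr_x, lr_y in [(1e-2, 1e-2), (1e-3, 1e-3), (1e-4, 1e-2), (1e-2, 1e-4)]:
        x, y, f, norm = simulate(lr_x, lr_y, noise_std=1.0)
        print(f"lr_x={lr_x:g} lr_y={lr_y:g}  f={f:.3f}  norm={norm:.4f}")
```

Near the saddle, one step multiplies $x$ by roughly $1 - 200\,\eta_x$ and $y$ by roughly $1 - 2\,\eta_y$, where $\eta_x, \eta_y$ are the two learning rates. With $\eta_x = 0.01$ the $x$-factor is about $-1$, so $x$ flips sign instead of shrinking; with $\eta_x = 0.0001$ and $\eta_y = 0.01$ both factors are about $0.98$, which is why the third-row setting contracts both coordinates quickly.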

[Figure 4 plots: objective surface (left); for each of the four rows, the objective, the norm, and x vs y over 0 to 4,000 update steps (right).]

Figure 4: Left: Plot of the objective with a saddle point at $(0, 0)$. Right: Training progress with equal learning rates of 0.01 (first row) and 0.001 (second row) for $x$ and $y$, TTUR with a learning rate of 0.0001 for $x$ vs. 0.01 for $y$ (third row), and a larger learning rate of 0.01 for $x$ vs. 0.0001 for $y$ (fourth row). The columns show the function values (left), norms (middle), and $(x, y)$ (right). TTUR (third row) clearly converges faster than with equal time-scale updates and directly moves to the saddle point, as shown by the norm and in the $(x, y)$-plot.

DCGAN on Image Data.

We test TTUR for the deep convolutional GAN (DCGAN) [ ] on the CelebA, CIFAR-10, SVHN, and LSUN Bedrooms datasets. Fig. 5 shows the FID during learning with the original learning method (orig) and with TTUR. The original training method is faster at the beginning, but TTUR eventually achieves better performance. DCGAN trained with TTUR consistently reaches a lower FID than the original method, and for CelebA and LSUN Bedrooms all one time-scale runs diverge. For DCGAN, the learning rate of the generator is larger than that of the discriminator, which, however, does not contradict the TTUR theory (see Appendix Section A5). In Table 1 we report the best FID with TTUR and with one time-scale training for an optimized number of updates and optimized learning rates. TTUR consistently outperforms standard training and is more stable.

[Figure 5 plots: FID vs. mini-batch updates (x 1k). Top left, CelebA: orig 1e-5, orig 1e-4, orig 5e-4, TTUR 1e-5 5e-4. Top right, CIFAR-10: orig 1e-4, orig 2e-4, orig 5e-4, TTUR 1e-4 5e-4. Bottom left, SVHN: orig 1e-5, orig 5e-5, orig 1e-4, TTUR 1e-5 1e-4. Bottom right, LSUN Bedrooms: orig 1e-5, orig 5e-5, orig 1e-4, TTUR 1e-5 1e-4.]

Figure 5: Mean FID (solid line) surrounded by a shaded area bounded by the maximum and the minimum over 8 runs for DCGAN on CelebA, CIFAR-10, SVHN, and LSUN Bedrooms. TTUR learning rates are given for the discriminator $b$ and generator $a$ as: "TTUR $b$ $a$". Top Left: CelebA. Top Right: CIFAR-10, starting at mini-batch update 10k for better visualisation. Bottom Left: SVHN. Bottom Right: LSUN Bedrooms. Training with TTUR (red) is more stable, has much lower variance, and leads to a better FID.
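The shaded bands in Fig. 5 and the "best FID" entries of Table 1 both summarize repeated runs. The sketch below shows one way to compute them; the helper names and the choice of taking the minimum of the run-averaged curve as "best" are our assumptions, and the paper's exact Table 1 procedure may differ.

```python
from statistics import mean


def fid_band(runs: list[list[float]]) -> tuple[list[float], list[float], list[float]]:
    """Mean, minimum, and maximum FID at each evaluation step over repeated runs.

    Each element of `runs` is one run's FID curve; all curves share the same
    evaluation steps (e.g. one value every 1,000 DCGAN mini-batch updates).
    """
    per_step = list(zip(*runs))  # transpose: one tuple of run values per step
    return ([mean(v) for v in per_step],
            [min(v) for v in per_step],
            [max(v) for v in per_step])


def best_fid(curves: dict[str, list[list[float]]]) -> tuple[str, int, float]:
    """Return (label, evaluation index, FID) with the lowest mean FID.

    Assumption: "best" is the minimum of the run-averaged curve, searched over
    all learning-rate settings and all evaluation steps.
    """
    best = None
    for label, runs in curves.items():
        mean_curve, _, _ = fid_band(runs)
        idx = min(range(len(mean_curve)), key=mean_curve.__getitem__)
        if best is None or mean_curve[idx] < best[2]:
            best = (label, idx, mean_curve[idx])
    if best is None:
        raise ValueError("no curves given")
    return best
```

Labels such as `"TTUR 1e-5 1e-4"` follow the legend convention of Fig. 5 (discriminator rate first, then generator rate); the FID values passed in would come from the evaluations described earlier.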

WGAN-GP on Image Data.

We used the WGAN-GP image model [ ] to test TTUR with the CIFAR-10 and LSUN Bedrooms datasets. In contrast to the original code, where the discriminator is trained five times for each generator update, TTUR updates the discriminator only once; therefore, we align the training progress with wall-clock time. The learning rate for the original training was optimized to be as large as possible while still leading to stable learning. TTUR can use a higher learning rate for the discriminator, since TTUR stabilizes learning. Fig. 6 shows the FID during learning with the original learning method and with TTUR. Table 1 shows the best FID with TTUR and with one time-scale training for an optimized number of iterations and optimized learning rates. Again, TTUR reaches lower FIDs than one time-scale training.
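The difference between the two schedules, and the reason for comparing them per wall-clock minute, can be sketched as follows. An outer iteration runs `n_critic` discriminator mini-batches before one generator update: five for the original image model, one for TTUR. The step callbacks and the per-mini-batch costs are hypothetical placeholders, not part of the original code.

```python
from typing import Callable


def outer_iteration(d_step: Callable[[], None], g_step: Callable[[], None],
                    n_critic: int) -> None:
    """One outer iteration: n_critic discriminator updates, then one generator update."""
    for _ in range(n_critic):
        d_step()
    g_step()


def generator_updates_in_budget(budget: float, cost_d: float, cost_g: float,
                                n_critic: int) -> int:
    """Number of outer iterations (= generator updates) that fit into a time budget.

    cost_d and cost_g are hypothetical per-mini-batch costs in the same unit as budget.
    """
    return int(budget // (n_critic * cost_d + cost_g))
```

Under the simplifying assumption of equal costs per mini-batch, a 60-unit budget allows 10 outer iterations with `n_critic=5` but 30 with `n_critic=1`, so counting iterations alone would understate how much compute the one time-scale baseline receives.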

[Figure 6 plots: FID vs. training time in minutes. Left, CIFAR-10 (0 to 1,000 minutes): orig 1e-4, orig 5e-4, orig 7e-4, TTUR 3e-4 1e-4. Right, LSUN Bedrooms (0 to 2,000 minutes): orig 1e-4, orig 5e-4, orig 7e-4, TTUR 3e-4 1e-4.]

Figure 6: Mean FID (solid line) surrounded by a shaded area bounded by the maximum and the minimum over 8 runs for WGAN-GP on CIFAR-10 and LSUN Bedrooms. TTUR learning rates are given for the discriminator $b$ and generator $a$ as: "TTUR $b$ $a$". Left: CIFAR-10, starting at minute 20. Right: LSUN Bedrooms. Training with TTUR (red) has much lower variance and leads to a better FID.
