During sampling, we apply this transition along an increasing sequence $\lambda_{\min} = \lambda_1 < \cdots < \lambda_T = \lambda_{\max}$ for $T$ timesteps; in other words, we follow the discrete time ancestral sampler of Sohl-Dickstein et al. (2015); Ho et al. (2020). If the model $x_\theta$ is correct, then as $T \to \infty$, we obtain samples from an SDE whose sample paths are distributed as $p(z)$ (Song et al., 2021b), and we use $p_\theta(z)$ to denote the continuous time model distribution. The variance is a log-space interpolation of $\tilde{\sigma}^2_{\lambda'|\lambda}$ and $\sigma^2_{\lambda|\lambda'}$ as suggested by Nichol & Dhariwal (2021); we found it effective to use a constant hyperparameter $v$ rather than a learned $z_\lambda$-dependent $v$. Note that the variances simplify to $\tilde{\sigma}^2_{\lambda'|\lambda}$ as $\lambda' \to \lambda$, so $v$ has an effect only when sampling with non-infinitesimal timesteps, as done in practice.
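For concreteness, the following NumPy sketch implements one step of this ancestral sampler. It assumes the variance-preserving specification $\alpha_\lambda^2 = \mathrm{sigmoid}(\lambda)$, $\sigma_\lambda^2 = 1 - \alpha_\lambda^2$ and the forward-process posterior of Ho et al. (2020), which this passage references but does not restate; `model_x` and the value `v = 0.3` are illustrative placeholders.

```python
import numpy as np

def alpha2(lam):
    """alpha_lambda^2 under the variance-preserving process (an assumption:
    alpha^2 = sigmoid(lambda), so sigma^2 = 1 - alpha^2)."""
    return 1.0 / (1.0 + np.exp(-lam))

def ancestral_step(z, lam, lam_next, model_x, v=0.3, rng=None):
    """One transition z_lambda -> z_lambda' for lam_next > lam (less noise)."""
    if rng is None:
        rng = np.random.default_rng()
    a, a_next = np.sqrt(alpha2(lam)), np.sqrt(alpha2(lam_next))
    s2, s2_next = 1.0 - alpha2(lam), 1.0 - alpha2(lam_next)
    x_hat = model_x(z, lam)                 # plug-in estimate of x from z_lambda
    r = np.exp(lam - lam_next)              # e^{lambda - lambda'} in (0, 1)
    # Mean of q(z_lambda' | z_lambda, x) with x replaced by x_hat (Ho et al., 2020).
    mean = r * (a_next / a) * z + (1.0 - r) * a_next * x_hat
    # Log-space interpolation of the two variances with constant hyperparameter v.
    var_reverse = (1.0 - r) * s2_next       # sigma_tilde^2_{lambda'|lambda}
    var_forward = (1.0 - r) * s2            # sigma^2_{lambda|lambda'}
    var = var_reverse ** (1.0 - v) * var_forward ** v
    return mean + np.sqrt(var) * rng.standard_normal(z.shape)
```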
The reverse process mean comes from an estimate $x_\theta(z_\lambda) \approx x$ plugged into $q(z_{\lambda'}|z_\lambda, x)$ (Ho et al., 2020; Kingma et al., 2021) ($x_\theta$ also receives $\lambda$ as input, but we suppress this to keep our notation clean). We parameterize $x_\theta$ in terms of $\epsilon$-prediction (Ho et al., 2020): $x_\theta(z_\lambda) = (z_\lambda - \sigma_\lambda \epsilon_\theta(z_\lambda))/\alpha_\lambda$, and we train on the objective
$$\mathbb{E}_{\epsilon,\lambda}\big[\|\epsilon_\theta(z_\lambda) - \epsilon\|_2^2\big] \qquad (5)$$
where $\epsilon \sim \mathcal{N}(0, I)$, $z_\lambda = \alpha_\lambda x + \sigma_\lambda \epsilon$, and $\lambda$ is drawn from a distribution $p(\lambda)$ over $[\lambda_{\min}, \lambda_{\max}]$.
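As a sketch, Eq. (5) can be estimated per example as follows. `model_eps` stands in for $\epsilon_\theta$, `sample_lambda` for a sampler from $p(\lambda)$ (one is sketched below), and the $\alpha_\lambda$ formula again assumes the variance-preserving process.

```python
import numpy as np

def training_loss(x, model_eps, sample_lambda, rng=None):
    """Single-example Monte Carlo estimate of the objective in Eq. (5)."""
    if rng is None:
        rng = np.random.default_rng()
    lam = sample_lambda(rng)                     # lambda ~ p(lambda)
    alpha = np.sqrt(1.0 / (1.0 + np.exp(-lam)))  # variance-preserving alpha_lambda
    sigma = np.sqrt(1.0 - alpha ** 2)
    eps = rng.standard_normal(x.shape)           # eps ~ N(0, I)
    z = alpha * x + sigma * eps                  # z_lambda = alpha_lambda x + sigma_lambda eps
    return np.sum((model_eps(z, lam) - eps) ** 2)  # ||eps_theta(z_lambda) - eps||_2^2
```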
This objective is denoising score matching (Vincent, 2011; Hyvärinen & Dayan, 2005) over multiple noise scales (Song & Ermon, 2019), and when $p(\lambda)$ is uniform, the objective is proportional to the variational lower bound on the marginal log likelihood of the latent variable model $\int p_\theta(x|z)\,p_\theta(z)\,dz$, ignoring the term for the unspecified decoder $p_\theta(x|z)$ and for the prior at $z_{\lambda_{\min}}$ (Kingma et al., 2021).
If $p(\lambda)$ is not uniform, the objective can be interpreted as a weighted variational lower bound whose weighting can be tuned for sample quality (Ho et al., 2020; Kingma et al., 2021). We use a $p(\lambda)$ inspired by the discrete time cosine noise schedule of Nichol & Dhariwal (2021): we sample $\lambda$ via $\lambda = -2\log\tan(au + b)$ for uniformly distributed $u \in [0, 1]$, where $b = \arctan(e^{-\lambda_{\max}/2})$ and $a = \arctan(e^{-\lambda_{\min}/2}) - b$. This represents a hyperbolic secant distribution modified to be supported on a bounded interval. For finite timestep generation, we use $\lambda$ values corresponding to uniformly spaced $u \in [0, 1]$, and the final generated sample is $x_\theta(z_{\lambda_{\max}})$.
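This $\lambda$ distribution and schedule transcribe directly into code; note that $u = 0$ maps to $\lambda_{\max}$ and $u = 1$ to $\lambda_{\min}$. The endpoint defaults below are illustrative placeholders, not values fixed by this passage.

```python
import numpy as np

def sample_lambda(rng, lam_min=-20.0, lam_max=20.0):
    """Draw lambda = -2 log tan(a u + b) for u ~ Uniform[0, 1]."""
    b = np.arctan(np.exp(-lam_max / 2.0))
    a = np.arctan(np.exp(-lam_min / 2.0)) - b
    return -2.0 * np.log(np.tan(a * rng.uniform() + b))

def lambda_schedule(T, lam_min=-20.0, lam_max=20.0):
    """T lambdas at uniformly spaced u; decreasing u from 1 to 0 yields the
    increasing sequence lambda_min = lambda_1 < ... < lambda_T = lambda_max."""
    b = np.arctan(np.exp(-lam_max / 2.0))
    a = np.arctan(np.exp(-lam_min / 2.0)) - b
    u = np.linspace(1.0, 0.0, T)
    return -2.0 * np.log(np.tan(a * u + b))
```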
Because the loss for $\epsilon_\theta(z_\lambda)$ is denoising score matching for all $\lambda$, the score $\epsilon_\theta(z_\lambda)$ learned by our model estimates the gradient of the log-density of the distribution of our noisy data $z_\lambda$; that is, $\epsilon_\theta(z_\lambda) \approx -\sigma_\lambda \nabla_{z_\lambda} \log p(z_\lambda)$. Note, however, that because we use unconstrained neural networks to define $\epsilon_\theta$, there need not exist any scalar potential whose gradient is $\epsilon_\theta$. Sampling from the learned diffusion model resembles using Langevin diffusion to sample from a sequence of distributions $p(z_\lambda)$ that converges to the distribution $p(x)$ of the original data $x$.
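In code, the score estimate implied by $\epsilon$-prediction is a one-line rescaling (again assuming the variance-preserving $\sigma_\lambda$):

```python
import numpy as np

def score_estimate(z, lam, model_eps):
    """grad_z log p(z_lambda) ~= -eps_theta(z_lambda) / sigma_lambda."""
    sigma = np.sqrt(1.0 - 1.0 / (1.0 + np.exp(-lam)))  # assumes VP process
    return -model_eps(z, lam) / sigma
```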
In the case of conditional generative modeling, the data $x$ is drawn jointly with conditioning information $c$, e.g., a class label for class-conditional image generation. The only modification to the model is that the reverse process function approximator receives $c$ as input, as in $\epsilon_\theta(z_\lambda, c)$.
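The interface change is only that the network takes $c$; how $c$ is injected is an architectural choice this passage leaves open. A toy fully-connected sketch follows, in which a learned class embedding is added to the hidden activation; the sizes and the embedding-addition scheme are illustrative assumptions.

```python
import numpy as np

class ToyConditionalEps:
    """eps_theta(z_lambda, c) for flattened z; c enters via a learned
    class embedding added to the hidden activation (one common choice)."""

    def __init__(self, dim, hidden, num_classes, rng=None):
        if rng is None:
            rng = np.random.default_rng()
        self.w1 = rng.standard_normal((dim + 1, hidden)) * 0.02  # +1 input for lambda
        self.emb = rng.standard_normal((num_classes, hidden)) * 0.02
        self.w2 = rng.standard_normal((hidden, dim)) * 0.02

    def __call__(self, z, lam, c):
        h = np.concatenate([z, [lam]]) @ self.w1 + self.emb[c]
        return np.maximum(h, 0.0) @ self.w2   # ReLU, then project back to data dim
```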
3 GUIDANCE
An interesting property of certain generative models, such as GANs and flow-based models, is the
ability to perform truncated or low temperature sampling by decreasing the variance or range of noise
inputs to the generative model at sampling time. The intended effect is to decrease the diversity of
the samples while increasing the quality of each individual sample. Truncation in BigGAN (Brock
et al., 2019), for example, yields a tradeoff curve between FID score and Inception score for low and
high amounts of truncation, respectively. Low temperature sampling in Glow (Kingma & Dhariwal,
2018) has a similar effect.
Unfortunately, straightforward attempts to implement truncation or low temperature sampling in diffusion models are ineffective. For example, scaling model scores or decreasing the variance of Gaussian noise in the reverse process causes the diffusion model to generate blurry, low quality samples (Dhariwal & Nichol, 2021).
3.1 CLASSIFIER GUIDANCE
To obtain a truncation-like effect in diffusion models, Dhariwal & Nichol (2021) introduce classifier guidance, where the diffusion score $\epsilon_\theta(z_\lambda, c) \approx -\sigma_\lambda \nabla_{z_\lambda} \log p(z_\lambda|c)$ is modified to include the