For the purpose of solving the above problems, let us introduce a recognition model $q_\phi(\mathbf{z}|\mathbf{x})$: an approximation to the intractable true posterior $p_\theta(\mathbf{z}|\mathbf{x})$. Note that in contrast with the approximate posterior in mean-field variational inference, it is not necessarily factorial and its parameters $\phi$ are not computed from some closed-form expectation. Instead, we'll introduce a method for learning the recognition model parameters $\phi$ jointly with the generative model parameters $\theta$.
From a coding theory perspective, the unobserved variables $\mathbf{z}$ have an interpretation as a latent representation or code. In this paper we will therefore also refer to the recognition model $q_\phi(\mathbf{z}|\mathbf{x})$ as a probabilistic encoder, since given a datapoint $\mathbf{x}$ it produces a distribution (e.g. a Gaussian) over the possible values of the code $\mathbf{z}$ from which the datapoint $\mathbf{x}$ could have been generated. In a similar vein we will refer to $p_\theta(\mathbf{x}|\mathbf{z})$ as a probabilistic decoder, since given a code $\mathbf{z}$ it produces a distribution over the possible corresponding values of $\mathbf{x}$.
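To make the encoder/decoder reading concrete, the following minimal Python sketch (an illustration added for this text, not code from the paper) implements a recognition model that maps $\mathbf{x}$ to the mean and log-variance of a diagonal Gaussian over $\mathbf{z}$, and a decoder that maps $\mathbf{z}$ to Bernoulli parameters over $\mathbf{x}$; the single linear layers, the dimensions and the Gaussian/Bernoulli choices are assumptions made only for the example.

import numpy as np

rng = np.random.default_rng(0)
x_dim, z_dim = 784, 20   # illustrative sizes, not settings taken from the paper

# Randomly initialized parameters: phi for the encoder, theta for the decoder.
W_mu  = rng.normal(scale=0.01, size=(x_dim, z_dim)); b_mu  = np.zeros(z_dim)
W_lv  = rng.normal(scale=0.01, size=(x_dim, z_dim)); b_lv  = np.zeros(z_dim)
W_dec = rng.normal(scale=0.01, size=(z_dim, x_dim)); b_dec = np.zeros(x_dim)

def encode(x):
    # Probabilistic encoder q_phi(z|x): returns the mean and log-variance of a
    # diagonal Gaussian over the code z for a given datapoint x.
    return x @ W_mu + b_mu, x @ W_lv + b_lv

def decode(z):
    # Probabilistic decoder p_theta(x|z): returns Bernoulli means over x for a code z.
    return 1.0 / (1.0 + np.exp(-(z @ W_dec + b_dec)))

x = rng.random(x_dim)                                        # a toy datapoint
mu, logvar = encode(x)                                       # q_phi(z|x) parameters
z = mu + np.exp(0.5 * logvar) * rng.standard_normal(z_dim)   # one code sampled from q
x_probs = decode(z)                                          # p_theta(x|z) parameters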
2.2 The variational bound
The marginal likelihood is composed of a sum over the marginal likelihoods of individual datapoints
$\log p_\theta(\mathbf{x}^{(1)}, \cdots, \mathbf{x}^{(N)}) = \sum_{i=1}^{N} \log p_\theta(\mathbf{x}^{(i)})$, which can each be rewritten as:
$$\log p_\theta(\mathbf{x}^{(i)}) = D_{KL}\big(q_\phi(\mathbf{z}|\mathbf{x}^{(i)}) \,\|\, p_\theta(\mathbf{z}|\mathbf{x}^{(i)})\big) + \mathcal{L}(\theta, \phi; \mathbf{x}^{(i)}) \qquad (1)$$
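For completeness, this decomposition can be verified in one step (a standard derivation, added here only as a reading aid): writing out the KL divergence and substituting $\log p_\theta(\mathbf{z}|\mathbf{x}^{(i)}) = \log p_\theta(\mathbf{x}^{(i)}, \mathbf{z}) - \log p_\theta(\mathbf{x}^{(i)})$ gives
$$D_{KL}\big(q_\phi(\mathbf{z}|\mathbf{x}^{(i)}) \,\|\, p_\theta(\mathbf{z}|\mathbf{x}^{(i)})\big) = \mathbb{E}_{q_\phi(\mathbf{z}|\mathbf{x}^{(i)})}\big[\log q_\phi(\mathbf{z}|\mathbf{x}^{(i)}) - \log p_\theta(\mathbf{x}^{(i)}, \mathbf{z})\big] + \log p_\theta(\mathbf{x}^{(i)}),$$
and the expectation on the right is exactly $-\mathcal{L}(\theta, \phi; \mathbf{x}^{(i)})$ as defined in eq. (2) below; rearranging yields eq. (1).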
The first RHS term is the KL divergence of the approximate from the true posterior. Since this
KL divergence is non-negative, the second RHS term $\mathcal{L}(\theta, \phi; \mathbf{x}^{(i)})$ is called the (variational) lower bound on the marginal likelihood of datapoint $i$, and can be written as:
$$\log p_\theta(\mathbf{x}^{(i)}) \geq \mathcal{L}(\theta, \phi; \mathbf{x}^{(i)}) = \mathbb{E}_{q_\phi(\mathbf{z}|\mathbf{x})}\big[-\log q_\phi(\mathbf{z}|\mathbf{x}) + \log p_\theta(\mathbf{x}, \mathbf{z})\big] \qquad (2)$$
which can also be written as:
$$\mathcal{L}(\theta, \phi; \mathbf{x}^{(i)}) = -D_{KL}\big(q_\phi(\mathbf{z}|\mathbf{x}^{(i)}) \,\|\, p_\theta(\mathbf{z})\big) + \mathbb{E}_{q_\phi(\mathbf{z}|\mathbf{x}^{(i)})}\big[\log p_\theta(\mathbf{x}^{(i)}|\mathbf{z})\big] \qquad (3)$$
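As a concrete (and hedged) illustration of eq. (3): if one assumes a diagonal-Gaussian $q_\phi(\mathbf{z}|\mathbf{x}^{(i)})$, a standard-normal prior $p_\theta(\mathbf{z})$ and a Bernoulli decoder (common modelling choices, not requirements of the bound), the KL term is available in closed form and the expected reconstruction term can be estimated by sampling. A minimal Python sketch under those assumptions:

import numpy as np

rng = np.random.default_rng(0)

def elbo_estimate(x, mu, logvar, decode, n_samples=1):
    # -D_KL( N(mu, diag(exp(logvar))) || N(0, I) ): closed form for the assumed
    # diagonal-Gaussian posterior and standard-normal prior.
    neg_kl = 0.5 * np.sum(1.0 + logvar - mu**2 - np.exp(logvar))
    # Monte Carlo estimate of E_q[ log p_theta(x|z) ] for a Bernoulli decoder.
    rec = 0.0
    for _ in range(n_samples):
        z = mu + np.exp(0.5 * logvar) * rng.standard_normal(mu.shape)
        p = decode(z)
        rec += np.sum(x * np.log(p) + (1.0 - x) * np.log(1.0 - p))
    return neg_kl + rec / n_samples

# Toy usage with a hypothetical linear-sigmoid decoder and a random binary datapoint.
z_dim, x_dim = 2, 5
W, b = rng.normal(size=(z_dim, x_dim)), np.zeros(x_dim)
decode = lambda z: 1.0 / (1.0 + np.exp(-(z @ W + b)))
x = rng.integers(0, 2, size=x_dim).astype(float)
print(elbo_estimate(x, mu=np.zeros(z_dim), logvar=np.zeros(z_dim), decode=decode))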
We want to differentiate and optimize the lower bound $\mathcal{L}(\theta, \phi; \mathbf{x}^{(i)})$ w.r.t. both the variational parameters $\phi$ and the generative parameters $\theta$. However, the gradient of the lower bound w.r.t. $\phi$ is problematic. The usual (naïve) Monte Carlo gradient estimator for this type of problem is:
$$\nabla_\phi \mathbb{E}_{q_\phi(\mathbf{z})}[f(\mathbf{z})] = \mathbb{E}_{q_\phi(\mathbf{z})}\big[f(\mathbf{z})\, \nabla_\phi \log q_\phi(\mathbf{z})\big] \simeq \frac{1}{L} \sum_{l=1}^{L} f(\mathbf{z}^{(l)})\, \nabla_\phi \log q_\phi(\mathbf{z}^{(l)})$$
where $\mathbf{z}^{(l)} \sim q_\phi(\mathbf{z}|\mathbf{x}^{(i)})$. This gradient estimator exhibits very high variance (see e.g. [BJP12]) and is impractical for our purposes.
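To see the variance issue on a toy problem (a small numerical sketch under simplifying assumptions, not an experiment from the paper): take $q_\phi(z) = \mathcal{N}(z; \phi, 1)$ and $f(z) = z^2$, for which the exact gradient $\nabla_\phi \mathbb{E}_{q_\phi(z)}[f(z)] = 2\phi$ is known, and compare the naive estimator above against the reparameterized estimator introduced in the next section:

import numpy as np

rng = np.random.default_rng(0)
phi, n = 1.0, 100_000                 # toy variational parameter and sample count
z = phi + rng.standard_normal(n)      # samples z ~ q_phi(z) = N(phi, 1)

# Naive score-function estimator: f(z) * d/dphi log q_phi(z) = z^2 * (z - phi).
score_grads = z**2 * (z - phi)
# Reparameterized estimator (section 2.3): z = phi + eps, so d/dphi f(z) = 2 z.
reparam_grads = 2.0 * z

print("exact gradient           :", 2.0 * phi)
print("score-function estimator : mean %.3f  variance %.3f"
      % (score_grads.mean(), score_grads.var()))
print("reparameterized estimator: mean %.3f  variance %.3f"
      % (reparam_grads.mean(), reparam_grads.var()))

Both estimators are unbiased on this toy problem, but the score-function estimator's variance is several times larger, and the gap grows with the dimensionality and curvature of $f$.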
2.3 The SGVB estimator and AEVB algorithm
In this section we introduce a practical estimator of the lower bound and its derivatives w.r.t. the
parameters. We assume an approximate posterior in the form $q_\phi(\mathbf{z}|\mathbf{x})$, but note that the technique can also be applied to the case $q_\phi(\mathbf{z})$, i.e. where we do not condition on $\mathbf{x}$. The fully variational Bayesian method for inferring a posterior over the parameters is given in the appendix.
Under certain mild conditions outlined in section 2.4, for a chosen approximate posterior $q_\phi(\mathbf{z}|\mathbf{x})$ we can reparameterize the random variable $\widetilde{\mathbf{z}} \sim q_\phi(\mathbf{z}|\mathbf{x})$ using a differentiable transformation $g_\phi(\boldsymbol{\epsilon}, \mathbf{x})$ of an (auxiliary) noise variable $\boldsymbol{\epsilon}$:
$$\widetilde{\mathbf{z}} = g_\phi(\boldsymbol{\epsilon}, \mathbf{x}) \quad \text{with} \quad \boldsymbol{\epsilon} \sim p(\boldsymbol{\epsilon}) \qquad (4)$$
See section 2.4 for general strategies for choosing such an appropriate distribution $p(\boldsymbol{\epsilon})$ and function $g_\phi(\boldsymbol{\epsilon}, \mathbf{x})$. We can now form Monte Carlo estimates of expectations of some function $f(\mathbf{z})$ w.r.t. $q_\phi(\mathbf{z}|\mathbf{x})$ as follows:
$$\mathbb{E}_{q_\phi(\mathbf{z}|\mathbf{x}^{(i)})}\big[f(\mathbf{z})\big] = \mathbb{E}_{p(\boldsymbol{\epsilon})}\big[f(g_\phi(\boldsymbol{\epsilon}, \mathbf{x}^{(i)}))\big] \simeq \frac{1}{L} \sum_{l=1}^{L} f(g_\phi(\boldsymbol{\epsilon}^{(l)}, \mathbf{x}^{(i)})) \quad \text{where} \quad \boldsymbol{\epsilon}^{(l)} \sim p(\boldsymbol{\epsilon}) \qquad (5)$$
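A quick numerical sanity check of eq. (5) in Python, under the assumption that $q_\phi(\mathbf{z}|\mathbf{x})$ is a diagonal Gaussian so that $g_\phi(\boldsymbol{\epsilon}, \mathbf{x}) = \boldsymbol{\mu} + \boldsymbol{\sigma} \odot \boldsymbol{\epsilon}$ with $\boldsymbol{\epsilon} \sim \mathcal{N}(\mathbf{0}, \mathbf{I})$ (the Gaussian case discussed in section 2.4); the test function $f$ and the parameter values below are arbitrary illustrative choices:

import numpy as np

rng = np.random.default_rng(0)
mu, sigma = np.array([1.0, -2.0]), np.array([0.5, 1.5])   # toy parameters of q
f = lambda z: np.sum(z**2, axis=-1)                       # arbitrary test function

# Reparameterization: z^(l) = g_phi(eps^(l), x) = mu + sigma * eps^(l), eps ~ N(0, I).
L = 100_000
eps = rng.standard_normal((L, mu.size))
z = mu + sigma * eps

# Monte Carlo estimate of E_{q_phi(z|x)}[f(z)] as in eq. (5).
print("MC estimate:", f(z).mean())
# For this f the expectation is known exactly: sum_i (mu_i^2 + sigma_i^2).
print("exact value:", np.sum(mu**2 + sigma**2))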
We apply this technique to the variational lower bound (eq. (2)), yielding our generic Stochastic Gradient Variational Bayes (SGVB) estimator $\widetilde{\mathcal{L}}^{A}(\theta, \phi; \mathbf{x}^{(i)}) \simeq \mathcal{L}(\theta, \phi; \mathbf{x}^{(i)})$:
$$\widetilde{\mathcal{L}}^{A}(\theta, \phi; \mathbf{x}^{(i)}) = \frac{1}{L} \sum_{l=1}^{L} \big[\log p_\theta(\mathbf{x}^{(i)}, \mathbf{z}^{(i,l)}) - \log q_\phi(\mathbf{z}^{(i,l)}|\mathbf{x}^{(i)})\big]$$
$$\text{where} \quad \mathbf{z}^{(i,l)} = g_\phi(\boldsymbol{\epsilon}^{(i,l)}, \mathbf{x}^{(i)}) \quad \text{and} \quad \boldsymbol{\epsilon}^{(i,l)} \sim p(\boldsymbol{\epsilon}) \qquad (6)$$
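Once the factors of $p_\theta(\mathbf{x}, \mathbf{z})$ and $q_\phi(\mathbf{z}|\mathbf{x})$ are chosen, the estimator in eq. (6) is straightforward to compute. The Python sketch below assumes a diagonal-Gaussian encoder, a standard-normal prior and a Bernoulli decoder (illustrative choices, as before) and averages $\log p_\theta(\mathbf{x}^{(i)}, \mathbf{z}^{(i,l)}) - \log q_\phi(\mathbf{z}^{(i,l)}|\mathbf{x}^{(i)})$ over $L$ reparameterized samples:

import numpy as np

rng = np.random.default_rng(0)

def log_normal(z, mu, logvar):
    # Log density of a diagonal Gaussian N(z; mu, diag(exp(logvar))).
    return -0.5 * np.sum(np.log(2.0 * np.pi) + logvar + (z - mu)**2 / np.exp(logvar))

def sgvb_a(x, mu, logvar, decode, L=5):
    # Generic SGVB estimator of eq. (6):
    #   (1/L) * sum_l [ log p_theta(x, z_l) - log q_phi(z_l | x) ],
    # with z_l = mu + sigma * eps_l and eps_l ~ N(0, I) (reparameterization).
    total = 0.0
    for _ in range(L):
        eps = rng.standard_normal(mu.shape)
        z = mu + np.exp(0.5 * logvar) * eps
        log_prior = log_normal(z, np.zeros_like(z), np.zeros_like(z))   # log p_theta(z)
        p = decode(z)
        log_lik = np.sum(x * np.log(p) + (1.0 - x) * np.log(1.0 - p))   # log p_theta(x|z)
        log_q = log_normal(z, mu, logvar)                               # log q_phi(z|x)
        total += log_prior + log_lik - log_q
    return total / L

# Toy usage with a hypothetical linear-sigmoid decoder and a random binary datapoint.
z_dim, x_dim = 2, 5
W, b = rng.normal(size=(z_dim, x_dim)), np.zeros(x_dim)
decode = lambda z: 1.0 / (1.0 + np.exp(-(z @ W + b)))
x = rng.integers(0, 2, size=x_dim).astype(float)
print(sgvb_a(x, mu=np.zeros(z_dim), logvar=np.zeros(z_dim), decode=decode, L=10))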