Bound tightness    When $p_\theta = p_g$, we have that $\|\nabla_x E_\theta(x) + \nabla_x \log p_g(x)\|_p = 0$. In Theorem 1, $m$ is a constant related to the Lipschitz constant of $\log f(x)$, satisfying $\|\nabla_x \log f(x)\|_p \le \|\nabla_y \log f(y)\|_p + m$ for all $x, y$ (see proof for details). When $p_\theta = p_g$ we also have $\|\nabla_x \log f(x)\|_p = 0$, such that $m = 0$. Our upper bound is then $\lceil \mathcal{L}(\theta) \rceil = \mathcal{L}(\theta) = \lfloor \mathcal{L}(\theta) \rfloor$, and hence it is tight.
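For intuition, the tightness argument can be summarized in one chain of implications. This is a sketch under the standard energy-based parameterization $p_\theta(x) \propto \exp(-E_\theta(x))$ and the identification, up to sign, of $\nabla_x \log f(x)$ with $\nabla_x E_\theta(x) + \nabla_x \log p_g(x)$ suggested by the statement above (see the proof of Theorem 1 for the precise definition of $f$):
\[
p_\theta = p_g
\;\Rightarrow\;
\nabla_x \log p_g(x) = \nabla_x \log p_\theta(x) = -\nabla_x E_\theta(x)
\;\Rightarrow\;
\|\nabla_x \log f(x)\|_p = \|\nabla_x E_\theta(x) + \nabla_x \log p_g(x)\|_p = 0 \;\;\forall x
\;\Rightarrow\;
m = 0 ,
\]
so the additive slack in Theorem 1 vanishes and both bounds collapse onto $\mathcal{L}(\theta)$.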
3 Numerical evaluation of the bounds
Evaluating the lower bound    To evaluate the lower bound in Equation (11), we need the smallest singular value of the Jacobian $J = \partial_z G(z)$. Recall that this singular value satisfies $s_1 = \|J v_{\min}\|_2 = \min_{v \neq 0} \frac{\|Jv\|_2}{\|v\|_2}$, where $v_{\min}$ is the corresponding unit-norm right singular vector. We can then evaluate the singular value by finding $v_{\min}$ with an iterative optimization algorithm, where we opt to use the celebrated single-vector LOBPCG algorithm (Knyazev, 1998). This method performs an iterative minimization of the generalized Rayleigh quotient,
\[
\rho(v) := \frac{v^\top J^\top J v}{v^\top v}, \qquad (16)
\]
which converges to $v_{\min}$.
The gradient of $\rho(v)$ is proportional to $r = J^\top J v - \rho(v)\, v$. To avoid computing the Jacobian $J$, we use Jacobian-vector products, which can be efficiently evaluated using automatic differentiation. To compute $J^\top J v$, we use the following trick (in pytorch-notation):
\[
J^\top J v = \big((Jv)^\top J\big)^\top = \nabla_z \big((Jv)^\top.\mathrm{detach}() \cdot G(z)\big)^\top. \qquad (17)
\]
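Concretely, the product $J^\top J v$ needs only one forward-mode and one reverse-mode Jacobian-vector product, so $J$ is never materialized. The following PyTorch sketch illustrates this; the helper name jtj_vec and the assumption that $z$ and $v$ are single tensors of matching shape are ours, and torch.autograd.functional.jvp internally relies on the same detach-style double-backward trick as Equation (17):

    import torch
    from torch.autograd.functional import jvp, vjp

    def jtj_vec(G, z, v):
        # J^T J v for J = dG(z)/dz, computed without ever forming J.
        _, Jv = jvp(G, z, v)                  # forward product: Jv = J v (lives in data space)
        _, JtJv = vjp(G, z, Jv.detach())      # reverse product: grad_z <Jv.detach(), G(z)> = J^T (J v)
        return JtJv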
The optimal learning rate for this iterative scheme can be found by minimizing the Rayleigh quotient (16) along the search direction. Finally, we follow the suggestions of Knyazev (2001) to improve numerical stability and accelerate convergence; we omit the details here for brevity.
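For concreteness, a simplified version of such an iteration is sketched below. It performs a two-term Rayleigh-Ritz step on span{v, r}, which is one way of realizing the optimal step size, rather than the full three-term LOBPCG recurrence; the function name and the assumption that the latent $z$ is a flattened 1-D tensor are ours, and jtj_vec is the helper from the sketch above:

    import torch

    def smallest_singular_value(G, z, num_iters=50, tol=1e-7):
        # Estimate s_1, the smallest singular value of J = dG(z)/dz, by minimizing
        # the Rayleigh quotient rho(v) = (v^T J^T J v) / (v^T v) of Eq. (16).
        v = torch.randn_like(z)
        v = v / v.norm()
        for _ in range(num_iters):
            Av = jtj_vec(G, z, v)                 # A v with A = J^T J (never formed)
            rho = torch.dot(v, Av)                # Rayleigh quotient, since v has unit norm
            r = Av - rho * v                      # proportional to the gradient of rho at v
            if r.norm() < tol:
                break
            Ar = jtj_vec(G, z, r)
            # Optimal update within span{v, r}: take the smallest eigenpair of the
            # projected 2x2 generalized eigenproblem (W^T A W) c = lambda (W^T W) c.
            W = torch.stack([v, r], dim=1)        # latent_dim x 2 basis
            AW = torch.stack([Av, Ar], dim=1)
            M = torch.linalg.solve(W.T @ W, W.T @ AW)
            evals, evecs = torch.linalg.eig(M)
            c = evecs[:, evals.real.argmin()].real
            v = W @ c
            v = v / v.norm()
        return rho.sqrt()                         # s_1 = sqrt(lambda_min(J^T J))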
Evaluating the upper bound    There are two challenges when evaluating Equation (15). The first is to compute $\nabla_x \log p_g(x)$, where we empirically found that existing methods (Shi et al., 2018; Li and Turner, 2018) were too inefficient for our needs. To evaluate the term $\mathbb{E}_{x \sim p_g(x)}\big[\|\nabla_x E_\theta(x) + \nabla_x \log p_g(x)\|_p\big]$, we further loosen the bound
\[
\|\nabla_{G(z)} E_\theta(G(z)) + \nabla_{G(z)} \log p_g(G(z))\|_p
\le \frac{\|\nabla_{G(z)} E_\theta(G(z))\, J_z + \nabla_{G(z)} \log p_g(G(z))\, J_z\|_p}{s_1^p}
\le \frac{\|\nabla_{G(z)} E_\theta(G(z))\, J_z + \nabla_z \log p_g(G(z))\|_p}{s_1^p}, \qquad (18)
\]
where $s_1$ is the smallest singular value of $J_z$. Detailed derivations are in the supplementary material.
If we choose $p = 2$, then we can use Hutchinson's estimator (1989):
\[
\|\nabla_x E_\theta(x)\, J_z + \nabla_z \log p_g(G(z))\|_2^2
= \mathbb{E}_v\big[\big(\nabla_x E_\theta(x)\, J_z v + \nabla_z \log p_g(G(z))\, v\big)^2\big], \qquad (19)
\]
where $v \sim \mathcal{N}(0, I_d)$. This is easily evaluated using automatic differentiation.
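By the chain rule, $\nabla_x E_\theta(x)\, J_z v + \nabla_z \log p_g(G(z))\, v$ is the directional derivative of the scalar map $z \mapsto E_\theta(G(z)) + \log p_g(G(z))$ along $v$, so each Monte Carlo sample in (19) costs a single Jacobian-vector product. A minimal PyTorch sketch; the helper name and the assumption that $\log p_g(G(z))$ is available as a differentiable scalar function of $z$ (e.g. via the entropy estimator discussed below) are ours:

    import torch
    from torch.autograd.functional import jvp

    def hutchinson_sq_grad_norm(scalar_fn, z, n_samples=10):
        # Monte Carlo estimate of ||grad_z scalar_fn(z)||_2^2 as in Eq. (19):
        # E_v[(grad_z scalar_fn(z) . v)^2] with v ~ N(0, I). Only directional
        # derivatives are computed; the gradient itself is never formed.
        total = torch.zeros(())
        for _ in range(n_samples):
            v = torch.randn_like(z)
            _, dv = jvp(scalar_fn, z, v)      # grad_z scalar_fn(z) . v
            total = total + dv ** 2
        return total / n_samples

    # Hypothetical usage for the term in Eq. (19), with E_theta, G and log_pg_of_Gz
    # standing in for the energy, the generator and the entropy-estimator term:
    # bound_term = hutchinson_sq_grad_norm(lambda z_: E_theta(G(z_)) + log_pg_of_Gz(z_), z)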
The second challenge is to evaluate $\log p_g(x)$, which requires the Jacobian of the generator $G(z)$, as dictated by Equation (8). Here, we opt to use our entropy estimator as described above. We could alternatively use Hutchinson's estimator as proposed by Kumar et al. (2020). Experimentally, we do not observe much difference between these two estimators.
4 Related work
In machine learning, there has been a long-standing interest in EBMs dating back to Hopfield
networks (Hopfield, 1982), Boltzmann machines (Hinton and Sejnowski, 1983; Ackley et al., 1985)
and restricted Boltzmann machines (Smolensky, 1986; Hinton, 2002), see e.g. reviews in the works by
LeCun et al. (2006) and Scellier (2020). Learning and evaluation of these models are difficult since the
normalization constant cannot be efficiently evaluated. MLE-based learning, such as the Boltzmann
learning rule, relies on expensive MCMC sampling to estimate the gradient, and more advanced
MCMC methods are used to reliably estimate the normalization constant (see e.g. Salakhutdinov and
Murray, 2008; Grosse et al., 2013; Burda et al., 2015; Frellsen et al., 2016). For images, MCMC-
based learning has been used to learn non-deep EBMs of both textures (Zhu et al., 1998; Zhu and