Neural Differential Equation Models in Deep Learning

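PDF, 3.81 MB, updated 2024-07-17

The paper "Neural Ordinary Differential Equations," by Ricky T. Q. Chen, Yulia Rubanova, Jesse Bettencourt, and David Duvenaud of the University of Toronto and the Vector Institute, introduces a new family of deep neural network models. These models parameterize the derivative of the hidden state with a neural network, and the network's output is computed with a black-box differential equation solver. The resulting continuous-depth models have constant memory cost, adapt their evaluation strategy to each input, and can explicitly trade numerical precision for speed.

Neural ordinary differential equations (neural ODEs) change how deep networks are designed. A conventional deep model is a stack of discrete hidden layers. A neural ODE instead defines a differential equation for the hidden state, so the model describes a dynamic process continuously. The hidden state no longer evolves through a fixed series of layers: a neural network gives its time derivative, that is, the rate at which the hidden state changes. A black-box ODE solver then computes the hidden state at any requested time, which lets the model adapt its computation to the characteristics of each input.

The paper demonstrates these properties in continuous-depth residual networks and continuous-time latent variable models. Continuous-depth residual networks extend residual networks to a continuous depth variable, which helps capture finer-grained dynamics. Continuous-time latent variable models offer a new way to handle time-series data and can model real-world processes that change continuously.

The paper also introduces continuous normalizing flows, a generative model that can be trained by maximum likelihood without partitioning or ordering the data dimensions, which makes training more flexible and efficient.

For training, the paper shows how to backpropagate end to end through any ODE solver without access to its internal operations. This makes it practical to use neural ODEs as components of larger models that are optimized as a whole.

By bringing differential equations into neural network design, the paper yields flexible, adaptive models and offers new tools for understanding and modeling dynamic systems. These models are theoretically significant and show strong practical potential, particularly for continuous time-series data and generative modeling.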

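The contrast between discrete layers and a continuously evolving hidden state can be sketched in a few lines. This is a toy illustration under stated assumptions, not the paper's code: the dynamics `f`, its weights `W` and `B`, and the helper names are all hypothetical, and a fixed-step Runge-Kutta solver stands in for the adaptive black-box solvers the paper uses. A residual block h ← h + f(h) is one Euler step of dh/dt = f(h, t); in this sketch the n residual blocks share weights and scale their update by 1/n, so together they cover the interval [0, 1] and approach the ODE solution as n grows.

```python
import math

# Hypothetical, hand-picked weights for a tiny dynamics network f(h) = tanh(W h + B).
W = [[-1.0, 0.5], [-0.5, -1.0]]
B = [0.1, -0.2]


def f(h, t):
    """Time derivative dh/dt of a 2-D hidden state (this toy f ignores t)."""
    return [math.tanh(W[i][0] * h[0] + W[i][1] * h[1] + B[i]) for i in range(2)]


def axpy(h, v, scale):
    """Return h + scale * v elementwise."""
    return [hi + scale * vi for hi, vi in zip(h, v)]


def residual_net(h, n_blocks):
    """n weight-tied residual blocks h <- h + (1/n) f(h): Euler's method with step 1/n on [0, 1]."""
    dt = 1.0 / n_blocks
    for k in range(n_blocks):
        h = axpy(h, f(h, k * dt), dt)
    return h


def rk4_odeint(h, t0, t1, n_steps):
    """Fixed-step classical Runge-Kutta solve of dh/dt = f(h, t) from t0 to t1."""
    dt = (t1 - t0) / n_steps
    t = t0
    for _ in range(n_steps):
        k1 = f(h, t)
        k2 = f(axpy(h, k1, dt / 2), t + dt / 2)
        k3 = f(axpy(h, k2, dt / 2), t + dt / 2)
        k4 = f(axpy(h, k3, dt), t + dt)
        h = [hi + dt / 6 * (a + 2 * b + 2 * c + d) for hi, a, b, c, d in zip(h, k1, k2, k3, k4)]
        t += dt
    return h


h0 = [1.0, -1.0]
h1 = rk4_odeint(h0, 0.0, 1.0, 1000)  # accurate reference for h(1)
for n in (1, 10, 100):
    gap = max(abs(a - b) for a, b in zip(residual_net(h0, n), h1))
    print(f"{n:3d} residual blocks: max |difference| from ODE solution = {gap:.1e}")
```

The gap should shrink roughly in proportion to 1/n, the first-order convergence of Euler's method; an adaptive ODE solver instead chooses its own number of steps to meet a tolerance.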
Error Control in ODE-Nets

ODE solvers can approximately ensure that the output is within a given tolerance of the true solution. Changing this tolerance changes the behavior of the network. We first verify that error can indeed be controlled in Figure 3a. The time spent by the forward call is proportional to the number of function evaluations (Figure 3b), so tuning the tolerance gives us a trade-off between accuracy and computational cost. One could train with high accuracy, but switch to a lower accuracy at test time.
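The accuracy/cost trade-off can be reproduced on a toy problem. The sketch below assumes a scalar ODE dz/dt = -z (exact solution e^{-t}) and a simple embedded Heun-Euler pair with a textbook step-size controller; the function name and controller constants are illustrative choices, not the solver behind Figure 3. Tightening `tol` should increase the number of function evaluations (NFE) and reduce the final error.

```python
import math


def adaptive_heun_euler(f, z0, t0, t1, tol, h=0.1):
    """Solve scalar dz/dt = f(t, z) with an embedded Heun-Euler pair; return (z(t1), NFE).

    The local error estimate is |Heun - Euler|. A step is accepted only if that estimate
    is at most tol; after every attempt the step size is rescaled toward the tolerance.
    """
    t, z, nfe = t0, z0, 0
    while t1 - t > 1e-12:
        h = min(h, t1 - t)
        k1 = f(t, z)
        k2 = f(t + h, z + h * k1)
        nfe += 2
        z_euler = z + h * k1                  # first-order solution
        z_heun = z + h * (k1 + k2) / 2        # second-order solution
        err = abs(z_heun - z_euler)
        if err <= tol:
            t, z = t + h, z_heun              # accept and keep the higher-order result
        # The error estimate scales like h**2, hence the square root; 0.9 is a safety factor.
        h *= min(5.0, max(0.2, 0.9 * math.sqrt(tol / max(err, 1e-16))))
    return z, nfe


exact = math.exp(-5.0)  # dz/dt = -z with z(0) = 1 has z(5) = e^{-5}
for tol in (1e-2, 1e-4, 1e-6):
    z_end, nfe = adaptive_heun_euler(lambda t, z: -z, 1.0, 0.0, 5.0, tol)
    print(f"tol={tol:.0e}  NFE={nfe:5d}  |error|={abs(z_end - exact):.1e}")
```

Each attempted step costs two evaluations of f, so NFE is a direct proxy for forward-pass compute.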

Figure 3: Statistics of a trained ODE-Net. (NFE = number of function evaluations.)

Figure 3c shows a surprising result: the number of evaluations in the backward pass is roughly half that of the forward pass. This suggests that the adjoint sensitivity method is not only more memory efficient, but also more computationally efficient than directly backpropagating through the integrator, because the latter approach would need to backpropagate through each function evaluation in the forward pass.
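To make the adjoint sensitivity method concrete, here is a minimal sketch under simplifying assumptions: a scalar linear ODE dz/dt = θz, the loss L = z(T), and a fixed-step RK4 integrator rather than the adaptive solvers used in the paper. The gradient comes from one augmented ODE solved backward in time for the state z, the adjoint a = ∂L/∂z, and an accumulator for ∂L/∂θ, so no intermediate forward-pass values are stored. All function names are illustrative. Because z(T) = z0 e^{θT}, the results can be checked against ∂L/∂z0 = e^{θT} and ∂L/∂θ = T z0 e^{θT}.

```python
import math


def rk4_step(func, t, s, dt):
    """One classical Runge-Kutta step for a vector state s (a list of floats)."""
    k1 = func(t, s)
    k2 = func(t + dt / 2, [x + dt / 2 * k for x, k in zip(s, k1)])
    k3 = func(t + dt / 2, [x + dt / 2 * k for x, k in zip(s, k2)])
    k4 = func(t + dt, [x + dt * k for x, k in zip(s, k3)])
    return [x + dt / 6 * (a + 2 * b + 2 * c + d) for x, a, b, c, d in zip(s, k1, k2, k3, k4)]


def odeint(func, s, t0, t1, n_steps=200):
    """Fixed-step solve from t0 to t1; dt is negative when t1 < t0 (backward in time)."""
    dt = (t1 - t0) / n_steps
    for i in range(n_steps):
        s = rk4_step(func, t0 + i * dt, s, dt)
    return s


def forward(theta, z0, T):
    """Solve dz/dt = theta * z and keep only z(T); no intermediate states are stored."""
    return odeint(lambda t, s: [theta * s[0]], [z0], 0.0, T)[0]


def adjoint_gradients(theta, z0, T):
    """Return (L, dL/dz0, dL/dtheta) for the loss L = z(T), via one backward ODE solve."""
    zT = forward(theta, z0, T)

    def augmented(t, s):
        z, a, _ = s
        return [theta * z,    # recompute the state z(t) backward from z(T)
                -a * theta,   # adjoint: da/dt = -a * df/dz
                -a * z]       # accumulator: integrates -a * df/dtheta
    # Initial conditions at t = T: z(T), a(T) = dL/dz(T) = 1, accumulator = 0.
    _, dL_dz0, dL_dtheta = odeint(augmented, [zT, 1.0, 0.0], T, 0.0)
    return zT, dL_dz0, dL_dtheta


theta, z0, T = 0.7, 1.5, 2.0
L, dz0, dtheta = adjoint_gradients(theta, z0, T)
print(dz0, math.exp(theta * T))               # analytic dL/dz0 = e^{theta T}
print(dtheta, T * z0 * math.exp(theta * T))   # analytic dL/dtheta = T z0 e^{theta T}
```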

Network Depth

It’s not clear how to define the ‘depth’ of an ODE solution. A related quantity is the number of evaluations of the hidden state dynamics required, a detail delegated to the ODE solver and dependent on the initial state or input. Figure 3d shows that the number of function evaluations increases throughout training, presumably adapting to increasing complexity of the model.

4 Continuous Normalizing Flows

The discretized equation (1) also appears in normalizing flows (Rezende and Mohamed, 2015) and the NICE framework (Dinh et al., 2014). These methods use the change of variables theorem to compute exact changes in probability if samples are transformed through a bijective function f:

$$z_1 = f(z_0) \implies \log p(z_1) = \log p(z_0) - \log \left| \det \frac{\partial f}{\partial z_0} \right| \tag{6}$$
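Equation (6) can be checked on the simplest bijection, a one-dimensional affine map. In this sketch (function names are hypothetical), z_0 is assumed to be standard normal and f(z_0) = a z_0 + b, so ∂f/∂z_0 = a and equation (6) gives log p(z_1) = log p(z_0) - log|a|. Since z_1 is then normal with mean b and standard deviation |a|, the result can be compared with a closed-form density.

```python
import math


def log_standard_normal(x):
    """log N(x; 0, 1)."""
    return -0.5 * math.log(2 * math.pi) - 0.5 * x * x


def affine_flow(z0, a, b):
    """Push z0 through the bijection f(z0) = a * z0 + b (a != 0) and apply equation (6).

    Returns (z1, log p(z1)) assuming z0 ~ N(0, 1); here det(df/dz0) is just a.
    """
    z1 = a * z0 + b
    return z1, log_standard_normal(z0) - math.log(abs(a))


def log_normal(x, mean, std):
    """Closed-form log N(x; mean, std^2), used only to check the flow result."""
    return -0.5 * math.log(2 * math.pi) - math.log(std) - 0.5 * ((x - mean) / std) ** 2


z1, log_p = affine_flow(0.3, a=2.0, b=0.5)
print(log_p, log_normal(z1, mean=0.5, std=2.0))  # z1 = 2*z0 + 0.5 is distributed as N(0.5, 2^2)
```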

An example is the planar normalizing flow (Rezende and Mohamed, 2015):

$$z(t+1) = z(t) + u\,h\!\left(w^{\top} z(t) + b\right), \qquad \log p(z(t+1)) = \log p(z(t)) - \log \left| 1 + u^{\top} \frac{\partial h}{\partial z} \right| \tag{7}$$
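The log term in equation (7) is cheap because the planar flow's Jacobian is the identity plus a rank-one matrix. The sketch below assumes h = tanh (a common choice, but an assumption here) and illustrative parameter values; it computes the log-determinant as log|1 + uᵀ ∂h/∂z| and compares it with a brute-force 2x2 determinant.

```python
import math


def planar_flow(z, u, w, b):
    """One planar-flow step z' = z + u * tanh(w^T z + b), returning (z', log|1 + u^T dh/dz|).

    With h = tanh, dh/dz = (1 - tanh^2(w^T z + b)) * w. The Jacobian I + u (dh/dz)^T has
    determinant 1 + u^T dh/dz (matrix determinant lemma), so the cost is O(D), not O(D^3).
    """
    pre = sum(wi * zi for wi, zi in zip(w, z)) + b
    h = math.tanh(pre)
    dh_dz = [(1 - h * h) * wi for wi in w]
    z_new = [zi + ui * h for zi, ui in zip(z, u)]
    log_det = math.log(abs(1 + sum(ui * di for ui, di in zip(u, dh_dz))))
    return z_new, log_det


def brute_force_log_det(z, u, w, b):
    """Check for D = 2: build the full 2x2 Jacobian and take its determinant directly."""
    hp = 1 - math.tanh(w[0] * z[0] + w[1] * z[1] + b) ** 2
    J = [[(1.0 if i == j else 0.0) + u[i] * hp * w[j] for j in range(2)] for i in range(2)]
    return math.log(abs(J[0][0] * J[1][1] - J[0][1] * J[1][0]))


z, u, w, b = [0.4, -1.1], [0.5, -0.3], [1.0, 2.0], 0.1
z_new, log_det = planar_flow(z, u, w, b)
log_p_z = -math.log(2 * math.pi) - 0.5 * (z[0] ** 2 + z[1] ** 2)   # z ~ N(0, I) in 2-D
log_p_z_new = log_p_z - log_det                                        # equation (7)
print(log_det, brute_force_log_det(z, u, w, b))
```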

Generally, the main bottleneck to using the change of variables formula is computing the determinant of the Jacobian $\partial f / \partial z$, which has a cubic cost in either the dimension of $z$ or the number of hidden units. Recent work explores the tradeoff between the expressiveness of normalizing flow layers and computational cost (Kingma et al., 2016; Tomczak and Welling, 2016; Berg et al., 2018).

Surprisingly, moving from a discrete set of layers to a continuous transformation simplifies the computation of the change in normalizing constant:

Theorem 1 (Instantaneous Change of Variables). Let $z(t)$ be a finite continuous random variable with probability $p(z(t))$ dependent on time. Let $\frac{dz}{dt} = f(z(t), t)$ be a differential equation describing a continuous-in-time transformation of $z(t)$. Assuming that $f$ is uniformly Lipschitz continuous in $z$ and continuous in $t$, the change in log probability also follows a differential equation,

$$\frac{\partial \log p(z(t))}{\partial t} = -\operatorname{tr}\left( \frac{df}{dz(t)} \right) \tag{8}$$

Proof in Appendix A. Instead of the log determinant in (6), we now only require a trace operation.

Also, unlike standard finite flows, the differential equation $f$ does not need to be bijective, since if uniqueness is satisfied, the entire transformation is automatically bijective.
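Equation (8) can be checked numerically. The following sketch integrates a hypothetical two-dimensional dynamics f(z) = tanh(Wz + b) together with the log-density change from equation (8), then compares the result with -log|det| of the flow map's Jacobian, as in equation (6), estimated by finite differences. The weights, step counts, and the finite-difference check are illustrative choices, not the paper's setup, and this toy f is time-independent. Note that the trace needs only the diagonal of ∂f/∂z, whereas the determinant check needs the full Jacobian of the whole flow.

```python
import math

# Hypothetical 2-D dynamics f(z) = tanh(W z + B); the weights are arbitrary illustrative values.
W = [[0.5, -1.0], [1.2, -0.3]]
B = [0.2, -0.1]


def dynamics(z):
    """f(z) for the toy flow (time-independent, a special case of f(z(t), t))."""
    return [math.tanh(W[i][0] * z[0] + W[i][1] * z[1] + B[i]) for i in range(2)]


def trace_jacobian(z):
    """tr(df/dz): df_i/dz_i = (1 - tanh^2(a_i)) * W[i][i], so only the diagonal is needed."""
    return sum((1 - math.tanh(W[i][0] * z[0] + W[i][1] * z[1] + B[i]) ** 2) * W[i][i]
               for i in range(2))


def augmented(s):
    """State s = [z_1, z_2, change in log p]; the last entry follows equation (8)."""
    z = s[:2]
    return dynamics(z) + [-trace_jacobian(z)]


def flow(z0, t1=1.0, n_steps=400):
    """RK4-integrate the augmented state from t = 0 to t1; returns [z_1, z_2, delta log p]."""
    s, dt = list(z0) + [0.0], t1 / n_steps
    for _ in range(n_steps):
        k1 = augmented(s)
        k2 = augmented([x + dt / 2 * k for x, k in zip(s, k1)])
        k3 = augmented([x + dt / 2 * k for x, k in zip(s, k2)])
        k4 = augmented([x + dt * k for x, k in zip(s, k3)])
        s = [x + dt / 6 * (a + 2 * b + 2 * c + d) for x, a, b, c, d in zip(s, k1, k2, k3, k4)]
    return s


def finite_difference_log_det(z0, t1=1.0, eps=1e-5):
    """log|det| of the flow map's Jacobian d z(t1) / d z0 via central differences."""
    cols = []
    for j in range(2):
        plus, minus = list(z0), list(z0)
        plus[j] += eps
        minus[j] -= eps
        zp, zm = flow(plus, t1)[:2], flow(minus, t1)[:2]
        cols.append([(p - m) / (2 * eps) for p, m in zip(zp, zm)])  # column j of the Jacobian
    return math.log(abs(cols[0][0] * cols[1][1] - cols[1][0] * cols[0][1]))


z0 = [0.3, -0.4]
print(flow(z0)[2], -finite_difference_log_det(z0))  # equation (8) result vs. equation (6) check
```

The two numbers should agree up to integration and finite-difference error, since integrating the trace along a trajectory gives the log-determinant of the flow's Jacobian.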

