Variational Bi-LSTM: Bidirectional Cooperation for Enhanced Sequence Prediction


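Variational Bi-LSTM is an improved recurrent neural network (RNN) architecture aimed at modeling long-term dependencies in sequence prediction tasks. The LSTM (long short-term memory network) is an RNN variant whose gating mechanism mitigates the vanishing-gradient problem of plain RNNs (exploding gradients are usually handled separately, for example by gradient clipping), which gives it more stable training and better memory when processing sequence data. The bidirectional LSTM (Bi-LSTM) extends the LSTM by modeling the sequence in the reverse time direction as well as the forward one, so it captures richer context and tends to perform better on tasks such as natural language processing and speech recognition. In a conventional Bi-LSTM, however, the two directional paths are learned independently during training.

Variational Bi-LSTM was proposed to address this limitation. Its core idea is to create a channel for sharing information between the two paths so that, during training, the forward and backward LSTMs cooperate and optimize a common objective. That objective is to maximize a variational lower bound on the joint likelihood of the data sequence. The bound acts much like a regularization term that makes the two directional models influence each other.

Unlike a conventional Bi-LSTM, Variational Bi-LSTM lets the model take the interaction between the two paths into account, which may improve generalization and predictive performance. The variational design can help reduce overfitting and strengthen the model's grasp of complex sequence patterns, especially in tasks that depend heavily on context, such as machine translation and sentiment analysis.

In summary, Variational Bi-LSTM combines variational modeling with the bidirectional LSTM so that the two directional models learn cooperatively, increasing the expressive power available for sequence tasks. It is reported to improve performance in practice.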
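The difference between the forward summary and the backward summary can be sketched in a few lines. This is a minimal illustration, not the paper's model: a toy scalar tanh recurrence stands in for an LSTM cell, and the names `rnn_scan` and `bidirectional_states` and the weights `w_x`, `w_h` are hypothetical.

```python
import math

def rnn_scan(xs, w_x=0.8, w_h=0.5):
    """Toy tanh recurrence (a stand-in for an LSTM cell); returns the state after each step."""
    h, states = 0.0, []
    for x in xs:
        h = math.tanh(w_x * x + w_h * h)
        states.append(h)
    return states

def bidirectional_states(xs):
    """Return (forward, backward) state lists aligned by time step."""
    forward = rnn_scan(xs)                # forward[t] depends on xs[0..t]
    backward = rnn_scan(xs[::-1])[::-1]   # backward[t] depends on xs[t..end]
    return forward, backward

fwd_a, bwd_a = bidirectional_states([1.0, -0.5, 0.25])
fwd_b, bwd_b = bidirectional_states([1.0, -0.5, 2.0])  # only the last input differs

print(fwd_a[0] == fwd_b[0])  # the forward state at step 0 cannot see the change
print(bwd_a[0] != bwd_b[0])  # the backward state at step 0 does
```

The two scans share no state, which is the independence between directions that the summary describes for a conventional Bi-LSTM.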

[Figure 1 panels: (a) Training phase of variational Bi-LSTM; (b) Inference phase of variational Bi-LSTM]

Figure 1: Graphical description of our proposed variational Bi-LSTM model during train phase (left) and inference phase (right). During training, each step $t$ is composed of an encoder which receives both the past and future summary via $h_{t-1}$ and $b_t$ respectively, and a decoder that generates $\tilde{h}_{t-1}$ and $\tilde{b}_t$, which are forced to be close enough to $h_{t-1}$ and $b_t$ using two auxiliary reconstruction costs (dashed lines). This dependence between backward and forward LSTM through the latent random variable encourages the forward LSTM to learn a richer representation. During inference, the backward LSTM is removed. In this case, $z_t$ is sampled from the prior as in a typical VAE, which in our case is defined as a function of $h_{t-1}$.
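The training and inference paths described in the caption can be sketched as follows. This is an illustrative simplification under several assumptions that are not from the paper: states are scalars, the hypothetical weights `W_*` replace learned networks, the latent has a fixed standard deviation `SIGMA`, a single decoder produces both reconstructions, and the auxiliary costs are squared errors.

```python
import math
import random

# Hypothetical scalar weights standing in for learned networks.
W_PRIOR = 0.5
W_ENC_H, W_ENC_B = 0.3, 0.7
W_DEC_H, W_DEC_B = 0.9, 1.2
SIGMA = 0.1  # fixed standard deviation, an illustrative simplification

def latent_mean(h_prev, b_t=None):
    """Encoder mean when the future summary b_t is available, prior mean otherwise."""
    if b_t is None:  # inference: the backward LSTM is removed
        return math.tanh(W_PRIOR * h_prev)
    return math.tanh(W_ENC_H * h_prev + W_ENC_B * b_t)  # training

def sample_z(h_prev, b_t=None, sigma=SIGMA, rng=random):
    """Reparameterized sample z_t = mean + sigma * eps with eps ~ N(0, 1)."""
    return latent_mean(h_prev, b_t) + sigma * rng.gauss(0.0, 1.0)

def decode(z_t):
    """Decoder producing the reconstructions (h~_{t-1}, b~_t) from z_t."""
    return math.tanh(W_DEC_H * z_t), math.tanh(W_DEC_B * z_t)

def auxiliary_costs(z_t, h_prev, b_t):
    """Squared-error reconstruction costs (an assumed choice of cost)."""
    h_rec, b_rec = decode(z_t)
    return (h_rec - h_prev) ** 2, (b_rec - b_t) ** 2

rng = random.Random(0)
h_prev, b_t = 0.2, -0.4
z_train = sample_z(h_prev, b_t, rng=rng)  # training: encoder sees past and future
z_infer = sample_z(h_prev, rng=rng)       # inference: prior sees only the past
print(auxiliary_costs(z_train, h_prev, b_t))
print(z_infer)
```

Only the training call needs `b_t`, so the inference path never touches the backward state.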

By design, the joint conditional distribution $p_{\theta,\psi}(z_t, \tilde{b}_t \mid x_{1:t})$ over latent variables $z_t$ and $\tilde{b}_t$ with parameters $\theta$ and $\psi$ factorizes as $p_\theta(z_t \mid x_{1:t})\, p_\psi(\tilde{b}_t \mid z_t)$. This factorization enables us to formulate several helpful auxiliary costs, as defined in the next subsection. Further, $p_\eta(x_{t+1} \mid x_{1:t}, z_t, \tilde{b}_t)$ defines the generating model, which induces the distribution over the next observation given the previous states and the current input.

Then the marginal likelihood of each individual sequential data sample x can be written as

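$$
p(x; \Gamma) = \prod_{t=0}^{T} p(x_{t+1} \mid x_{1:t}) = \prod_{t=0}^{T} \int\!\!\int p_\eta(x_{t+1} \mid x_{1:t}, z_t, \tilde{b}_t)\, p_\theta(z_t \mid x_{1:t})\, p_\psi(\tilde{b}_t \mid z_t)\, dz_t\, d\tilde{b}_t \tag{3}
$$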

where $\Gamma = \{\phi, \theta, \psi, \eta\}$ is the set of all parameters of the model. Here, we assume that all conditional distributions belong to parametrized families of distributions which can be evaluated and sampled from efficiently.
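The role of "evaluated and sampled from efficiently" can be made concrete with a one-step toy version of the marginalization in equation (3). This sketch assumes a single scalar latent with a Gaussian prior and a Gaussian observation model, chosen only so that the exact marginal is known and the estimate can be checked. The function names are hypothetical, and the paper's model has no such closed form.

```python
import math
import random

def normal_pdf(x, mean, var):
    """Density of a univariate Gaussian N(mean, var) at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def mc_marginal(x, prior_mean, prior_var, obs_var, n_samples, rng):
    """Estimate p(x) = integral of p(x | z) p(z) dz by averaging p(x | z) over prior samples."""
    total = 0.0
    for _ in range(n_samples):
        z = rng.gauss(prior_mean, math.sqrt(prior_var))  # sample from the prior
        total += normal_pdf(x, z, obs_var)               # evaluate the likelihood
    return total / n_samples

def exact_marginal(x, prior_mean, prior_var, obs_var):
    """Closed form, available only because everything in this toy is Gaussian."""
    return normal_pdf(x, prior_mean, prior_var + obs_var)

rng = random.Random(0)
estimate = mc_marginal(0.5, 0.0, 1.0, 1.0, 20000, rng)
exact = exact_marginal(0.5, 0.0, 1.0, 1.0)
print(estimate, exact)
```

Sampling from the prior is simple but becomes inefficient when the likelihood is sharply peaked, which is one motivation for introducing an inference model.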

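Note that the joint distribution in equation (3) is intractable. Kingma & Welling (2014) demonstrated how to maximize a variational lower bound of the likelihood function. Here we derive a similar lower bound $\mathcal{L}$ of the data log likelihood $\log p(x; \Gamma)$, which is given by

\begin{align}
\log p(x; \Gamma) \ge \mathcal{L} = \sum_{t=0}^{T} \Big[\, & \mathbb{E}_{z_t \sim q_\phi(z_t \mid x_{1:t})}\, \mathbb{E}_{\tilde{b}_t \sim p_\psi(\tilde{b}_t \mid z_t)} \big[ \log p_\eta(x_{t+1} \mid x_{1:t}, z_t, \tilde{b}_t) \big] \tag{4} \\
& - D_{KL}\big( q_\phi(z_t \mid x_{1:t}) \,\|\, p_\theta(z_t \mid x_{1:t}) \big) \Big], \tag{5}
\end{align}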

where $q_\phi(\cdot \mid x)$ is the conditional inference model and $D_{KL}$ is the Kullback-Leibler (KL) divergence between the approximate posterior and the conditional prior (see the appendix). Notice the above function $\mathcal{L}$ is a general lower bound that is not explicitly defined in terms of $h_t$ and $b_t$, but rather all the terms are conditional upon the previous predictions $x_{1:t}$. The choice of how the model is defined in terms of $h_t$ and $b_t$ is a design choice which we will make more explicit in the next section.
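The structure of the bound, an expected log likelihood minus a KL term, can be checked numerically on a one-step toy model. This is a sketch under assumptions that are not from the paper: a single scalar latent $z$, univariate Gaussians for the prior, the approximate posterior and the observation model, and no inner expectation over $\tilde{b}_t$. All function names are hypothetical.

```python
import math

def kl_gaussians(m_q, var_q, m_p, var_p):
    """KL( N(m_q, var_q) || N(m_p, var_p) ) for univariate Gaussians."""
    return 0.5 * (math.log(var_p / var_q) + (var_q + (m_q - m_p) ** 2) / var_p - 1.0)

def expected_log_lik(x, m_q, var_q, obs_var):
    """E_{z ~ N(m_q, var_q)} [ log N(x; z, obs_var) ] in closed form."""
    return -0.5 * math.log(2.0 * math.pi * obs_var) - 0.5 * ((x - m_q) ** 2 + var_q) / obs_var

def step_bound(x, m_q, var_q, m_p, var_p, obs_var):
    """One-step lower bound: expected log likelihood minus the KL term."""
    return expected_log_lik(x, m_q, var_q, obs_var) - kl_gaussians(m_q, var_q, m_p, var_p)

def log_marginal(x, m_p, var_p, obs_var):
    """Exact log p(x) for the toy model, where x ~ N(m_p, var_p + obs_var)."""
    var = var_p + obs_var
    return -0.5 * math.log(2.0 * math.pi * var) - 0.5 * (x - m_p) ** 2 / var

x = 0.5
exact = log_marginal(x, 0.0, 1.0, 1.0)
loose = step_bound(x, 0.0, 1.0, 0.0, 1.0, 1.0)      # q set equal to the prior
tight = step_bound(x, x / 2.0, 0.5, 0.0, 1.0, 1.0)  # q set equal to the true posterior
print(loose <= exact, abs(tight - exact) < 1e-9)
```

In this toy the bound is tight exactly when the approximate posterior matches the true posterior, and it is strictly below the log likelihood otherwise.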
