$$\begin{bmatrix} i_t \\ f_t \\ o_t \end{bmatrix} = \sigma\big( W\,[\,x_t \oplus \mathrm{vec}(\tilde{h}_{t-1})\,] + b \big) \tag{2}$$

$$c_t = f_t \odot c_{t-1} + i_t \odot \mathrm{vec}(\tilde{j}_t) \tag{3}$$

$$\tilde{h}_t = \mathrm{matricization}\big( o_t \odot \tanh(c_t) \big) \tag{4}$$

Equation set 1: IMV-Full
$$\begin{bmatrix} \tilde{i}_t \\ \tilde{f}_t \\ \tilde{o}_t \end{bmatrix} = \sigma\big( \mathcal{W} \circledast \tilde{h}_{t-1} + \mathcal{U} \circledast x_t + b \big) \tag{5}$$

$$\tilde{c}_t = \tilde{f}_t \odot \tilde{c}_{t-1} + \tilde{i}_t \odot \tilde{j}_t \tag{6}$$

$$\tilde{h}_t = \tilde{o}_t \odot \tanh(\tilde{c}_t) \tag{7}$$

Equation set 2: IMV-Tensor
IMV-Full: With the vectorization in Eq. (2) and Eq. (3), IMV-Full updates gates and memories using the full $\tilde{h}_{t-1}$ and $\tilde{j}_t$, regardless of the variable-wise data organization in them. By simply replacing the hidden-state update of the standard LSTM with $\tilde{j}_t$, IMV-Full behaves identically to the standard LSTM while enjoying the interpretability shown below.
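For concreteness, the following is a minimal NumPy sketch of one IMV-Full recurrent step, Eqs. (2)-(4). The shapes, the treatment of $x_t$ as one scalar per variable, and the packing of the three gates into a single weight matrix are illustrative assumptions rather than a prescribed implementation; the tensorized update $\tilde{j}_t$ from Eq. (1) is taken as given.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def imv_full_step(x_t, h_prev, c_prev, j_t, W, b):
    """One IMV-Full step (Eqs. 2-4): gates act on the flattened hidden state.

    x_t    : (N,)              current input, one value per variable (assumed)
    h_prev : (N, d)            hidden state matrix h~_{t-1}
    c_prev : (N*d,)            flat memory cell c_{t-1}
    j_t    : (N, d)            tensorized update j~_t from Eq. (1)
    W, b   : (3*N*d, N + N*d), (3*N*d,)  shared gate parameters (assumed packing)
    """
    N, d = h_prev.shape
    # Eq. (2): gates from x_t concatenated with vec(h~_{t-1})
    z = W @ np.concatenate([x_t, h_prev.reshape(-1)]) + b
    i_t, f_t, o_t = np.split(sigmoid(z), 3)
    # Eq. (3): memory update mixes vec(j~_t) into the flat cell
    c_t = f_t * c_prev + i_t * j_t.reshape(-1)
    # Eq. (4): gate the cell and fold it back into an N x d matrix
    h_t = (o_t * np.tanh(c_t)).reshape(N, d)
    return h_t, c_t
```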
IMV-Tensor: By applying the tensor-dot operations in Eq. (5), gates and memory cells are matrices as well, whose elements correspond to the input variables in the same way as the hidden state matrix $\tilde{h}_t$ does.

In IMV-Full and IMV-Tensor, gates only scale $\tilde{j}_t$ and $\tilde{c}_{t-1}$ and thus retain the variable-wise data organization in $\tilde{h}_t$. Meanwhile, based on the tensorized hidden-state update in Eq. (1) and the gate update in Eq. (5), IMV-Tensor can also be regarded as a set of parallel LSTMs, each of which processes one variable series. The derived hidden states specific to individual variables are then aggregated at both the temporal and variable levels through the attention mechanism described below.
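This parallel-LSTM view admits a direct sketch: below, Eqs. (5)-(7) are written as $N$ per-variable LSTMs driven by a single einsum, with the candidate update $\tilde{j}_t$ of Eq. (1) folded into the same per-variable parameter tensor for brevity. The parameter shapes and this bundling are assumptions made only for illustration; the point it makes explicit is that row $n$ of $\tilde{h}_t$ depends on variable $n$'s series alone, which is what enables the variable-wise interpretation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def imv_tensor_step(x_t, h_prev, c_prev, W, U, b):
    """One IMV-Tensor step (Eqs. 5-7) as N parallel per-variable LSTMs.

    x_t    : (N,)         one input value per variable (assumed)
    h_prev : (N, d)       hidden state matrix h~_{t-1}
    c_prev : (N, d)       memory cell matrix c~_{t-1}
    W      : (N, 4*d, d)  per-variable recurrent weights (tensor-dot)
    U      : (N, 4*d)     per-variable input weights
    b      : (N, 4*d)     per-variable bias
    """
    N, d = h_prev.shape
    # Eq. (5): the tensor-dot keeps every variable's gates separate
    z = np.einsum('nkd,nd->nk', W, h_prev) + U * x_t[:, None] + b
    i_t = sigmoid(z[:, :d])
    f_t = sigmoid(z[:, d:2 * d])
    o_t = sigmoid(z[:, 2 * d:3 * d])
    j_t = np.tanh(z[:, 3 * d:])          # candidate update j~_t (Eq. 1, bundled here)
    # Eq. (6): element-wise memory update, organised per variable
    c_t = f_t * c_prev + i_t * j_t
    # Eq. (7): row n of the new hidden state matrix only sees variable n
    h_t = o_t * np.tanh(c_t)
    return h_t, c_t
```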
3.2 MIXTURE ATTENTION
After feeding a sequence $\{x_1, \cdots, x_T\}$ into either IMV-Full or IMV-Tensor, we obtain a sequence of hidden state matrices $\{\tilde{h}_1, \cdots, \tilde{h}_T\}$, where the sequence of hidden states specific to variable $n$ is extracted as $\{h^n_1, \cdots, h^n_T\}$.
In this part, we present the novel mixture attention mechanism of IMV-LSTM, based on the following idea. Temporal attention is first applied to the sequence of hidden states of each variable, so as to obtain a summarized history of that variable. Then, using the history-enriched hidden state of each variable, the global variable attention is derived. These two steps are assembled into a probabilistic mixture model (Zong et al., 2018; Graves, 2013; Bishop, 1994), which facilitates the subsequent training, inference, and interpretation processes.
In particular, the mixture attention is formulated as:
$$\begin{aligned}
p(y_{T+1} \mid X_T) &= \sum_{n=1}^{N} p(y_{T+1} \mid z_{T+1} = n, X_T) \cdot p(z_{T+1} = n \mid X_T) \\
&= \sum_{n=1}^{N} p(y_{T+1} \mid z_{T+1} = n, h^n_1, \cdots, h^n_T) \cdot p(z_{T+1} = n \mid \tilde{h}_1, \cdots, \tilde{h}_T) \\
&= \sum_{n=1}^{N} \underbrace{p(y_{T+1} \mid z_{T+1} = n, h^n_T \oplus g^n)}_{\substack{\text{variable-wise} \\ \text{temporal attention}}} \cdot \underbrace{p(z_{T+1} = n \mid h^1_T \oplus g^1, \cdots, h^N_T \oplus g^N)}_{\text{overall variable attention}}
\end{aligned} \tag{8}$$
In Eq. (8), we introduce a latent random variable $z_{T+1}$ into the density function of $y_{T+1}$ to govern the generation process. $z_{T+1}$ is a discrete variable over the set of values $\{1, \cdots, N\}$ corresponding to the $N$ input variables. Mathematically, $p(y_{T+1} \mid z_{T+1} = n, h^n_T \oplus g^n)$ characterizes the density of $y_{T+1}$ conditioned on the historical data of variable $n$, while the prior of $z_{T+1}$, i.e. $p(z_{T+1} = n \mid h^1_T \oplus g^1, \cdots, h^N_T \oplus g^N)$, controls to what extent $y_{T+1}$ is driven by variable $n$.
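One natural parameterization of this prior, given here only as a hedged sketch rather than the prescribed form, is a softmax over per-variable scores of the concatenated features $h^n_T \oplus g^n$; the linear scorer (w, b) below is an assumption for illustration.

```python
import numpy as np

def softmax(x):
    x = x - x.max()
    e = np.exp(x)
    return e / e.sum()

def variable_prior(h_T, g, w, b):
    """Sketch of the variable attention p(z_{T+1} = n | .) in Eq. (8).

    h_T : (N, d)   last hidden state of each variable, h^n_T
    g   : (N, d)   temporal context vectors g^n
    w   : (2*d,)   scoring weights over h^n_T (+) g^n (assumed linear scorer)
    b   : scalar   scoring bias
    """
    feats = np.concatenate([h_T, g], axis=1)   # (N, 2d), one row per variable
    scores = feats @ w + b                     # unnormalised relevance of each variable
    return softmax(scores)                     # normalise across the N variables
```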
The context vector $g^n$ is computed as the temporal-attention-weighted sum of the hidden states corresponding to variable $n$, i.e., $g^n = \sum_t \alpha^n_t h^n_t$, where the attention weight is $\alpha^n_t = \frac{\exp\left(f^n(h^n_t)\right)}{\sum_k \exp\left(f^n(h^n_k)\right)}$. Here $f^n(\cdot)$ can be a flexible function specific to variable $n$, e.g., a neural network (Bahdanau et al., 2014).
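As a concrete illustration of the temporal attention, the context vectors $g^n$ for all variables can be computed at once; the sketch below assumes a simple linear scoring function in place of the flexible $f^n(\cdot)$ above.

```python
import numpy as np

def softmax(x, axis=0):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def temporal_context(h_seq, score_w, score_b):
    """Variable-wise temporal attention with a linear stand-in for f^n.

    h_seq   : (T, N, d)  hidden states h^n_t for all steps and variables
    score_w : (N, d)     per-variable scoring weights (assumed linear f^n)
    score_b : (N,)       per-variable scoring bias
    returns : g of shape (N, d) with g^n = sum_t alpha^n_t h^n_t, and alpha (T, N)
    """
    scores = np.einsum('tnd,nd->tn', h_seq, score_w) + score_b  # f^n(h^n_t)
    alpha = softmax(scores, axis=0)             # normalise over time, per variable
    g = np.einsum('tn,tnd->nd', alpha, h_seq)   # attention-weighted sums
    return g, alpha
```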
The density $p(y_{T+1} \mid z_{T+1} = n, h^n_T \oplus g^n)$ is a Gaussian distribution parameterized by $[\mu^n, \sigma^n] = \varphi^n(h^n_T \oplus g^n)$, where $\varphi^n(\cdot)$ can be a feedforward neural network.
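Putting the pieces together, Eq. (8) becomes a Gaussian mixture over the $N$ variables. The sketch below evaluates that density for a scalar target; sharing a single callable phi across variables (standing in for the per-variable $\varphi^n$) is an assumption made purely for brevity.

```python
import numpy as np

def gaussian_pdf(y, mu, sigma):
    return np.exp(-0.5 * ((y - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

def mixture_density(y, h_T, g, phi, p_z):
    """Evaluate the mixture in Eq. (8) for a scalar target y_{T+1}.

    h_T : (N, d)   last hidden states h^n_T
    g   : (N, d)   temporal contexts g^n
    phi : callable (2*d,) -> (mu, sigma); stands in for the per-variable phi^n
    p_z : (N,)     variable attention p(z_{T+1} = n | .)
    """
    density = 0.0
    for n in range(h_T.shape[0]):
        feat = np.concatenate([h_T[n], g[n]])   # h^n_T (+) g^n
        mu_n, sigma_n = phi(feat)               # [mu^n, sigma^n] = phi^n(.)
        density += p_z[n] * gaussian_pdf(y, mu_n, sigma_n)
    return density
```

Consistent with the probabilistic formulation above, a natural training objective would be the logarithm of this mixture density over observed targets, with the learned mixture weights supplying the variable-level interpretation.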