attention to be interpretable, the blue, upper-right values (i∗, not r, flips a decision) should be much larger than the orange, lower-left values (r, not i∗, flips a decision), which should be close to zero.⁵
Although for some datasets in Table 2, the “or-
ange” values are non-negligible, we mostly see
that their fraction of total off-diagonal values mir-
rors the fraction of negative occurrences of Eq. 1
in Figure 4. However, it’s somewhat startling that in the vast majority of cases, erasing i∗ does not change the decision (“no” row of each table). This
is likely explained in part by the signal pertinent
to the classification being distributed across a doc-
ument (e.g., a “Sports” question in the Yahoo An-
swers dataset could signal “sports” in a few sen-
tences, any one of which suffices to correctly cate-
gorize it). However, given that these results are for
the HAN models, which typically compute atten-
tion over ten or fewer sentences, this is surprising.
Altogether, examining importance from a single-weight angle paints a tentatively positive picture of attention’s interpretability, but it also raises several questions about the many cases where the impacts of i∗ and r are almost identical (i.e., ∆JS values close to 0), or where neither i∗ nor r causes a decision flip. To answer these questions, we require tests with a broader scope.
5 Importance of Sets of Attention Weights
Often, we care about determining the collective importance of a set of components I′. To address
that aspect of attention’s interpretability and close
gaps left by single-weight tests, we introduce tests
to determine how multiple attention weights per-
form together as importance predictors.
5.1 Multi-Weight Tests
For a hypothesized ranking of importance, such as
that implied by attention weights, we would ex-
pect the items at the top of that ranking to func-
tion as a concise explanation for the model’s deci-
sion. The less concise these explanations get, and
the farther down the ranking the items that truly drive the model’s decision fall, the less plausible it becomes that the ranking truly describes importance. In other words, we expect that the top items in a truly useful ranking of importance would comprise a minimal necessary set of information for making the model’s decision.

⁵We see this pattern especially strongly for FLANs (see Appendix), which is unsurprising since I is all words in the input text, so most attention weights are very small.
The idea of a minimal set of inputs necessary
to uphold a decision is not new; Li et al. (2016)
use reinforcement learning to attempt to construct
such a minimal set of words, Lei et al. (2016) train
an encoder to constrain the input prior to clas-
sification, and much of the work that has been
done on extractive summarization takes this con-
cept as a starting point (Lin and Bilmes, 2011).
However, such work has focused on approximat-
ing minimal sets, instead of evaluating the ability
of other importance-determining “shortcuts” (such
as attention weight orderings) to identify them.
Nguyen (2018) leverages the idea of minimal sets in a way much more similar to ours, comparing different input importance orderings.
Concretely, to assess the validity of an impor-
tance ranking method (e.g., attention), we begin
erasing representations from the top of the rank-
ing downward until the model’s decision changes.
Ideally, we would then enumerate all possible
subsets of that instance’s components, observe
whether the model’s decision changed in response
to removing each subset, and then report whether
the size of the minimal decision-flipping subset
was equal to the number of items that had needed
to be removed to achieve a decision flip by follow-
ing the ranking. However, the exponential num-
ber of subsets for any given instance’s sequence of
components (word or sentence representations, in
our case) makes such a strategy computationally
prohibitive, and so we adopt a different approach.
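As a concrete illustration, the following sketch shows the ranked-erasure procedure in Python. The model.predict interface, and the choice of zeroing out representations as the erasure operation, are assumptions made here for illustration only; they are not prescribed by this section.

import numpy as np

def flip_set_size(model, components, ranking):
    """Erase components in ranking order (most to least important) until the
    model's decision changes.

    `model` is assumed to expose a predict(components) method returning a
    class label; `components` is a list of vector representations (words or
    sentences); `ranking` is a list of component indices. Returns the number
    of components erased when the decision first flips, or None if it never
    flips.
    """
    original_decision = model.predict(components)
    erased = set()
    for num_erased, idx in enumerate(ranking, start=1):
        erased.add(idx)
        # One possible erasure operation: zero out the removed representations.
        masked = [np.zeros_like(c) if i in erased else c
                  for i, c in enumerate(components)]
        if model.predict(masked) != original_decision:
            return num_erased
    return None  # the decision never flips, even with everything erased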
Instead, in addition to our hypothesized impor-
tance ranking (attention weights), we consider al-
ternative rankings of importance; if, using those,
we repeatedly discover cases where removing a
smaller subset of items would have sufficed to
change the decision, this signals that our candidate
ranking is a poor indicator of importance.
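The comparison itself can then be phrased as a check over per-instance flip-set sizes. The sketch below, reusing the hypothetical flip_set_size routine above, flags instances where some alternative ranking flips the decision after erasing fewer items than the candidate ranking; a high rate of such instances counts as evidence against the candidate.

def candidate_beaten(model, components, candidate_ranking, alternative_rankings):
    """Return True if any alternative ranking flips the model's decision with
    strictly fewer erasures than the candidate (e.g., attention-based) ranking."""
    candidate_size = flip_set_size(model, components, candidate_ranking)
    for alt in alternative_rankings:
        alt_size = flip_set_size(model, components, alt)
        if alt_size is not None and (candidate_size is None or alt_size < candidate_size):
            return True
    return False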
5.2 Alternative Importance Rankings
Exhaustively searching the space of component
subsets would be far too time-consuming in prac-
tice, so we introduce three other ranking schemes.
The first is to randomly rank importance. We
expect that this ranking will perform quite poorly,
but it provides a point of comparison by which
to validate that ranking by descending attention
weights is at least somewhat informative.
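For instance, the random baseline can be generated by shuffling the component indices; the sketch below is illustrative only, and the seed handling and the number of random orderings per instance are implementation choices not prescribed here.

import random

def random_ranking(num_components, seed=0):
    """Baseline importance ranking: a uniformly random ordering of indices."""
    rng = random.Random(seed)
    order = list(range(num_components))
    rng.shuffle(order)
    return order

# Hypothetical usage: `attention_ranking` would hold the component indices
# sorted by descending attention weight for one instance.
# beaten = candidate_beaten(model, components, attention_ranking,
#                           [random_ranking(len(components), seed=s) for s in range(10)])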