2. Information Theory of Deep Learning
In supervised learning we are interested in good representations, T(X), of the input patterns x ∈ X,
that enable good predictions of the label y ∈ Y . Moreover, we want to efficiently learn such
representations from an empirical sample of the (unknown) joint distribution P (X, Y ), in a way
that provides good generalization.
DNNs and Deep Learning generate a Markov chain of such representations, the hidden layers, by minimizing the empirical error over the weights of the network, layer by layer. This optimization takes place via stochastic gradient descent (SGD), using a noisy estimate of the gradient of the empirical error at each weight, computed through back-propagation.
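To make this training loop concrete, the following is a minimal numerical sketch, not the architecture or code studied in this work, of mini-batch SGD with back-propagation through a single hidden layer; the data, layer sizes, and learning rate are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 256 binary-labeled samples of 12-dimensional inputs (illustrative only).
X = rng.standard_normal((256, 12))
y = (X[:, 0] * X[:, 1] > 0).astype(float).reshape(-1, 1)

# One hidden layer with tanh units; sizes and learning rate are arbitrary.
W1 = rng.standard_normal((12, 8)) * 0.1
b1 = np.zeros(8)
W2 = rng.standard_normal((8, 1)) * 0.1
b2 = np.zeros(1)
lr, batch = 0.1, 32

for step in range(2000):
    idx = rng.choice(len(X), batch, replace=False)   # mini-batch -> noisy gradient estimate
    xb, yb = X[idx], y[idx]

    # Forward pass: hidden representation T and predicted probability.
    T = np.tanh(xb @ W1 + b1)
    p = 1.0 / (1.0 + np.exp(-(T @ W2 + b2)))

    # Back-propagation of the cross-entropy error through the layers.
    dlogits = (p - yb) / batch
    dW2, db2 = T.T @ dlogits, dlogits.sum(0)
    dT = dlogits @ W2.T * (1.0 - T ** 2)
    dW1, db1 = xb.T @ dT, dT.sum(0)

    # SGD update at each weight.
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2
```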
Our first important insight is to treat the whole layer, T, as a single random variable, characterized by its encoder distribution, P(T|X), and its decoder distribution, P(Y|T). As we are only interested in the information that flows through the network, invertible transformations of the representations that preserve information generate equivalent representations, even if the individual neurons encode entirely different features of the input. For this reason we quantify the representations by two numbers, or order parameters, that are invariant to any invertible re-parameterization of T: the mutual information of T with the input X and with the desired output Y.
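As a concrete sketch of these two order parameters, the snippet below estimates I(T;X) and I(T;Y) from samples by discretizing the activations of a layer into equal-width bins and tabulating empirical joint distributions. The binning scheme, the number of bins, and the helper names are assumptions made only for illustration, not the estimation procedure prescribed here; X and Y are assumed discrete (or already discretized).

```python
import numpy as np

def discrete_mutual_information(a, b):
    """Plug-in estimate of I(A;B) in bits from paired samples of discrete variables."""
    _, a_idx = np.unique(a, axis=0, return_inverse=True)
    _, b_idx = np.unique(b, axis=0, return_inverse=True)
    joint = np.zeros((a_idx.max() + 1, b_idx.max() + 1))
    np.add.at(joint, (a_idx.ravel(), b_idx.ravel()), 1)   # empirical joint counts
    joint /= joint.sum()
    p_a = joint.sum(1, keepdims=True)
    p_b = joint.sum(0, keepdims=True)
    nz = joint > 0
    return float((joint[nz] * np.log2(joint[nz] / (p_a * p_b)[nz])).sum())

def layer_order_parameters(X, Y, T, n_bins=30):
    """Treat the layer activations T as one random variable; return (I(T;X), I(T;Y))."""
    edges = np.linspace(T.min(), T.max(), n_bins + 1)
    T_binned = np.digitize(T, edges)          # discretize each unit's activation
    return (discrete_mutual_information(T_binned, X),
            discrete_mutual_information(T_binned, Y))
```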
Next, we quantify the quality of the layers by comparing them to the information-theoretically optimal representations, the Information Bottleneck representations, and then describe how SGD in Deep Learning can achieve these optimal representations.
2.1 Mutual Information
Given any two random variables, X and Y , with a joint distribution p(x, y), their Mutual Informa-
tion is defined as:
\begin{align}
I(X;Y) &= D_{KL}\!\left[p(x,y)\,\|\,p(x)p(y)\right] = \sum_{x\in X,\,y\in Y} p(x,y)\,\log\frac{p(x,y)}{p(x)\,p(y)} \tag{1}\\
&= \sum_{x\in X,\,y\in Y} p(x,y)\,\log\frac{p(x|y)}{p(x)} = H(X) - H(X|Y)\,, \tag{2}
\end{align}
where $D_{KL}[p\,\|\,q]$ is the Kullback-Leibler divergence of the distributions p and q, and H(X) and H(X|Y) are the entropy of X and the conditional entropy of X given Y, respectively.
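As a small numerical check of Eqs. (1) and (2), the following evaluates both expressions, in bits, on an illustrative joint distribution table (the particular numbers carry no meaning):

```python
import numpy as np

# A small joint distribution p(x, y) as a table (rows: x, columns: y); values are illustrative.
p_xy = np.array([[0.30, 0.10],
                 [0.05, 0.25],
                 [0.10, 0.20]])

p_x = p_xy.sum(axis=1, keepdims=True)    # marginal p(x)
p_y = p_xy.sum(axis=0, keepdims=True)    # marginal p(y)
mask = p_xy > 0

# Eq. (1): I(X;Y) = sum_{x,y} p(x,y) log[ p(x,y) / (p(x) p(y)) ]
I_kl = (p_xy[mask] * np.log2(p_xy[mask] / (p_x * p_y)[mask])).sum()

# Eq. (2): I(X;Y) = H(X) - H(X|Y)
H_x = -(p_x * np.log2(p_x)).sum()
p_x_given_y = p_xy / p_y                 # conditional p(x|y)
H_x_given_y = -(p_xy[mask] * np.log2(p_x_given_y[mask])).sum()

assert np.isclose(I_kl, H_x - H_x_given_y)
```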
The mutual information quantifies the number of relevant bits that the input variable X contains
about the label Y , on average. The optimal learning problem can be cast as the construction of an
optimal encoder of that relevant information via an efficient representation - a minimal sufficient
statistic of X with respect to Y - if such can be found. A minimal sufficient statistic can enable
the decoding of the relevant information with the smallest number of binary questions (on average);
i.e., an optimal code. The connection between mutual information and minimal sufficient statistics
is discussed in Section 2.3.
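The following toy example, which is only an illustration and not taken from this work, makes the role of a sufficient statistic concrete: when the label depends only on one component of the input, that component preserves all of the relevant information I(X;Y) while discarding most of the bits about X itself.

```python
import numpy as np

rng = np.random.default_rng(2)

# X = (X1, X2): the label depends only on X1, so T(X) = X1 is a sufficient
# statistic of X with respect to Y (illustrative construction only).
x1 = rng.integers(0, 4, size=20000)
x2 = rng.integers(0, 4, size=20000)            # irrelevant noise component
y = (x1 + rng.integers(0, 2, size=20000)) % 4

x = x1 * 4 + x2                                 # encode the pair as one discrete variable
t = x1                                          # the sufficient statistic

def mi_bits(a, b):
    """Plug-in estimate of I(A;B) in bits from paired discrete samples."""
    joint = np.zeros((a.max() + 1, b.max() + 1))
    np.add.at(joint, (a, b), 1)
    joint /= joint.sum()
    p_a, p_b = joint.sum(1, keepdims=True), joint.sum(0, keepdims=True)
    nz = joint > 0
    return (joint[nz] * np.log2(joint[nz] / (p_a * p_b)[nz])).sum()

# T keeps all the relevant information about Y while discarding bits about X:
print(mi_bits(x, y), mi_bits(t, y))   # approximately equal (about 1 bit each)
print(mi_bits(x, x), mi_bits(t, x))   # H(X) of about 4 bits vs. I(T;X) of about 2 bits
```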
Two properties of the mutual information are very important in the context of DNNs. The first
is its invariance to invertible transformations:
$$
I(X;Y) = I\left(\psi(X);\phi(Y)\right) \tag{3}
$$
for any invertible functions φ and ψ.
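For discrete variables, an invertible map is simply a one-to-one relabeling of the support, so ψ and φ permute the rows and columns of the joint table without changing any term of Eq. (1). A minimal check, reusing the illustrative joint table from above (the permutations are arbitrary):

```python
import numpy as np

def mi_from_joint(p_xy):
    """I(X;Y) in bits, computed directly from a joint table as in Eq. (1)."""
    p_x = p_xy.sum(1, keepdims=True)
    p_y = p_xy.sum(0, keepdims=True)
    nz = p_xy > 0
    return (p_xy[nz] * np.log2(p_xy[nz] / (p_x * p_y)[nz])).sum()

# On a discrete support, invertible psi and phi amount to permuting
# the rows (for X) and columns (for Y) of the joint table.
p_xy = np.array([[0.30, 0.10],
                 [0.05, 0.25],
                 [0.10, 0.20]])
psi = [2, 0, 1]        # arbitrary permutation of the x-support
phi = [1, 0]           # arbitrary permutation of the y-support
p_transformed = p_xy[psi][:, phi]

# Eq. (3): the mutual information is unchanged.
assert np.isclose(mi_from_joint(p_xy), mi_from_joint(p_transformed))
```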