1 Recursive Generalised Linear Models

Deep learning and the use of deep neural networks [1] are now established as a key tool for practical machine learning. Neural networks have an equivalence with many existing statistical and machine learning approaches and I would like to explore one of these views in this post. In particular, I'll look at the view of deep neural networks as recursive generalised linear models (RGLMs). Generalised linear models form one of the cornerstones of probabilistic modelling and are used in almost every field of experimental science, so this connection is an extremely useful one to have in mind. I'll focus here on what are called feed-forward neural networks and leave a discussion of the statistical connections to recurrent networks to another post.

1.1 generalised linear models

The basic linear regression model is a linear mapping from P-dimensional input features (or covariates) x to a set of targets (or responses) y, using a set of weights (or regression coefficients) β and a bias (offset) β₀. The outputs can also be multivariate, but I'll assume they are scalar here. The full probabilistic model assumes that the outputs are corrupted by Gaussian noise of unknown variance σ².

η = β⊤x + β₀

y = η + ε,  ε ∼ N(0, σ²)
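This probabilistic linear model can be simulated and fitted in a few lines. The sketch below (with illustrative values for β, β₀, and σ) folds the bias into the weight vector by appending a constant-1 feature, then recovers the coefficients by ordinary least squares:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate y = beta^T x + beta_0 + eps, with eps ~ N(0, sigma^2).
# All parameter values here are illustrative, not from the post.
P = 3
beta_true = np.array([2.0, -1.0, 0.5])  # regression coefficients
beta0_true, sigma = 4.0, 0.1            # bias and noise std. deviation

X = rng.normal(size=(100, P))
eta = X @ beta_true + beta0_true                 # systematic component
y = eta + rng.normal(scale=sigma, size=100)      # add the random component

# Append a constant-1 column so the bias becomes an ordinary weight,
# then estimate [beta, beta_0] jointly by least squares.
X1 = np.hstack([X, np.ones((100, 1))])
beta_hat, *_ = np.linalg.lstsq(X1, y, rcond=None)
print(beta_hat)  # close to [2.0, -1.0, 0.5, 4.0]
```

The same trick of absorbing the bias into the weights is exactly the compact notation β = [β̂, β₀], x = [x̂, 1] used in the generalised regression problem below.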

In this formulation, η is the systematic component of the model and ε is the random component. Generalised linear models (GLMs) [2] allow us to extend this formulation to problems where the distribution on the targets is not Gaussian but some other distribution (typically a distribution in the exponential family). In this case, we can write the generalised regression problem, combining the coefficients and bias for more compact notation, as:

η = β⊤x,  β = [β̂, β₀],  x = [x̂, 1]

E[y] = µ = g⁻¹(η)

where g(·) is the link function that allows us to move from natural parameters η to mean parameters µ. If the inverse link function used in the definition of µ above were the logistic sigmoid, then the mean parameters correspond to the probabilities of y being a 1 or 0 under the Bernoulli distribution.
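As a minimal sketch of this logistic-sigmoid case (the weights and input below are made up for illustration), the inverse link maps the unbounded natural parameter η to a mean µ in (0, 1), which is then read as P(y = 1 | x):

```python
import numpy as np

def sigmoid(eta):
    # Inverse of the logit link: maps natural parameters to (0, 1).
    return 1.0 / (1.0 + np.exp(-eta))

beta = np.array([1.5, -2.0, 0.5])  # illustrative weights, [beta_hat, beta_0]
x = np.array([0.2, 0.4, 1.0])      # input with the constant 1 appended

eta = beta @ x      # systematic component (natural parameter)
mu = sigmoid(eta)   # mean parameter: the probability that y = 1
print(eta, mu)      # here eta is about 0, so mu is about 0.5
```

Swapping the sigmoid for another inverse link (exponential, identity, etc.) changes the target distribution while keeping the linear systematic component untouched, which is the essential flexibility of GLMs.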

