If the cost of a false positive varies across different classes, it should be used to compute the risk and to choose the minimal-risk prediction.
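As an illustration of this decision rule (a minimal sketch; the class probabilities and cost values below are hypothetical and not taken from the paper), the minimal-risk prediction weights the predictive probabilities by a cost matrix and selects the class with the lowest expected cost:

```python
import numpy as np

# Hypothetical predictive distribution p(y | x) over three classes,
# e.g. obtained by averaging a BNN's sampled predictions.
predictive_probs = np.array([0.55, 0.30, 0.15])

# Hypothetical cost matrix: cost[a, y] is the cost of predicting class a
# when the true class is y; the off-diagonal entries encode how expensive
# each kind of false positive is.
cost = np.array([
    [0.0, 1.0, 10.0],
    [1.0, 0.0,  1.0],
    [2.0, 1.0,  0.0],
])

# Expected risk of each possible prediction under p(y | x).
risk = cost @ predictive_probs

# The minimal-risk prediction (class 1 here) differs from the most
# probable class (class 0) because the costs are asymmetric.
print(risk, int(np.argmin(risk)), int(np.argmax(predictive_probs)))
```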
3 MOTIVATION FOR BAYESIAN METHODS IN DEEP LEARNING
Defining a prior belief p(θ) on the model parametrization (Section 2) is regarded by some users as hard, if not impossible. Defining a prior for a simple functional model is considered intuitive, e.g., explicitly adding a regularization term to favor a lower-degree polynomial function or a smoother function [54]. However, defining priors is harder for the multi-layer models used in deep learning.
So, why do we bother to use Bayesian methods for deep learning, given that it is hard to clearly comprehend the behavior of deep neural networks when defining the priors? The functional relationship encoded by an artificial neural network implicitly represents the conditional probability p(y|x, θ), and Bayes' formula is an appropriate tool to invert conditional probabilities, even if one has a priori little insight about p(θ). While there are very strong theoretical principles and schemas on which Bayes' formula can be based [76], we focus in this section on some practical benefits of using Bayesian deep networks.
First, Bayesian methods provide a natural approach to quantify uncertainty in deep learning. Bayesian neural networks often have better calibration than classical neural networks [46, 58, 66], i.e., their predicted uncertainty is more consistent with the observed errors. In other words, they are neither overconfident nor underconfident compared to their non-Bayesian counterparts.
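To make "calibration" concrete, the sketch below computes the widely used expected calibration error (ECE), which compares average confidence with observed accuracy per confidence bin; the synthetic confidences and accuracies are made up purely for illustration and do not come from the cited works.

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE: average |accuracy - confidence| over equally spaced
    confidence bins, weighted by the fraction of samples in each bin.

    confidences: predicted probability of the predicted class, shape (N,)
    correct: 1.0 if the prediction was right, 0.0 otherwise, shape (N,)
    """
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            gap = abs(correct[mask].mean() - confidences[mask].mean())
            ece += mask.mean() * gap
    return ece

# Hypothetical overconfident model: it reports ~95% confidence
# but is right only ~70% of the time, giving a large ECE.
rng = np.random.default_rng(0)
conf = rng.uniform(0.9, 1.0, size=1000)
corr = (rng.uniform(size=1000) < 0.7).astype(float)
print(expected_calibration_error(conf, corr))
```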
Working with a Bayesian neural network makes it possible to distinguish between epistemic uncertainty, i.e., the uncertainty due to a lack of knowledge, which is measured by p(θ|D) and can be reduced with more data, and aleatoric uncertainty, i.e., the uncertainty due to the (partially) random nature of the data, which is measured by p(y|x, θ) [14, 44]. This makes BNNs very data efficient, as they can learn from a small dataset without overfitting. At prediction time, out-of-training-distribution points will simply lead to high epistemic uncertainty. It also makes BNNs an interesting tool for active learning [19, 88], as one can interpret the model predictions and check whether, for a given input, different probable parametrizations lead to different predictions. In that case, labelling this specific input will effectively reduce the epistemic uncertainty.
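A rough sketch of how this distinction plays out in practice is given below. The posterior samples are hypothetical stand-ins for p(y|x, θᵢ) with θᵢ drawn from an (approximate) posterior p(θ|D), and the split follows the commonly used entropy-based decomposition, in which the mutual information between the prediction and the parameters captures the epistemic part; variance-based decompositions are also used in the literature.

```python
import numpy as np

def entropy(p, axis=-1, eps=1e-12):
    """Shannon entropy of a categorical distribution."""
    return -np.sum(p * np.log(p + eps), axis=axis)

def uncertainty_decomposition(prob_samples):
    """Split predictive uncertainty into aleatoric and epistemic parts.

    prob_samples: array of shape (n_theta_samples, n_classes), where each
    row is p(y | x, theta_i) for one parametrization theta_i drawn from
    the (approximate) posterior p(theta | D).
    """
    mean_probs = prob_samples.mean(axis=0)       # p(y | x, D)
    total = entropy(mean_probs)                  # total predictive uncertainty
    aleatoric = entropy(prob_samples).mean()     # E_theta[H(p(y | x, theta))]
    epistemic = total - aleatoric                # mutual information between y and theta
    return total, aleatoric, epistemic

# Hypothetical posterior samples that disagree with each other:
# the epistemic term dominates (a candidate for active-learning labelling).
disagreeing = np.array([[0.9, 0.1], [0.1, 0.9], [0.8, 0.2], [0.2, 0.8]])
# Samples that agree but are individually uncertain: aleatoric dominates.
agreeing_noisy = np.array([[0.55, 0.45], [0.5, 0.5], [0.45, 0.55], [0.5, 0.5]])

print(uncertainty_decomposition(disagreeing))
print(uncertainty_decomposition(agreeing_noisy))
```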
Furthermore, the no-free-lunch theorem for machine learning [94] can be interpreted as saying that any supervised learning algorithm includes some kind of implicit prior (though this interpretation is more philosophical than mathematical, and thus subject to discussion). Bayesian methods, when used correctly, will at least make the prior explicit. Now, even if integrating prior knowledge seems hard with tools that are essentially black boxes, it is not impossible. In Bayesian deep learning, priors are often treated as soft constraints, akin to regularization. Most regularization methods already used for point-estimate neural networks can be understood from a Bayesian perspective as setting a prior, as demonstrated in Section 5.3. Moreover, a previously learned posterior can be recycled as a prior when new data becomes available. This makes Bayesian neural networks a valuable tool for online learning [64].
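Concretely, and assuming the new observations are conditionally independent of the old ones given θ, this recycling amounts to the sequential form of Bayes' formula,

p(θ | D_old, D_new) ∝ p(D_new | θ) p(θ | D_old),

so the posterior computed on the old data plays exactly the role of the prior for the new data.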
Last but not least, the Bayesian paradigm enables the analysis of learning methods and draws links between them. Some methods not initially presented as Bayesian can be implicitly understood as being approximately Bayesian, like regularization (Section 5.3) or ensembling (Section 8.2.2). This, in turn, helps explain why certain methods that are easier to use than a strict application of Bayesian algorithms can still give meaningful results from a Bayesian perspective. In fact, most Bayesian neural network architectures used in practice rely on methods that are approximately or implicitly Bayesian (Section 8), because the exact algorithms are often too expensive. The Bayesian paradigm also provides a systematic framework to design new learning and regularization strategies, even for point-estimate models.