Machine learning notes on Bishop's PRML


PRML Notes

Notes on Pattern Recognition and Machine Learning (Bishop)

Version 1.0

Jian Xiao ①

Checklist

Chapter 1 Introduction

Chapter 2 Probability Distribution

Chapter 3 Linear Models for Regression

Chapter 4 Linear Models for Classification

Chapter 5 Neural Networks

Chapter 6 Kernel methods

Chapter 7 Sparse Kernel Machine

Chapter 8 Graphical Models

Chapter 9 Mixture Models and EM

Chapter 10 Approximate Inference

Chapter 11 Sampling Method

Chapter 12 Continuous Latent Variables

Chapter 13 Sequential Data

Chapter 14 Combining Models

① iamxiaojian@gmail.com

Chapter 1 Introduction

1. Bayesian interpretation of probability

Rather than being the Bayesian school's interpretation of the concept of "probability", it is more accurate to say that probability happens to be a suitable tool for quantifying the Bayesian notion of "degree of belief".

The Bayesian starting point is the concept of "uncertainty", which is expressed by assigning a "degree of belief".

Cox showed that if numerical values are used to represent degrees of belief, then a simple set of axioms encoding common sense properties of such beliefs leads uniquely to a set of rules for manipulating degrees of belief that are equivalent to the sum and product rules of probability.

This is why we can use the machinery of probability theory to describe the uncertainty in model parameters.

$$
p(w \mid D) = \frac{p(D \mid w)\, p(w)}{p(D)} = \frac{p(D \mid w)\, p(w)}{\int p(D \mid w)\, p(w)\, dw}
$$

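Views on parameters, and the Bayesian interpretation of prior and posterior probabilities

For a frequentist, the model parameter w is a fixed quantity that is estimated with an "estimator"; the most common estimator is maximum likelihood.

For a Bayesian, w is itself an uncertain quantity, and its uncertainty is expressed by the prior probability p(w).

To learn the fixed w, the frequentist repeats the experiment many times and obtains different data sets D;

for the Bayesian, there is only a single data set D, namely the one that is actually observed.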

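After obtaining an observation D, the Bayesian revises the original belief about w (the prior probability) and expresses the revised belief as the posterior probability p(w|D). The revision is carried out with Bayes' theorem.

Bayes' theorem is the central theorem of the Bayesian approach: it converts a prior probability into a posterior probability by incorporating the evidence provided by the observed data. The conditional probability p(D|w) in it expresses how probable the observed data set is for different settings of the parameter vector w.

The p(D) in the denominator is only a normalization constant that ensures p(w|D) is a valid probability distribution. Its computation is given by the integral in the denominator above.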
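As a rough illustration of how the prior, the likelihood, and the normalizer p(D) fit together, the sketch below evaluates Bayes' theorem on a grid for a single parameter. The coin-flip setup, the uniform prior, the made-up data, and the name `grid_posterior` are all assumptions chosen for illustration; they do not come from the notes.

```python
# Grid approximation of Bayes' theorem for one parameter w in [0, 1].
# Hypothetical setup: w is the probability of heads, D is a list of coin flips.

def grid_posterior(data, num_points=101):
    """Return (grid, posterior), where posterior approximates p(w|D) on the grid."""
    grid = [i / (num_points - 1) for i in range(num_points)]
    step = 1.0 / (num_points - 1)
    prior = [1.0 for _ in grid]  # uniform prior density p(w) = 1 (an assumption)

    heads = sum(data)
    tails = len(data) - heads
    # Likelihood p(D|w) for independent flips.
    likelihood = [w ** heads * (1.0 - w) ** tails for w in grid]

    unnormalized = [l * p for l, p in zip(likelihood, prior)]
    # p(D) = integral of p(D|w) p(w) dw, approximated here by a Riemann sum.
    evidence = sum(unnormalized) * step
    posterior = [u / evidence for u in unnormalized]
    return grid, posterior


flips = [1, 1, 0, 1, 1, 0, 1, 1]  # 6 heads, 2 tails (made-up data)
grid, posterior = grid_posterior(flips)
w_mode = grid[posterior.index(max(posterior))]
print("posterior mode:", w_mode)
```

Dividing by the evidence changes only the scale of the curve, not its shape, which is why p(D) can be treated as a normalizer.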

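(Understanding the posterior probability: it is the revised prior. For example, suppose there are classes C1, …, Ck with priors P(C1), …, P(Ck). If we are given a data point of unknown class and asked to guess its class, we should clearly guess the class with the largest prior probability. After observing the data x, we compute the posterior probabilities P(C1|x), …, P(Ck|x); the "prior" is thereby revised to P'(C1) = P(C1|x), …, P'(Ck) = P(Ck|x). If another data point of unknown class arrives and we are asked to guess again, the method is still to pick the class with the largest prior probability, except that the prior is now P'(C1), …, P'(Ck).)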
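The update described in the parenthetical above can be sketched in a few lines. The three classes, the prior values, and the likelihoods p(x|Ck) below are hypothetical numbers chosen only to show the guess changing after one observation; the function names are also illustrative.

```python
def update_beliefs(prior, likelihoods):
    """One Bayes step: prior[k] = P(Ck), likelihoods[k] = p(x|Ck); returns P(Ck|x)."""
    unnormalized = [p * l for p, l in zip(prior, likelihoods)]
    evidence = sum(unnormalized)  # p(x) = sum over k of p(x|Ck) P(Ck)
    return [u / evidence for u in unnormalized]


def best_guess(beliefs):
    """Index of the class with the largest current belief."""
    return max(range(len(beliefs)), key=lambda k: beliefs[k])


prior = [0.6, 0.3, 0.1]                  # P(C1), P(C2), P(C3), hypothetical
guess_before = best_guess(prior)         # guess made before seeing any data

likelihood_x = [0.1, 0.5, 0.4]           # p(x|Ck) for the observed x, hypothetical
posterior = update_beliefs(prior, likelihood_x)
guess_after = best_guess(posterior)      # guess made with the revised "prior"

print(guess_before, guess_after, posterior)
```

With these numbers the observation shifts most of the belief from the first class to the second, so the best guess changes accordingly.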

Weaknesses of the Bayesian and frequentist approaches

A common criticism of the Bayesian approach: the prior distribution is often selected on the basis of mathematical convenience rather than as a reflection of any prior beliefs. A conjugate prior, for example, is a frequent choice.

A weakness of the frequentist approach: the over-fitting problem can be understood as a general property of maximum likelihood.

Dealing with the over-fitting problem

Frequentist ways of controlling over-fitting:

1) Regularization, i.e. adding a penalty term to the objective function.

An L2 regularizer gives ridge regression.

An L1 regularizer gives Lasso regression.

Adding a penalty is also called a shrinkage method, because it reduces the value of the coefficients.

2) Cross-validation, i.e. holding out part of the data for validation.

Cross-validation is also a method for model selection: the held-out validation data can be used to choose the best of several trained models.

The Bayesian way of controlling over-fitting: the prior probability.
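To make the shrinkage effect and the use of held-out data concrete, here is a minimal sketch of ridge regression with a single weight and no intercept, where the closed-form solution is w = Σ x·t / (Σ x² + λ). The data, the candidate λ values, and the function names are invented for illustration, and a single hold-out split stands in for full cross-validation.

```python
def fit_ridge_1d(xs, ts, lam):
    """Minimize sum_n (w * x_n - t_n)^2 + lam * w^2 for a single weight w."""
    sxy = sum(x * t for x, t in zip(xs, ts))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)


def squared_error(w, xs, ts):
    """Sum-of-squares error of the model t = w * x on the given data."""
    return sum((w * x - t) ** 2 for x, t in zip(xs, ts))


# Made-up data: a small training set and a held-out validation set.
train_x, train_t = [1.0, 2.0, 3.0], [2.2, 3.9, 6.3]
valid_x, valid_t = [4.0, 5.0], [7.6, 9.5]

lambdas = [0.0, 1.0, 10.0, 100.0]
weights = {lam: fit_ridge_1d(train_x, train_t, lam) for lam in lambdas}
valid_errors = {lam: squared_error(w, valid_x, valid_t) for lam, w in weights.items()}

# Model selection: keep the penalty strength with the lowest validation error.
best_lam = min(valid_errors, key=valid_errors.get)
print(weights, best_lam)
```

Larger λ pulls the weight toward zero (the shrinkage), while the validation set decides how much shrinkage actually helps on unseen data.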

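The main problem facing Bayesian methods: marginalization is hard to compute

Marginalization lies at the heart of Bayesian methods.

The application of Bayesian methods was long limited by marginalization. For a full Bayesian procedure, making predictions or comparing different models requires marginalizing (summing or integrating) over the whole of parameter space.

Methods developed along two lines have overcome the difficulty of marginalization:

The first is sampling, for example Markov chain Monte Carlo. Monte Carlo methods have the advantage of being flexible and are widely applicable to many kinds of models; their drawback is that they are computationally intensive, so they are mainly used for small-scale problems.

The second is deterministic approximation, for example variational Bayes and expectation propagation, whose advantage is that they scale to large-scale applications.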
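The sampling idea can be illustrated on a toy case where the marginalization is also available in closed form. The sketch below assumes a coin-flip model with a uniform Beta(1, 1) prior, so the posterior is a Beta distribution that can be sampled directly; this is plain Monte Carlo, not MCMC, and the function names and data counts are hypothetical.

```python
import random


def mc_predictive_heads(heads, tails, num_samples=20000, seed=0):
    """Estimate p(next flip = heads | D) = integral of w * p(w|D) dw by sampling."""
    rng = random.Random(seed)
    a, b = 1 + heads, 1 + tails  # Beta posterior under a uniform Beta(1, 1) prior
    samples = [rng.betavariate(a, b) for _ in range(num_samples)]
    # The predictive probability is the posterior mean of w.
    return sum(samples) / num_samples


def exact_predictive_heads(heads, tails):
    """Closed-form posterior mean of a Beta(1 + heads, 1 + tails) distribution."""
    return (1 + heads) / (2 + heads + tails)


estimate = mc_predictive_heads(6, 2)
exact = exact_predictive_heads(6, 2)
print("Monte Carlo:", estimate, "exact:", exact)
```

In realistic models the posterior cannot be sampled this directly, which is where Markov chain methods come in, at a much higher computational cost.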

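Demonstrating the three methods with curve fitting as the example

1) MLE: maximize the likelihood function directly to obtain the parameter w. This method is a point estimate.

2) MAP (poor man's Bayes): introduce a prior probability and maximize the posterior probability to obtain w. In this setting (with a Gaussian prior on w), MAP amounts to adding an L2 penalty to the MLE objective (the negative log of the likelihood function). This method …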
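The link between MAP and an L2 penalty can be written out as a short derivation. It assumes the usual curve-fitting setup: Gaussian noise with precision β on the targets t_n, a model y(x, w), and an isotropic Gaussian prior with precision α on w. The symbols α, β, and λ are assumptions of this sketch, not notation taken from the notes.

$$
\begin{aligned}
p(\mathbf{w} \mid D) &\propto p(D \mid \mathbf{w})\, p(\mathbf{w}), \\
-\ln p(\mathbf{w} \mid D) &= \frac{\beta}{2} \sum_{n=1}^{N} \bigl( y(x_n, \mathbf{w}) - t_n \bigr)^2 + \frac{\alpha}{2}\, \mathbf{w}^{\mathsf{T}} \mathbf{w} + \text{const}.
\end{aligned}
$$

MLE minimizes only the first term. MAP minimizes both, which is the same as minimizing the sum-of-squares error plus an L2 penalty with coefficient λ = α / β.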
