MIT Lecture Notes on GMMs and the EM Algorithm: An Introduction to Gaussian Mixture Models


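These MIT lecture notes cover Gaussian mixture models (GMMs) and the Expectation-Maximization (EM) algorithm. They are intended as a clear introduction for readers who already know basic probability and calculus. A GMM is a statistical model that assumes the data are drawn from a mixture of several Gaussian distributions; it is commonly used for clustering, density estimation, and anomaly detection.

The notes first review the Gaussian distribution. If a random variable X follows a Gaussian distribution, its probability density function (PDF) is
\[ p_X(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{(x-\mu)^2}{2\sigma^2}} \]
where $\mu$ is the mean, $\sigma$ is the standard deviation, and $e$ is the base of the natural logarithm. The density is symmetric about $\mu$ and bell-shaped, and the Gaussian holds a central place among probability distributions.

A GMM extends the single Gaussian to the mixture case: the overall density is a weighted sum of several Gaussian densities, each with its own parameters (such as a mean and a covariance matrix). Each data point is generated by exactly one of the Gaussian components, and the mixing weights give the probability that a point comes from each component.

The EM algorithm addresses parameter estimation in a GMM. When the class label of each observation is not observed, that is, when there are "hidden" variables, EM estimates the parameters iteratively. The E-step (Expectation step) computes, under the current model, the probability that each data point belongs to each Gaussian component. The M-step (Maximization step) then uses those probabilities to update each component's parameters (mean, covariance matrix, and weight). The two steps alternate until the model converges or a preset number of iterations is reached.

Throughout, KL divergence (Kullback-Leibler divergence) and entropy are used as tools for assessing model fit and information gain. KL divergence measures the difference between two probability distributions, while entropy reflects the uncertainty of a random variable. In the context of GMMs, these concepts help in optimizing the model parameters to fit the data distribution.

The resource runs from basic theory to practical application and is useful to both beginners and more advanced readers who want to understand Gaussian mixture models and the EM algorithm, along with the underlying probability, statistical modeling, and optimization methods.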
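The E-step and M-step described above can be sketched for the one-dimensional case as follows. This is a minimal illustration rather than the notes' own code: the function names (`gaussian_pdf`, `em_step`, `fit_gmm`, `log_likelihood`), the synthetic data, the starting values, and the small floor on the standard deviation are all assumptions made for the example.

```python
import math
import random


def gaussian_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def log_likelihood(data, weights, means, sigmas):
    """Log-likelihood of the data under the mixture."""
    return sum(
        math.log(sum(w * gaussian_pdf(x, m, s)
                     for w, m, s in zip(weights, means, sigmas)))
        for x in data
    )


def em_step(data, weights, means, sigmas):
    """One EM iteration for a 1-D Gaussian mixture; returns updated parameters."""
    k = len(weights)
    # E-step: resp[i][j] = P(component j | x_i) under the current parameters.
    resp = []
    for x in data:
        joint = [weights[j] * gaussian_pdf(x, means[j], sigmas[j]) for j in range(k)]
        total = sum(joint)
        resp.append([v / total for v in joint])
    # M-step: re-estimate each component's weight, mean, and standard deviation.
    n = len(data)
    new_w, new_mu, new_sigma = [], [], []
    for j in range(k):
        nj = sum(r[j] for r in resp)  # effective number of points in component j
        mu = sum(r[j] * x for r, x in zip(resp, data)) / nj
        var = sum(r[j] * (x - mu) ** 2 for r, x in zip(resp, data)) / nj
        new_w.append(nj / n)
        new_mu.append(mu)
        # Assumed safeguard: floor the std so a component cannot collapse to a point.
        new_sigma.append(max(math.sqrt(var), 1e-6))
    return new_w, new_mu, new_sigma


def fit_gmm(data, weights, means, sigmas, iterations=50):
    """Run a fixed number of EM iterations from the given starting parameters."""
    for _ in range(iterations):
        weights, means, sigmas = em_step(data, weights, means, sigmas)
    return weights, means, sigmas


# Synthetic, well-separated data (illustrative values, not from the notes).
rng = random.Random(0)
data = ([rng.gauss(0.0, 1.0) for _ in range(300)]
        + [rng.gauss(10.0, 1.0) for _ in range(300)])
weights, means, sigmas = fit_gmm(data, [0.5, 0.5], [2.0, 8.0], [3.0, 3.0])
print(weights, means, sigmas)
```

A fixed iteration count keeps the sketch short; a convergence test on the change in `log_likelihood` between iterations would be the usual alternative.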

[Figure 1, panel (a): Probability density over price (dollars) for paperback books (red), hardback books (blue), and all books (black, solid).]

[Figure 1, panel (b): Probability density over height (inches) for heights of women (red), heights of men (blue), and all heights (black, solid).]

Figure 1: Two Gaussian mixture models: the component densities (which are Gaussian) are shown in dotted red and blue lines, while the overall density (which is not) is shown as a solid black line.
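The relationship between the component densities and the overall density in Figure 1 can be sketched numerically: the overall density is the weighted sum of the Gaussian components. The weights, means, and standard deviations below are hypothetical values chosen only to resemble a height example; the figure's actual parameters are not given in this excerpt.

```python
import math


def gaussian_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def mixture_pdf(x, weights, means, sigmas):
    """Overall density: weighted sum of the Gaussian component densities."""
    return sum(w * gaussian_pdf(x, m, s) for w, m, s in zip(weights, means, sigmas))


# Hypothetical parameters (inches); not taken from the notes.
weights = [0.5, 0.5]
means = [64.0, 70.0]
sigmas = [3.0, 3.0]

# Sanity check: a Riemann sum over a wide interval should be close to 1,
# since the mixing weights sum to 1 and each component integrates to 1.
step = 0.01
grid = [40.0 + step * i for i in range(int(60.0 / step) + 1)]
area = sum(mixture_pdf(x, weights, means, sigmas) for x in grid) * step
print(area)
```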

the data within each group is normally distributed. Let's look at this a little more formally with heights.

2.2 The model

Formally, suppose we have people numbered $i = 1, \ldots, n$. We observe random variable $Y_i \in \mathbb{R}$ for each person's height, and assume there's an unobserved label $C_i \in \{M, F\}$ for each person representing that person's gender. Here, the letter c stands for "class". In general, we can have any number of possible labels or classes, but we'll limit ourselves to two for this example. We'll also assume that the two groups have the same known variance $\sigma^2$ but different unknown means $\mu_M$ and $\mu_F$. The distribution for the class labels is Bernoulli:

\[ p_{C_i}(c_i) = q^{\mathbb{1}(c_i = M)} (1 - q)^{\mathbb{1}(c_i = F)} \]

We'll also assume $q$ is known. To simplify notation later, we'll let $\pi_M = q$ and $\pi_F = 1 - q$, so we can write

\[ p_{C_i}(c_i) = \prod_{c \in \{M, F\}} \pi_c^{\mathbb{1}(c_i = c)} \tag{1} \]

The conditional distributions within each class are Gaussian:

\[ p_{Y_i \mid C_i}(y_i \mid c_i) = \prod_{c \in \{M, F\}} \mathcal{N}(y_i; \mu_c, \sigma^2)^{\mathbb{1}(c_i = c)} \tag{2} \]
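The two-stage model above (draw a label, then draw a height given the label) can be sketched as a short simulation. The numerical values of `q`, `mu`, and `sigma`, and the helper names `sample_person` and `joint_density`, are assumptions made for illustration; the notes treat the means as unknown.

```python
import math
import random


def normal_pdf(y, mu, sigma):
    """Density of N(mu, sigma^2) at y."""
    z = (y - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def sample_person(rng, q, mu, sigma):
    """Draw (c_i, y_i): label M with probability q, then height from N(mu[c_i], sigma^2)."""
    c = "M" if rng.random() < q else "F"
    y = rng.gauss(mu[c], sigma)
    return c, y


def joint_density(c, y, q, mu, sigma):
    """p(c_i) * p(y_i | c_i): class prior times the class-conditional Gaussian density."""
    pi = {"M": q, "F": 1.0 - q}
    return pi[c] * normal_pdf(y, mu[c], sigma)


# Illustrative values only (inches); not taken from the notes.
q = 0.5
mu = {"M": 70.0, "F": 64.0}
sigma = 3.0

rng = random.Random(0)
people = [sample_person(rng, q, mu, sigma) for _ in range(10000)]
frac_m = sum(1 for c, _ in people if c == "M") / len(people)
print(frac_m)
```

In the setting of the notes only the heights would be observed; the sampled labels are kept here so the simulation can be checked against `q`.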

Naive Bayes model, this is somewhat similar. However, here our features are always Gaussian, and in the general case of more than 1 dimension, we won't assume independence of the features.
