Key Properties of the Multivariate Gaussian Distribution: Applications and Understanding

Updated 2024-09-09 · 117 KB PDF

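In Stanford's CS229 course, the lecture notes "More on Multivariate Gaussians" examine the multivariate Gaussian distribution and its role in machine learning applications. The multivariate Gaussian is a central probability model: a vector-valued random variable x ∈ R^n has a multivariate Gaussian (normal) distribution with mean µ ∈ R^n and symmetric positive definite covariance matrix Σ if its probability density function is

$$
p(x; \mu, \Sigma) = \frac{1}{(2\pi)^{n/2}\,|\Sigma|^{1/2}} \exp\left(-\frac{1}{2}(x - \mu)^T \Sigma^{-1} (x - \mu)\right)
$$

We write this as x ∼ N(µ, Σ).

The notes emphasize the following facts about the multivariate Gaussian:

1. **Fully specified by its parameters**: If the mean µ and covariance matrix Σ of a multivariate Gaussian random variable are known, its probability distribution is completely determined. Much of statistical inference and model building relies on this.
2. **Linear transformations preserve normality**: If x is transformed to y = Ax + b, where A is a constant matrix and b is a constant vector, then y is again Gaussian, with y ∼ N(Aµ + b, AΣAᵀ). This makes the multivariate Gaussian very convenient for linear models such as linear regression and principal component analysis.
3. **Central limit theorem**: When many independent, identically distributed random variables with finite variance are summed and suitably normalized, the result is approximately Gaussian even if the individual variables are not. This is why averages computed over a data set are often well approximated by a Gaussian, which simplifies analysis.
4. **Maximum likelihood estimation**: The mean and covariance matrix of a multivariate Gaussian model can be estimated by maximum likelihood. This approach is widely used in machine learning algorithms such as the mixture of Gaussians model and factor analysis.
5. **Conditional probability and factor analysis**: In factor analysis, the multivariate Gaussian is used to explain dependencies among the observed variables. Decomposing the covariance matrix Σ into a term built from the factor loading matrix plus an error (noise) term reveals how the latent variables influence the observed ones.

Understanding these properties helps in reasoning about the structure of data and in making sound decisions when working with complex data sets. The goal of the notes is to give students a deeper understanding of the mathematics behind the multivariate Gaussian and of its practical uses, so that they can apply these concepts confidently in homework problems and future projects, including linear models, probabilistic inference, and dimensionality reduction.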
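
The density formula and the linear-transformation property above can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the function name `gaussian_pdf` and the values of `mu`, `Sigma`, `A`, and `b` are illustrative choices, not taken from the lecture notes.

```python
import numpy as np

def gaussian_pdf(x, mu, Sigma):
    """Density of N(mu, Sigma) at x, written directly from the formula above."""
    n = mu.shape[0]
    diff = x - mu
    # Solve Sigma @ v = diff instead of forming the explicit inverse.
    quad = diff @ np.linalg.solve(Sigma, diff)
    norm = (2.0 * np.pi) ** (n / 2.0) * np.sqrt(np.linalg.det(Sigma))
    return np.exp(-0.5 * quad) / norm

# Illustrative parameters (assumptions, not from the notes).
mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])
A = np.array([[1.0, 2.0],
              [0.0, -1.0]])
b = np.array([0.5, 3.0])

rng = np.random.default_rng(0)
x = rng.multivariate_normal(mu, Sigma, size=200_000)  # each row is one sample of x
y = x @ A.T + b                                       # y = A x + b, applied row by row

emp_mean = y.mean(axis=0)
emp_cov = np.cov(y, rowvar=False)
pred_mean = A @ mu + b        # mean predicted by y ~ N(A mu + b, A Sigma A^T)
pred_cov = A @ Sigma @ A.T    # covariance predicted by the same rule

print("density at the mean:", gaussian_pdf(mu, mu, Sigma))
print("max |mean error|:", np.abs(emp_mean - pred_mean).max())
print("max |cov error| :", np.abs(emp_cov - pred_cov).max())
```

The sample mean and covariance of `y` should agree with `A @ mu + b` and `A @ Sigma @ A.T` up to sampling noise. This only checks the first two moments; it does not by itself show that `y` is Gaussian.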

1. The first thing to point out is the importance of the independence assumption in the above rule. To see why this matters, suppose that y ∼ N(µ, Σ) for some mean vector µ and covariance matrix Σ, and suppose that z = −y. Clearly, z also has a Gaussian distribution (in fact, z ∼ N(−µ, Σ)), but y + z is identically zero!

2. The second thing to point out is a point of confusion for many students: if we add together two Gaussian densities ("bumps" in multidimensional space), wouldn't we get back some bimodal (i.e., "two-humped") density? Here, the thing to realize is that the density of the random variable y + z in this rule is NOT found by simply adding the densities of the individual random variables y and z. Rather, the density of y + z will actually turn out to be a convolution of the densities for y and z. To show that the convolution of two Gaussian densities gives a Gaussian density, however, is beyond the scope of this class.
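
Both points can be illustrated with a small simulation. This is a rough sketch using only the Python standard library; the parameter values are illustrative assumptions. Because the plain sum of two densities is not itself a normalized density, the sketch uses their equal-weight average (a two-component mixture) as a stand-in for "adding the densities".

```python
import random
import statistics

random.seed(0)
N = 100_000
mu, sigma = -3.0, 1.0        # y ~ N(mu, sigma^2), illustrative values
mu_p, sigma_p = 3.0, 1.0     # z ~ N(mu', sigma'^2), illustrative values

y = [random.gauss(mu, sigma) for _ in range(N)]
z = [random.gauss(mu_p, sigma_p) for _ in range(N)]  # drawn independently of y

# Item 1: with the dependent choice z = -y, the sum is exactly zero every time.
dependent_sum = [yi + (-yi) for yi in y]

# Item 2, adding the random variables: one draw of y plus one draw of z.
indep_sum = [yi + zi for yi, zi in zip(y, z)]

# Item 2, averaging the densities instead: flip a fair coin, then report y OR z.
mixture = [yi if random.random() < 0.5 else zi for yi, zi in zip(y, z)]

def frac_near_zero(samples, width=1.0):
    """Fraction of samples that fall within `width` of 0."""
    return sum(abs(s) < width for s in samples) / len(samples)

print("dependent sum all zero:", all(s == 0.0 for s in dependent_sum))
print("variance of y + z     :", statistics.pvariance(indep_sum))
print("variance of mixture   :", statistics.pvariance(mixture))
print("mass near 0, y + z    :", frac_near_zero(indep_sum))
print("mass near 0, mixture  :", frac_near_zero(mixture))
```

With these values the samples of y + z pile up around 0 in a single hump, while the mixture has two humps near −3 and 3 and almost no mass near 0.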

Instead, let's just use the observation that the convolution does give some type of Gaussian density, along with Fact #1, to figure out what the density, p(y + z|µ, Σ) would be, if we were to actually compute the convolution. How can we do this? Recall that from Fact #1, a Gaussian distribution is fully specified by its mean vector and covariance matrix. If we can determine what these are, then we're done.

But this is easy! For the mean, we have

$$
E[y_i + z_i] = E[y_i] + E[z_i] = \mu_i + \mu'_i
$$

from linearity of expectations. Therefore, the mean of y + z is simply $\mu + \mu'$. Also, the (i, j)th entry of the covariance matrix is given by

$$
\begin{aligned}
&E[(y_i + z_i)(y_j + z_j)] - E[y_i + z_i]\,E[y_j + z_j] \\
&\quad= E[y_i y_j + z_i y_j + y_i z_j + z_i z_j] - (E[y_i] + E[z_i])(E[y_j] + E[z_j]) \\
&\quad= E[y_i y_j] + E[z_i y_j] + E[y_i z_j] + E[z_i z_j] - E[y_i]E[y_j] - E[z_i]E[y_j] - E[y_i]E[z_j] - E[z_i]E[z_j] \\
&\quad= (E[y_i y_j] - E[y_i]E[y_j]) + (E[z_i z_j] - E[z_i]E[z_j]) \\
&\qquad+ (E[z_i y_j] - E[z_i]E[y_j]) + (E[y_i z_j] - E[y_i]E[z_j]).
\end{aligned}
$$

Using the fact that y and z are independent, we have $E[z_i y_j] = E[z_i]E[y_j]$ and $E[y_i z_j] = E[y_i]E[z_j]$. Therefore, the last two terms drop out, and we are left with,

$$
\begin{aligned}
&E[(y_i + z_i)(y_j + z_j)] - E[y_i + z_i]\,E[y_j + z_j] \\
&\quad= (E[y_i y_j] - E[y_i]E[y_j]) + (E[z_i z_j] - E[z_i]E[z_j]) \\
&\quad= \Sigma_{ij} + \Sigma'_{ij}.
\end{aligned}
$$
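
The result of this derivation (mean µ + µ′ and covariance Σ + Σ′ for independent y and z) can be sanity-checked by simulation. The following is a minimal sketch, assuming NumPy is available; the parameter values are illustrative assumptions, not taken from the notes.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 200_000

# Illustrative parameters (assumptions, not from the notes).
mu = np.array([1.0, 0.0])
Sigma = np.array([[1.0, 0.5],
                  [0.5, 2.0]])
mu_p = np.array([-2.0, 3.0])
Sigma_p = np.array([[0.5, -0.2],
                    [-0.2, 1.5]])

y = rng.multivariate_normal(mu, Sigma, size=N)
z = rng.multivariate_normal(mu_p, Sigma_p, size=N)  # drawn independently of y

s = y + z
mean_err = np.abs(s.mean(axis=0) - (mu + mu_p)).max()
cov_err = np.abs(np.cov(s, rowvar=False) - (Sigma + Sigma_p)).max()

# Sample estimate of the cross terms E[y_i z_j] - E[y_i] E[z_j],
# which independence sets to zero in the derivation.
cross = (y - y.mean(axis=0)).T @ (z - z.mean(axis=0)) / N
cross_err = np.abs(cross).max()

# Dependent counterexample: z = -y also has covariance Sigma, yet the
# covariance of y + z is the zero matrix rather than Sigma + Sigma.
cov_dep = np.cov(y + (-y), rowvar=False)

print("max |mean error|      :", mean_err)
print("max |cov error|       :", cov_err)
print("max |cross term|      :", cross_err)
print("cov of y + (-y) is 0? :", np.allclose(cov_dep, 0.0))
```

All three error values should be small and shrink as `N` grows. The last line shows why the independence assumption cannot be dropped.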

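For example, if y and z were univariate Gaussians (i.e., $y \sim N(\mu, \sigma^2)$, $z \sim N(\mu', \sigma'^2)$), then the convolution of their probability densities is given by

$$
\begin{aligned}
p(y + z; \mu, \mu', \sigma^2, \sigma'^2) &= \int_{-\infty}^{\infty} p(w; \mu, \sigma^2)\, p(y + z - w; \mu', \sigma'^2)\, dw \\
&= \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{1}{2\sigma^2}(w - \mu)^2\right) \cdot \frac{1}{\sqrt{2\pi}\,\sigma'} \exp\left(-\frac{1}{2\sigma'^2}(y + z - w - \mu')^2\right) dw
\end{aligned}
$$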
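
The convolution integral can also be evaluated numerically and compared with the univariate Gaussian whose mean and variance follow from the argument above, namely mean µ + µ′ and variance σ² + σ′². This is a rough sketch using only the Python standard library; the function names, the midpoint-rule integration, and the parameter values are illustrative assumptions.

```python
import math

def normal_pdf(x, mean, var):
    """Univariate Gaussian density with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def convolved_density(s, mu, var, mu_p, var_p, half_width=12.0, steps=24_000):
    """Midpoint-rule approximation of the convolution integral at y + z = s.

    The integration range covers mu +/- half_width standard deviations of the
    first density, outside of which the integrand is negligible.
    """
    sd = math.sqrt(var)
    lo = mu - half_width * sd
    dw = 2.0 * half_width * sd / steps
    total = 0.0
    for k in range(steps):
        w = lo + (k + 0.5) * dw
        total += normal_pdf(w, mu, var) * normal_pdf(s - w, mu_p, var_p)
    return total * dw

# Illustrative parameters (assumptions, not from the notes).
mu, var = 1.0, 0.5
mu_p, var_p = -2.0, 2.0

for s in (-4.0, -1.0, 0.0, 2.5):
    numeric = convolved_density(s, mu, var, mu_p, var_p)
    closed = normal_pdf(s, mu + mu_p, var + var_p)
    print(f"s = {s:5.1f}  numeric = {numeric:.10f}  closed form = {closed:.10f}")
```

At each test point the two columns should agree to many decimal places, which is consistent with the convolution being a Gaussian density with the summed mean and variance.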


