EM Algorithm in Python
The EM algorithm (Expectation-Maximization Algorithm) is an iterative algorithm widely used in statistical learning to estimate the parameters of probabilistic models that contain latent variables. Below is an example of implementing the EM algorithm in Python.
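Each iteration alternates two steps: an E-step that forms the expected complete-data log-likelihood under the current parameter estimate, and an M-step that maximizes it with respect to the parameters. In general form (a standard statement, included here for reference):
```latex
% E-step: expectation of the complete-data log-likelihood over the latent
% variables Z, given the observed data X and the current estimate \theta^{(t)}
Q(\theta \mid \theta^{(t)}) = \mathbb{E}_{Z \mid X,\, \theta^{(t)}}\!\left[ \log p(X, Z \mid \theta) \right]

% M-step: choose the parameters that maximize this expectation
\theta^{(t+1)} = \arg\max_{\theta} \; Q(\theta \mid \theta^{(t)})
```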
First, define a probabilistic model with latent variables, for example a Gaussian mixture model:
```python
import numpy as np
import matplotlib.pyplot as plt
class GaussianMixtureModel:
    def __init__(self, k):
        self.k = k          # number of Gaussian components
        self.mu = None      # component means
        self.sigma = None   # component covariance matrices
        self.alpha = None   # mixing weights

    def fit(self, X, max_iter=100, tol=1e-3):
        n, d = X.shape
        # Initialize means from random data points, covariances as identity
        # matrices and weights uniformly; random covariance matrices would
        # not be guaranteed to be positive definite.
        self.mu = X[np.random.choice(n, self.k, replace=False)]
        self.sigma = np.array([np.eye(d) for _ in range(self.k)])
        self.alpha = np.full(self.k, 1.0 / self.k)
        prev_log_likelihood = -np.inf
        for _ in range(max_iter):
            # E-step: responsibility of each component for each sample
            gamma = np.zeros((n, self.k))
            for j in range(self.k):
                gamma[:, j] = self.alpha[j] * self.gaussian_pdf(X, self.mu[j], self.sigma[j])
            gamma /= gamma.sum(axis=1, keepdims=True)
            # M-step: re-estimate means, covariances and mixing weights
            N = gamma.sum(axis=0)
            for j in range(self.k):
                self.mu[j] = (gamma[:, j].reshape(-1, 1) * X).sum(axis=0) / N[j]
                diff = X - self.mu[j]
                self.sigma[j] = np.dot(gamma[:, j] * diff.T, diff) / N[j]
                self.alpha[j] = N[j] / n
            # Compute the log-likelihood and check for convergence
            log_likelihood = np.log(sum(
                self.alpha[j] * self.gaussian_pdf(X, self.mu[j], self.sigma[j])
                for j in range(self.k)
            )).sum()
            if abs(log_likelihood - prev_log_likelihood) < tol:
                break
            prev_log_likelihood = log_likelihood

    def predict(self, X):
        # Assign each sample to the component with the largest weighted density
        densities = np.column_stack([
            self.alpha[j] * self.gaussian_pdf(X, self.mu[j], self.sigma[j])
            for j in range(self.k)
        ])
        return densities.argmax(axis=1)

    def gaussian_pdf(self, X, mu, sigma):
        # Multivariate normal density, evaluated row-wise for an (n, d) array
        X = np.atleast_2d(X)
        d = X.shape[1]
        diff = X - mu
        exponent = -0.5 * np.sum(diff @ np.linalg.inv(sigma) * diff, axis=1)
        norm = np.sqrt((2 * np.pi) ** d * np.linalg.det(sigma))
        return np.exp(exponent) / norm
```
Here, the `fit` method estimates the model parameters, the `predict` method assigns each sample to a cluster, and the `gaussian_pdf` method evaluates the multivariate Gaussian probability density function.
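Concretely, the E-step in `fit` computes the responsibility of component j for sample x_i, and the M-step applies the standard closed-form Gaussian mixture updates (listed here for reference):
```latex
% E-step: responsibilities under the current parameters
\gamma_{ij} = \frac{\alpha_j\, \mathcal{N}(x_i \mid \mu_j, \Sigma_j)}
                   {\sum_{l=1}^{k} \alpha_l\, \mathcal{N}(x_i \mid \mu_l, \Sigma_l)}

% M-step: closed-form updates, with N_j = \sum_{i=1}^{n} \gamma_{ij}
\mu_j = \frac{1}{N_j} \sum_{i=1}^{n} \gamma_{ij}\, x_i, \qquad
\Sigma_j = \frac{1}{N_j} \sum_{i=1}^{n} \gamma_{ij}\, (x_i - \mu_j)(x_i - \mu_j)^{\top}, \qquad
\alpha_j = \frac{N_j}{n}
```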
Below is an example of using the EM algorithm to estimate the parameters of a Gaussian mixture model:
```python
# Generate synthetic data: four well-separated 2-D Gaussian clusters
np.random.seed(1)
N = 1000
X = np.concatenate([np.random.randn(N, 2) + np.array([2, 2]),
                    np.random.randn(N, 2) + np.array([-2, -2]),
                    np.random.randn(N, 2) + np.array([2, -2]),
                    np.random.randn(N, 2) + np.array([-2, 2])])
# Fit the mixture model
model = GaussianMixtureModel(k=4)
model.fit(X)
# Visualize the resulting cluster assignments
y_pred = model.predict(X)
plt.scatter(X[:, 0], X[:, 1], c=y_pred)
plt.show()
```
Running this produces the following result:
![em_algorithm_example.png](https://cdn.jsdelivr.net/gh/0voice/from_csdn/em_algorithm_example.png)
As the figure shows, the EM algorithm successfully separates the data into four clusters.
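As a quick sanity check (a small sketch added for illustration, not part of the original example), the recovered means can be compared with the four true centers used to generate the data. The component ordering is arbitrary, and EM can occasionally settle into a poorer local optimum depending on the random initialization:
```python
# Compare the learned component means with the true cluster centers
# (illustrative check; exact numbers depend on the random seed).
true_centers = np.array([[2, 2], [-2, -2], [2, -2], [-2, 2]], dtype=float)

# Sort both sets of centers the same way so they can be compared row by row.
true_order = np.lexsort((true_centers[:, 1], true_centers[:, 0]))
learned_order = np.lexsort((model.mu[:, 1], model.mu[:, 0]))

print("true centers:\n", true_centers[true_order])
print("learned means:\n", np.round(model.mu[learned_order], 2))
print("mixing weights:", np.round(model.alpha, 2))  # roughly 0.25 each if the fit converged well
```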