Implementing the EM algorithm in Python and solving for the optimal model
Date: 2024-03-06 09:51:00
Sure. Below is a sample Python implementation of the EM algorithm that fits a Gaussian mixture model:
```python
import numpy as np

# Initialize the model parameters
def init_params(X, K):
    N, D = X.shape
    mu = np.random.rand(K, D)          # component means
    sigma = np.zeros((K, D, D))        # component covariance matrices
    for k in range(K):
        sigma[k] = np.eye(D)
    pi = np.ones(K) / K                # uniform mixture weights
    return mu, sigma, pi

# Multivariate Gaussian probability density function
def gaussian_pdf(X, mu, sigma):
    D = X.shape[1]
    X = X - mu
    sigma_inv = np.linalg.inv(sigma)
    det = np.linalg.det(sigma)
    pdf = np.exp(-0.5 * np.sum(X.dot(sigma_inv) * X, axis=1)) / np.sqrt((2 * np.pi) ** D * det)
    return pdf

# E-step: compute each sample's responsibility under each Gaussian component
def e_step(X, mu, sigma, pi):
    N, K = X.shape[0], mu.shape[0]
    pdf = np.zeros((N, K))
    for k in range(K):
        pdf[:, k] = pi[k] * gaussian_pdf(X, mu[k], sigma[k])
    gamma = pdf / np.sum(pdf, axis=1, keepdims=True)
    return gamma

# M-step: update the model parameters from the responsibilities
def m_step(X, gamma):
    N, D = X.shape
    K = gamma.shape[1]
    mu = np.zeros((K, D))
    sigma = np.zeros((K, D, D))
    pi = np.zeros(K)
    for k in range(K):
        Nk = np.sum(gamma[:, k])
        mu[k] = np.sum(gamma[:, k].reshape(-1, 1) * X, axis=0) / Nk
        X_centered = X - mu[k]
        sigma[k] = X_centered.T.dot(np.diag(gamma[:, k])).dot(X_centered) / Nk
        pi[k] = Nk / N
    return mu, sigma, pi

# Log-likelihood of the data under the current parameters
def log_likelihood(X, mu, sigma, pi):
    N, K = X.shape[0], mu.shape[0]
    pdf = np.zeros((N, K))
    for k in range(K):
        pdf[:, k] = pi[k] * gaussian_pdf(X, mu[k], sigma[k])
    ll = np.sum(np.log(np.sum(pdf, axis=1)))
    return ll

# EM algorithm: alternate E- and M-steps until the log-likelihood converges
def em_algorithm(X, K, max_iters=100, tol=1e-6):
    mu, sigma, pi = init_params(X, K)
    ll = log_likelihood(X, mu, sigma, pi)
    ll_history = [ll]
    for i in range(max_iters):
        gamma = e_step(X, mu, sigma, pi)
        mu, sigma, pi = m_step(X, gamma)
        ll_new = log_likelihood(X, mu, sigma, pi)
        ll_history.append(ll_new)
        if np.abs(ll_new - ll) < tol:
            break
        ll = ll_new
    return mu, sigma, pi, ll_history[-1]

# Test the EM algorithm
X = np.random.rand(100, 2)
K = 3
mu, sigma, pi, ll = em_algorithm(X, K)
print("mu:")
print(mu)
print("sigma:")
print(sigma)
print("pi:")
print(pi)
print("log likelihood:")
print(ll)
```
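One caveat about the implementation above: `log_likelihood` exponentiates densities before taking the log, which can underflow to zero for high-dimensional data or poorly scaled covariances, producing `-inf`. A common remedy, not used in the code above, is the log-sum-exp trick. A minimal sketch of the trick on artificial log-density values (the numbers here are made up purely to demonstrate the underflow):

```python
import numpy as np

# Pretend these are log(pi_k * N(x_n | mu_k, sigma_k)) for 2 samples, 2 components.
# They are so negative that exp() underflows to 0 in float64.
log_pdf = np.array([[-1000.0, -1001.0],
                    [-999.0, -1002.0]])

# Naive evaluation: exp underflows, so log(0) = -inf for every row
naive = np.log(np.sum(np.exp(log_pdf), axis=1))

# Log-sum-exp: subtract the row-wise max before exponentiating, add it back after
m = np.max(log_pdf, axis=1, keepdims=True)
stable = (m + np.log(np.sum(np.exp(log_pdf - m), axis=1, keepdims=True))).ravel()

print(naive)   # both entries are -inf
print(stable)  # finite values close to the row maxima
```

The same idea stabilizes the E-step: compute responsibilities by normalizing in log space rather than dividing raw densities.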
In the code above, `init_params` initializes the EM algorithm's parameters: `mu` holds each Gaussian component's mean, `sigma` its covariance matrix, and `pi` its mixture weight. `gaussian_pdf` evaluates the multivariate Gaussian density, `e_step` computes each sample's responsibility under each component, `m_step` updates the model parameters from those responsibilities, `log_likelihood` evaluates the log-likelihood of the data, and `em_algorithm` ties the whole EM procedure together.
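The key invariant of the E-step is that each sample's responsibilities form a probability distribution over components, i.e. every row of `gamma` sums to 1. A tiny self-contained 1-D sketch (two components with hypothetical means and weights chosen just for illustration) makes this easy to check:

```python
import numpy as np

def norm_pdf(x, mu, var):
    # Univariate Gaussian density
    return np.exp(-0.5 * (x - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)

x = np.array([-1.0, 0.0, 2.0])     # three 1-D samples
pi = np.array([0.4, 0.6])          # mixture weights (illustrative values)
mu = np.array([0.0, 2.0])          # component means
var = np.array([1.0, 1.0])         # component variances

# E-step: gamma[n, k] is proportional to pi_k * N(x_n | mu_k, var_k)
weighted = pi[None, :] * norm_pdf(x[:, None], mu[None, :], var[None, :])
gamma = weighted / weighted.sum(axis=1, keepdims=True)

print(gamma.sum(axis=1))  # each row sums to 1
```

As expected, the sample at `x = 2.0` is assigned almost entirely to the component with mean 2.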
In the test code, we draw a 100x2 random matrix as sample data, set the number of Gaussian components to 3, and call `em_algorithm` to fit the model. Finally, we print each component's mean, covariance matrix, and weight, along with the final log-likelihood.
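Note that uniformly distributed data, as used in the test, has no real cluster structure, so the fitted components are largely arbitrary. For a more convincing check, one can generate data from well-separated Gaussians and verify that EM recovers means near the true ones. A sketch of such a synthetic dataset (the locations and scale are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
# Two well-separated 2-D Gaussian clusters, 100 points each
X = np.vstack([
    rng.normal(loc=[0.0, 0.0], scale=0.3, size=(100, 2)),
    rng.normal(loc=[5.0, 5.0], scale=0.3, size=(100, 2)),
])
print(X.shape)  # (200, 2)
# Running em_algorithm(X, K=2) on data like this should yield
# component means close to (0, 0) and (5, 5), in some order.
```

Because EM only finds a local optimum and is sensitive to the random initialization in `init_params`, it is common to run it several times and keep the solution with the highest log-likelihood.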