PAM Algorithm: A Python Example
Date: 2023-08-11 08:05:13
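PAM (Partitioning Around Medoids) is a clustering algorithm based on a greedy strategy. It picks k data points as cluster representatives (medoids), assigns every other point to the cluster of its nearest medoid, and then repeatedly swaps a medoid with a non-medoid point whenever the swap lowers the total cost. Compared with K-Means, PAM is less sensitive to outliers, because each cluster center must be an actual data point rather than a mean that a single extreme value can drag away.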
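To make the outlier claim concrete, here is a minimal sketch comparing the mean (the kind of center K-Means uses) with the medoid (the kind of center PAM uses). The five data points and the helper name `medoid` are invented for illustration and are not part of the original post.

```python
import numpy as np


def medoid(points):
    """Return the point whose summed distance to all other points is smallest."""
    # Pairwise Euclidean distances via broadcasting: shape (n, n)
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=2))
    return points[np.argmin(dists.sum(axis=1))]


# Four points close together plus one far-away outlier (illustrative data)
points = np.array([[1.0, 1.0], [1.0, 2.0], [2.0, 1.0], [2.0, 2.0], [100.0, 100.0]])

mean_center = points.mean(axis=0)   # pulled toward the outlier
medoid_center = medoid(points)      # always one of the actual data points

print("mean:  ", mean_center)
print("medoid:", medoid_center)
```

With this data the mean lands at (21.2, 21.2), far from every point, while the medoid stays at (2, 2) inside the tight group.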
Below is a sample Python implementation of the PAM algorithm:
```python
import numpy as np


def distance(point1, point2):
    """Return the Euclidean distance between two points."""
    return np.sqrt(np.sum((point1 - point2) ** 2))


def pam(dataset, k):
    """Cluster dataset (an n x d array) into k groups with PAM; return the groups."""
    dataset = np.asarray(dataset, dtype=float)
    n = len(dataset)
    # Precompute every pairwise distance once: dist[p, q] = distance(point p, point q)
    dist = np.array([[distance(p, q) for q in dataset] for p in dataset])

    def total_cost(medoid_idx):
        # Cost = sum of distances from each point to its nearest medoid
        return dist[:, medoid_idx].min(axis=1).sum()

    # Randomly pick k distinct points as the initial medoids (stored as row indices)
    medoid_idx = [int(i) for i in np.random.choice(n, k, replace=False)]
    cost = total_cost(medoid_idx)

    # Iterate until no swap lowers the cost any further
    improved = True
    while improved:
        improved = False
        # Try replacing each medoid with each non-medoid point
        for i in range(k):
            for candidate in range(n):
                if candidate in medoid_idx:
                    continue
                trial = medoid_idx.copy()
                trial[i] = candidate
                trial_cost = total_cost(trial)
                # Keep the swap only if it strictly lowers the total cost
                if trial_cost < cost:
                    medoid_idx, cost = trial, trial_cost
                    improved = True

    # Assign each point to its nearest medoid and return one array of points per cluster
    labels = dist[:, medoid_idx].argmin(axis=1)
    return [dataset[labels == i] for i in range(k)]
```
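The swap step is the core of PAM: replace one medoid with a non-medoid point and keep the change only if the total cost drops. The following is a minimal sketch of a single swap evaluation on six 1-D points. The data, the helper name `total_cost`, and the choice of summed (unsquared) distance as the cost are assumptions made for illustration.

```python
import numpy as np

# Six 1-D points forming two obvious groups (illustrative data)
points = np.array([0.0, 1.0, 2.0, 10.0, 11.0, 12.0])

# Pairwise absolute distances, shape (6, 6)
dist = np.abs(points[:, None] - points[None, :])


def total_cost(medoid_idx):
    """Sum of distances from every point to its nearest medoid."""
    return dist[:, medoid_idx].min(axis=1).sum()


current = [0, 1]            # a poor start: both medoids sit in the left group
cost_before = total_cost(current)

swapped = [4, 1]            # replace the first medoid with point 4 (value 11.0)
cost_after = total_cost(swapped)

print(cost_before, cost_after)
if cost_after < cost_before:
    current = swapped       # accept the swap because it lowers the cost
```

Here the cost falls from 31 to 4, so the swap is accepted. A full PAM run repeats this test over every (medoid, non-medoid) pair until no swap helps.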
Usage example:
```python
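import numpy as np
import matplotlib.pyplot as plt

# Generate random data (fixed seed for reproducibility)
np.random.seed(0)
dataset = np.random.randn(100, 2)

# Cluster into 3 groups
clusters = pam(dataset, 3)

# Plot each cluster in its own color
colors = ['r', 'g', 'b']
for i, cluster in enumerate(clusters):
    points = np.asarray(cluster).reshape(-1, 2)
    plt.scatter(points[:, 0], points[:, 1], color=colors[i])
plt.show()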
```
Output:
![PAM clustering result](https://img-blog.csdn.net/20180905112115302?watermark/2/text/aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L2lhbTE5ODg3/font/5a6L5L2T/fontsize/400/fill/I0JBQkFCMA==/dissolve/70/q/85)