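KMeans clustering on imported data: choosing a reasonable number of clusters with the elbow method (intra-cluster distance) while also considering inter-cluster distance, in Python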
Date: 2023-05-28 15:02:06
The code is as follows:
```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Load the data to be clustered (shape: n_samples x n_features)
data = np.load("data.npy")

# Largest number of clusters to try
max_clusters = 10

# Lists holding the fitted KMeans models and their inertia (within-cluster sum of squares)
kmeans_models = []
inertias = []

# Fit one KMeans model per k and record its inertia
for k in range(1, max_clusters + 1):
    kmeans = KMeans(n_clusters=k, init='k-means++', n_init=10, random_state=42)
    kmeans.fit(data)
    kmeans_models.append(kmeans)
    inertias.append(kmeans.inertia_)

# Elbow plot: inertia only measures intra-cluster compactness
plt.plot(range(1, max_clusters + 1), inertias, marker='o')
plt.xlabel('Number of clusters')
plt.ylabel('Inertia')
plt.title('Elbow Method')
plt.show()

# Silhouette coefficient, which combines intra- and inter-cluster distance:
# for each point, a = mean distance to the other points of its own cluster,
# b = mean distance to the points of the nearest other cluster,
# s = (b - a) / max(a, b). It is only defined for 2 <= k <= n_samples - 1.
k_values = list(range(2, max_clusters + 1))
silhouette_scores = [silhouette_score(data, kmeans_models[k - 1].labels_) for k in k_values]

# Silhouette plot
plt.plot(k_values, silhouette_scores, marker='o')
plt.xlabel('Number of clusters')
plt.ylabel('Silhouette score')
plt.title('Silhouette Method')
plt.show()

# Choose the k with the highest mean silhouette coefficient
best_cluster_num = k_values[int(np.argmax(silhouette_scores))]

# Reuse the model already fitted for that k (same parameters and random_state)
kmeans_best = kmeans_models[best_cluster_num - 1]
labels = kmeans_best.labels_
centers = kmeans_best.cluster_centers_
print("Number of clusters:", best_cluster_num)
print("Cluster centers:", centers)
print("Labels:", labels)
```
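The script expects a file `data.npy` that is not provided. As a quick sanity check, the sketch below swaps in synthetic data: four well-separated 2-D blobs generated with scikit-learn's `make_blobs` (blob positions, spread, and sample count are arbitrary assumptions). It also replaces reading the elbow by eye with a simple geometric heuristic, taking the point of the inertia curve farthest below the straight line joining its first and last points. `elbow_k` is a hypothetical helper, not part of the original code or of any library.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Hypothetical stand-in for data.npy: four well-separated 2-D Gaussian blobs
centers = [[0, 0], [10, 0], [0, 10], [10, 10]]
data, _ = make_blobs(n_samples=400, centers=centers, cluster_std=0.8, random_state=0)

max_clusters = 10
ks = np.arange(1, max_clusters + 1)
inertias = []
sil = {}
for k in ks:
    km = KMeans(n_clusters=int(k), init="k-means++", n_init=10, random_state=42).fit(data)
    inertias.append(km.inertia_)
    if k >= 2:  # the silhouette coefficient is undefined for a single cluster
        sil[int(k)] = silhouette_score(data, km.labels_)
inertias = np.array(inertias)


def elbow_k(ks, inertias):
    """Hypothetical elbow heuristic: the point farthest below the chord joining
    the first and last points of the inertia curve, with both axes scaled to [0, 1]."""
    x = (ks - ks[0]) / (ks[-1] - ks[0])
    y = (inertias - inertias.min()) / (inertias.max() - inertias.min())
    chord = y[0] + (y[-1] - y[0]) * x
    # Vertical gap below the chord; proportional to the perpendicular distance
    return int(ks[np.argmax(chord - y)])


best_elbow = elbow_k(ks, inertias)
best_sil = max(sil, key=sil.get)
print("elbow heuristic k:", best_elbow)
print("silhouette k:", best_sil)
```

On data this clean, both criteria recover the four blobs; on real data they can disagree, which is why it helps to look at both plots.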
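Notes:
• First, we load the data to be clustered.
• Next, we set the maximum number of clusters and initialize lists to store the KMeans models and their errors (inertia).
• Then, we loop over the number of clusters, fit a KMeans model for each, and store each model and its inertia in the lists.
• Then, we use matplotlib to draw the elbow plot, which shows how the inertia decreases as the number of clusters grows; the "elbow", where the curve stops dropping sharply, indicates a candidate number of clusters.
• Next, we compute the silhouette coefficient, which combines each point's intra-cluster and inter-cluster distances, to measure clustering quality; we plot it and pick the number of clusters with the highest score.
• Finally, we take the KMeans model with the best number of clusters and print the clustering results (number of clusters, centers, labels).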
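Inertia alone only measures intra-cluster distance. The silhouette coefficient adds inter-cluster distance: for point i, a(i) is its mean distance to the other points of its own cluster, b(i) is its mean distance to the points of the nearest other cluster, and s(i) = (b(i) - a(i)) / max(a(i), b(i)); a point alone in its cluster gets s(i) = 0. The sketch below writes this out in NumPy so the two terms are visible, and compares the result with `sklearn.metrics.silhouette_score` on synthetic blobs. `silhouette_manual` is a hypothetical helper that builds the full n x n distance matrix, so it is only meant for small data.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score


def silhouette_manual(X, labels):
    """Mean silhouette coefficient written out term by term (hypothetical helper).

    Assumes 2 <= number of clusters <= n_samples - 1, as scikit-learn does.
    """
    # Pairwise Euclidean distances between all points (n x n)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    clusters = np.unique(labels)
    s = np.zeros(len(X))
    for i in range(len(X)):
        own = labels == labels[i]
        n_own = own.sum()
        if n_own == 1:
            continue  # singleton cluster: s(i) stays 0 by convention
        # a(i): mean distance to the other members of i's cluster (intra-cluster);
        # D[i, i] is 0, so dividing by n_own - 1 excludes the point itself
        a = D[i, own].sum() / (n_own - 1)
        # b(i): mean distance to the nearest other cluster (inter-cluster)
        b = min(D[i, labels == c].mean() for c in clusters if c != labels[i])
        s[i] = (b - a) / max(a, b)
    return s.mean()


X, _ = make_blobs(n_samples=300, centers=3, random_state=1)
labels = KMeans(n_clusters=3, n_init=10, random_state=42).fit_predict(X)
manual = silhouette_manual(X, labels)
reference = silhouette_score(X, labels)
print(f"manual: {manual:.6f}  sklearn: {reference:.6f}")
```

The two values should agree up to floating-point error; scikit-learn's version is chunked and much more memory-efficient, so it is the one to use in practice.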
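Besides the silhouette coefficient, scikit-learn provides other indices that weigh intra-cluster against inter-cluster distance and could be read alongside the elbow plot: `calinski_harabasz_score` (ratio of between-cluster to within-cluster dispersion, higher is better) and `davies_bouldin_score` (average similarity of each cluster to its most similar cluster, lower is better). This is an alternative, not the method used in the answer above; the synthetic four-blob data is an assumption for illustration.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import calinski_harabasz_score, davies_bouldin_score

# Hypothetical test data: four well-separated 2-D blobs
centers = [[0, 0], [10, 0], [0, 10], [10, 10]]
data, _ = make_blobs(n_samples=400, centers=centers, cluster_std=0.8, random_state=0)

ch, db = {}, {}
for k in range(2, 11):  # both indices need at least 2 clusters
    labels = KMeans(n_clusters=k, n_init=10, random_state=42).fit_predict(data)
    ch[k] = calinski_harabasz_score(data, labels)  # higher is better
    db[k] = davies_bouldin_score(data, labels)     # lower is better

best_ch = max(ch, key=ch.get)
best_db = min(db, key=db.get)
print("Calinski-Harabasz k:", best_ch, " Davies-Bouldin k:", best_db)
```

When several indices and the elbow plot point to the same k, that choice is more trustworthy than any single criterion.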