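Cluster analysis of the Kaggle diabetes dataset: KMeans, the elbow method, the silhouette plot, the average silhouette method, visualizing the choice of K, and visualizing the clustering results, with Python code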
Posted: 2024-03-26 17:35:19 | Views: 141
Sure. Here is Python code that runs a KMeans cluster analysis on the Kaggle diabetes dataset:
```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from yellowbrick.cluster import KElbowVisualizer, SilhouetteVisualizer
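# Load the dataset
data = pd.read_csv('diabetes.csv')
# Feature selection: use the first 8 columns as features
X = data.iloc[:, :8]
# Elbow method
model = KMeans(random_state=42)
# yellowbrick expands a 2-tuple as np.arange(k[0], k[1]), so (2, 11) covers k = 2..10
visualizer = KElbowVisualizer(model, k=(2, 11), metric='distortion')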
visualizer.fit(X)
visualizer.show()
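# Silhouette plot (per-sample silhouette coefficients for one fixed k)
# SilhouetteVisualizer does not scan a range of k; the model needs n_clusters set
model = KMeans(n_clusters=3, random_state=42)
visualizer = SilhouetteVisualizer(model)
visualizer.fit(X)
visualizer.show()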
# Average silhouette method: mean silhouette score for each k
scores = []
for k in range(2, 11):
    model = KMeans(n_clusters=k, random_state=42)
    score = silhouette_score(X, model.fit_predict(X))
    scores.append(score)
plt.plot(range(2, 11), scores)
plt.xlabel('Number of Clusters')
plt.ylabel('Silhouette Score')
plt.show()
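# KMeans clustering
model = KMeans(n_clusters=3, random_state=42)
y_pred = model.fit_predict(X)
# Visualize the clustering result
# Select columns by name so the plotted data matches the axis labels
# (assumes the standard Kaggle column names 'Glucose' and 'BloodPressure')
plt.scatter(X['Glucose'], X['BloodPressure'], c=y_pred)
plt.xlabel('Glucose')
plt.ylabel('BloodPressure')
plt.show()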
```
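Here, the elbow method uses `KElbowVisualizer`, the silhouette plot uses `SilhouetteVisualizer`, and the average silhouette method computes the silhouette score for each k by hand and plots the results. Finally, KMeans groups the dataset into 3 clusters and the clustering result is plotted.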
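The average silhouette step can also be used to pick k programmatically instead of reading it off the plot. The following is a minimal sketch under stated assumptions, not part of the original answer: the helper name `average_silhouette_by_k` is hypothetical, the data is a synthetic stand-in built with `make_blobs` rather than `diabetes.csv`, and the `StandardScaler` step is an addition (the code above clusters the raw features).

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler


def average_silhouette_by_k(X, k_values, random_state=42):
    """Return {k: mean silhouette score} for a KMeans model fitted at each k.

    Hypothetical helper, introduced here for illustration only.
    """
    scores = {}
    for k in k_values:
        model = KMeans(n_clusters=k, n_init=10, random_state=random_state)
        labels = model.fit_predict(X)
        scores[k] = silhouette_score(X, labels)
    return scores


# Synthetic stand-in for the 8 diabetes features (assumption):
# three well-separated blobs in 8 dimensions.
centers = [[0.0] * 8, [8.0] * 8, [-8.0] * 8]
X_demo, _ = make_blobs(n_samples=300, centers=centers,
                       cluster_std=1.0, random_state=0)

# Standardize so that features with large numeric ranges do not dominate
# the Euclidean distances KMeans relies on (an added step, not in the original).
X_scaled = StandardScaler().fit_transform(X_demo)

scores = average_silhouette_by_k(X_scaled, range(2, 11))

# The k with the highest mean silhouette score is the candidate choice.
best_k = max(scores, key=scores.get)
print(best_k, round(scores[best_k], 3))
```

To try this on the real data, replace `X_demo` with the feature frame `X` from the code above. Whether standardizing helps depends on the features, so it is worth comparing the scores with and without it.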