Use the K-means and K-medoids algorithms to cluster the iris dataset from sklearn (with the number of clusters set to 3), and evaluate the accuracy of both clustering algorithms. Please give a complete Python program.
First, let's import the required libraries: scikit-learn (sklearn) to load the iris dataset, numpy for array operations, and metrics utilities to compute clustering accuracy:
```python
import numpy as np
from sklearn import datasets
from sklearn.cluster import KMeans
from sklearn.metrics import accuracy_score
from scipy.spatial.distance import cdist

# Load the iris dataset
iris = datasets.load_iris()
X = iris.data
y_true = iris.target

# Set the number of clusters to 3
k = 3
```
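One point worth noting: K-means and K-medoids assign arbitrary cluster indices, so the predicted labels have to be matched to the true classes before `accuracy_score` is meaningful. The helper below is a minimal sketch of one way to do this (the name `remap_labels` is chosen just for this example): it uses the Hungarian algorithm from `scipy.optimize` to find the one-to-one cluster-to-class matching that maximizes agreement.

```python
from sklearn.metrics import confusion_matrix
from scipy.optimize import linear_sum_assignment

def remap_labels(y_true, y_pred):
    """Relabel predicted clusters so each cluster index maps to the
    true class it overlaps with most (Hungarian assignment)."""
    cm = confusion_matrix(y_true, y_pred)          # rows: true classes, cols: clusters
    row_ind, col_ind = linear_sum_assignment(-cm)  # maximize matched counts
    mapping = {col: row for row, col in zip(row_ind, col_ind)}
    return np.array([mapping[c] for c in y_pred])
```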
Next, we cluster the data with K-means and K-medoids respectively:
### K-means
```python
kmeans = KMeans(n_clusters=k, n_init=10, random_state=0)
kmeans.fit(X)

# Predict the cluster assignments
y_pred_kmeans = kmeans.predict(X)

# Match cluster indices to the true classes, then compute accuracy
accuracy_kmeans = accuracy_score(y_true, remap_labels(y_true, y_pred_kmeans))
print("K-means accuracy:", accuracy_kmeans)
```
### K-medoids
scikit-learn itself does not provide a K-medoids implementation (a `KMedoids` estimator is available in the separate scikit-learn-extra package). To keep the program self-contained, the code below implements a basic K-medoids loop that alternates between assigning each point to its nearest medoid and moving each medoid to the cluster member with the smallest total within-cluster distance, using pairwise distances from `cdist`:
```python
def k_medoids(X, k, max_iter=100, random_state=0):
    """Basic K-medoids: alternate assignment and medoid update."""
    rng = np.random.RandomState(random_state)
    dists = cdist(X, X, metric='euclidean')  # full pairwise distance matrix
    medoid_idx = rng.choice(len(X), k, replace=False)
    for _ in range(max_iter):
        # Assign every point to its nearest medoid
        labels = np.argmin(dists[:, medoid_idx], axis=1)
        # Move each medoid to the member with the smallest total distance in its cluster
        new_idx = medoid_idx.copy()
        for j in range(k):
            members = np.where(labels == j)[0]
            if len(members) > 0:
                within = dists[np.ix_(members, members)].sum(axis=1)
                new_idx[j] = members[np.argmin(within)]
        if np.array_equal(new_idx, medoid_idx):
            break
        medoid_idx = new_idx
    labels = np.argmin(dists[:, medoid_idx], axis=1)
    return labels, medoid_idx

y_pred_medoids, medoid_indices = k_medoids(X, k)

# Match cluster indices to the true classes, then compute accuracy
accuracy_medoids = accuracy_score(y_true, remap_labels(y_true, y_pred_medoids))
print("K-medoids accuracy:", accuracy_medoids)
```
We now have the accuracy of both K-means and K-medoids on the iris data. For more detail on how the two algorithms work, consult the relevant references.