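Please write complete Python code for the following task: perform hierarchical clustering on the iris dataset under different distance definitions, then find the best number of clusters, compare and analyze the results, choose the distance definition that fits the data best to plot the dendrogram and output the results, build a cross-tabulation of the cluster labels against the original labels, and compute the Rand index.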
Time: 2024-03-17 18:40:48 Views: 117
Sure. Below is example code that uses AgglomerativeClustering from the scikit-learn library to perform the hierarchical clustering:
```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram, linkage
from sklearn.datasets import load_iris
from sklearn.cluster import AgglomerativeClustering
from sklearn.metrics import adjusted_rand_score, rand_score

# Load the dataset
iris = load_iris()
X = iris.data
true_labels = iris.target

# Linkage criteria to compare (each one defines the distance between clusters)
linkage_methods = ['ward', 'complete', 'average', 'single']

# Find the best number of clusters and linkage criterion by scoring
# every combination against the true labels (adjusted Rand index)
best_n_cluster = 0
best_linkage = None
best_score = -1
for i in range(2, 11):
    for method in linkage_methods:
        cluster_model = AgglomerativeClustering(n_clusters=i, linkage=method)
        cluster_labels = cluster_model.fit_predict(X)
        score = adjusted_rand_score(true_labels, cluster_labels)
        # Print every score so the combinations can be compared
        print(f"n_clusters={i}, linkage={method}: ARI={score:.3f}")
        if score > best_score:
            best_score = score
            best_n_cluster = i
            best_linkage = method

# Refit with the best number of clusters and the best linkage criterion
cluster_model = AgglomerativeClustering(n_clusters=best_n_cluster, linkage=best_linkage)
cluster_labels = cluster_model.fit_predict(X)
print("Best n_clusters:", best_n_cluster, "| best linkage:", best_linkage)

# Plot the dendrogram. scipy's dendrogram() expects a linkage matrix,
# not a fitted scikit-learn model, so build one with the same criterion.
Z = linkage(X, method=best_linkage)
plt.title('Hierarchical Clustering Dendrogram')
dendrogram(Z, truncate_mode='level', p=3)
plt.xlabel("Number of points in node (or index of point if no parenthesis).")
plt.show()

# Cross-tabulate cluster labels (rows) against true labels (columns)
print("Cluster labels:", cluster_labels)
print("True labels:   ", true_labels)
cross_table = np.zeros((best_n_cluster, 3), dtype=int)
for i in range(len(X)):
    cross_table[cluster_labels[i], true_labels[i]] += 1
print("Cross table:")
print(cross_table)

# Compute the Rand index and its chance-adjusted variant
print("Rand score:", rand_score(true_labels, cluster_labels))
print("Adjusted Rand score:", adjusted_rand_score(true_labels, cluster_labels))
```
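Note that because the iris dataset is labeled, `iris.target` can be used directly as the true labels here. Choosing the number of clusters by maximizing the adjusted Rand index therefore depends on those labels and would not be possible on unlabeled data.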
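When true labels are not available, the number of clusters has to be chosen with an internal criterion instead. The following is a minimal sketch of one such approach, not part of the original answer: it compares several point-to-point distance metrics (Euclidean, Manhattan, cosine) under average linkage and scores each cut of the tree with the silhouette coefficient. The helper name `silhouette_by_k`, the choice of average linkage, and the list of metrics are all assumptions made for illustration.

```python
from scipy.cluster.hierarchy import fcluster, linkage
from sklearn.datasets import load_iris
from sklearn.metrics import silhouette_score

X = load_iris().data


def silhouette_by_k(X, metric, method="average", k_values=range(2, 11)):
    """Hypothetical helper: silhouette score for each cluster count k."""
    # Build the full merge tree once for this distance metric
    Z = linkage(X, method=method, metric=metric)
    scores = {}
    for k in k_values:
        # Cut the tree so that at most k flat clusters remain
        labels = fcluster(Z, t=k, criterion="maxclust")
        # Score the partition with the same metric used to build the tree
        scores[k] = silhouette_score(X, labels, metric=metric)
    return scores


# Assumed set of distance definitions to compare
metrics = ["euclidean", "cityblock", "cosine"]
results = {m: silhouette_by_k(X, m) for m in metrics}

for m, scores in results.items():
    best_k = max(scores, key=scores.get)
    print(f"{m}: best k = {best_k}, silhouette = {scores[best_k]:.3f}")
```

The silhouette coefficient rewards compact, well-separated clusters, so it can favor a different number of clusters than a label-based score such as the adjusted Rand index. Comparing the two is one way to carry out the comparison the question asks for.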