Python code for DIANA clustering and visualization of the iris dataset
Date: 2023-07-22 10:21:08
The following Python example clusters the iris dataset with the DIANA (DIvisive ANAlysis) algorithm and visualizes the result:
```python
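import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy.spatial.distance import pdist, squareform
from sklearn.datasets import load_iris


def diana(X, k):
    """Divisive clustering (DIANA): split top-down until there are k clusters."""
    # Pairwise Euclidean distance matrix
    D = squareform(pdist(X))
    n_samples = len(X)
    # DIANA starts with all samples in a single cluster
    clusters = [list(range(n_samples))]
    while len(clusters) < k:
        # Pick the cluster with the largest diameter (maximum pairwise distance)
        diameters = [D[np.ix_(c, c)].max() for c in clusters]
        idx = int(np.argmax(diameters))
        if diameters[idx] == 0:
            break  # every remaining cluster holds identical points
        old = clusters.pop(idx)
        # Seed the splinter group with the point farthest, on average, from the rest
        avg = D[np.ix_(old, old)].sum(axis=1) / (len(old) - 1)
        new = [old.pop(int(np.argmax(avg)))]
        # Move points that are closer, on average, to the splinter group
        while len(old) > 1:
            d_old = D[np.ix_(old, old)].sum(axis=1) / (len(old) - 1)
            d_new = D[np.ix_(old, new)].mean(axis=1)
            gain = d_old - d_new
            best = int(np.argmax(gain))
            if gain[best] <= 0:
                break
            new.append(old.pop(best))
        clusters.extend([old, new])
    # Convert the cluster lists to one integer label per sample
    labels = np.zeros(n_samples, dtype=int)
    for label, members in enumerate(clusters):
        labels[members] = label
    return labels

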
# Load the iris dataset
iris = load_iris()
X = iris.data

# Cluster with DIANA
labels = diana(X, 3)

# Visualize the clustering result
df = pd.DataFrame(X, columns=['sepal_length', 'sepal_width', 'petal_length', 'petal_width'])
df['label'] = labels
unique_labels = sorted(df['label'].unique())
colors = plt.cm.Spectral(np.linspace(0, 1, len(unique_labels)))
for i, color in zip(unique_labels, colors):
    subset = df[df['label'] == i]
    plt.scatter(subset['sepal_length'], subset['sepal_width'], color=color)
plt.xlabel('sepal length')
plt.ylabel('sepal width')
plt.show()
```
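This code loads the iris dataset, uses the DIANA algorithm to divide it into 3 clusters, and plots the result. Each cluster is drawn in a different color. The horizontal axis is sepal length and the vertical axis is sepal width; the clustering itself uses all four features, while the plot shows only these two.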
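As background, DIANA is a divisive method: it starts from one cluster and repeatedly splits a cluster by growing a "splinter group" out of it. The sketch below illustrates a single split on a small set of 1-D points, using plain Python only. The function name `split_once` and the toy data are assumptions made for illustration and are not part of the original answer.

```python
def split_once(points):
    """Split one cluster of 1-D points into (remaining, splinter) index lists."""
    def dist(i, j):
        return abs(points[i] - points[j])

    def avg_dist(i, group):
        # Average distance from point i to the members of group other than i
        others = [j for j in group if j != i]
        return sum(dist(i, j) for j in others) / len(others)

    remaining = list(range(len(points)))
    # Seed the splinter group with the point farthest, on average, from the rest
    seed = max(remaining, key=lambda i: avg_dist(i, remaining))
    remaining.remove(seed)
    splinter = [seed]
    while len(remaining) > 1:
        # Gain = average distance to own group minus average distance to splinter group
        gains = {i: avg_dist(i, remaining) - avg_dist(i, splinter) for i in remaining}
        best = max(gains, key=gains.get)
        if gains[best] <= 0:
            break  # nobody is closer to the splinter group; the split is done
        remaining.remove(best)
        splinter.append(best)
    return remaining, splinter


# Hypothetical toy data: two obvious groups, {0, 1, 2} and {10, 11}
remaining, splinter = split_once([0, 1, 2, 10, 11])
print(remaining, splinter)  # [0, 1, 2] [4, 3]
```

Here the point 11 (index 4) has the largest average distance to the others, so it seeds the splinter group. The point 10 (index 3) then moves over because it is much closer to 11 than to the remaining points, and the loop stops once no remaining point would gain from moving.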