Divisive clustering on the Iris dataset
Date: 2023-08-10 21:03:51
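Divisive clustering is the top-down form of hierarchical clustering: we start from an initial clustering and recursively split each cluster into smaller ones until a stopping condition is met. Here K-means provides an initial clustering, and clusters are then split using a distance measure.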
First, import the required libraries:
```python
from sklearn.datasets import load_iris
from sklearn.cluster import KMeans
from scipy.spatial.distance import cdist
import numpy as np
```
Then load the Iris dataset and extract the feature matrix:
```python
iris = load_iris()
X = iris.data
```
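Next, fit K-means with three clusters as an initial clustering. Note that the recursive procedure defined later is called on the full dataset `X`, so this fit serves only as a reference partition: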
```python
kmeans = KMeans(n_clusters=3).fit(X)
```
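To see what the K-means fit produced, one can count the points per cluster and check the shape of the centroids. This is a minimal sketch; `n_init=10` and `random_state=0` are assumptions added for reproducibility and are not in the snippet above.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

X = load_iris().data
# n_init and random_state are assumptions added for reproducibility
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

# Number of points assigned to each of the three clusters
sizes = np.bincount(kmeans.labels_, minlength=3)
print(sizes)
# One centroid per cluster, one column per feature
print(kmeans.cluster_centers_.shape)
```

The cluster sizes always sum to the number of samples, and each centroid has one coordinate per feature.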
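Next, define a function that splits one cluster into two sub-clusters using Euclidean distance. It takes the point farthest from the cluster centroid as the first seed, the point farthest from that seed as the second seed, and assigns every point to the nearer seed. Distances are computed over the points of the cluster itself, so the resulting indices refer to rows of `cluster`:
```python
def split_cluster(cluster, X=None):
    # X is kept only for compatibility with the call sites; it is not used.
    # Distance from each point in the cluster to the cluster centroid
    distances = cdist(cluster, np.mean(cluster, axis=0).reshape(1, -1)).ravel()
    # Seed A: the point farthest from the centroid
    seed_a = cluster[np.argmax(distances)]
    # Seed B: the point farthest from seed A
    seed_b = cluster[np.argmax(cdist(cluster, seed_a.reshape(1, -1)).ravel())]
    # Assign every point to the nearer of the two seeds
    nearer_b = np.argmin(cdist(cluster, np.vstack([seed_a, seed_b])), axis=1).astype(bool)
    sub_clusters = [cluster[~nearer_b], cluster[nearer_b]]
    # Compute the centroids of the sub-clusters
    centroids = [np.mean(sub_cluster, axis=0) for sub_cluster in sub_clusters]
    return sub_clusters, centroids
```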
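The distance-to-centroid step can be checked on a tiny example. The sketch below uses a hypothetical four-point array `toy` (not part of the Iris data) to show how `cdist` and `np.argmax` pick out the farthest point. The index returned by `np.argmax` refers to rows of the array passed as the first argument to `cdist`.

```python
import numpy as np
from scipy.spatial.distance import cdist

# Hypothetical toy cluster: three nearby points and one outlier
toy = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [10.0, 10.0]])

centroid = toy.mean(axis=0).reshape(1, -1)   # shape (1, 2)
distances = cdist(toy, centroid).ravel()     # Euclidean by default, shape (4,)
farthest_idx = int(np.argmax(distances))

print(farthest_idx)        # index of the outlier
print(toy[farthest_idx])   # the outlier itself
```

Here the centroid is (2.75, 2.75), so the outlier at (10, 10), row 3, is the farthest point.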
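Then define a recursive function that keeps splitting clusters until a stopping condition is met. The silhouette coefficient is used to judge each split. Because `silhouette_score` needs at least two distinct labels, it is computed on the two-way split as a whole rather than on a single cluster, and the cluster is left unsplit when the score falls below `min_score` (a threshold introduced here, with an arbitrary default of 0.5). The function returns a flat list of leaf clusters:
```python
from sklearn.metrics import silhouette_score

def divisive_clustering(cluster, X, min_size=5, max_depth=10, min_score=0.5):
    # Stop when the cluster is too small or the depth budget is used up
    if len(cluster) <= min_size or max_depth == 0:
        return [cluster]
    sub_clusters, centroids = split_cluster(cluster, X)
    # Do not split if one side would be empty
    if min(len(sub_cluster) for sub_cluster in sub_clusters) == 0:
        return [cluster]
    # Compute the silhouette score of the two-way split
    points = np.concatenate(sub_clusters)
    labels = np.concatenate([np.full(len(s), i) for i, s in enumerate(sub_clusters)])
    # Keep the cluster whole when the split separates its points poorly
    if silhouette_score(points, labels, metric='euclidean') < min_score:
        return [cluster]
    # Recurse into each sub-cluster and collect the leaves in one flat list
    leaves = []
    for sub_cluster in sub_clusters:
        leaves += divisive_clustering(sub_cluster, X, min_size=min_size,
                                      max_depth=max_depth-1, min_score=min_score)
    return leaves
```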
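To make the role of the silhouette score concrete, the following sketch scores a clean split and a mixed-up split of a hypothetical six-point dataset, and shows that `silhouette_score` rejects a labeling with only one cluster. The data and label arrays are invented for illustration.

```python
import numpy as np
from sklearn.metrics import silhouette_score

# Hypothetical toy data: two tight, well-separated groups of three points
points = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                   [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
good_labels = np.array([0, 0, 0, 1, 1, 1])   # matches the two groups
bad_labels = np.array([0, 1, 0, 1, 0, 1])    # mixes the two groups

good = silhouette_score(points, good_labels, metric='euclidean')
bad = silhouette_score(points, bad_labels, metric='euclidean')
print(good > bad)

# A single label is rejected: the score needs at least two clusters
try:
    silhouette_score(points, np.zeros(len(points), dtype=int))
    single_label_ok = True
except ValueError:
    single_label_ok = False
print(single_label_ok)
```

A split that follows the real group structure scores close to 1, while a split that cuts across it scores much lower, which is why a threshold on the score can serve as a stopping rule.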
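Finally, call `divisive_clustering` on the data and visualize the result. Each point is colored by the index of the leaf cluster it belongs to:
```python
import matplotlib.pyplot as plt

clusters = divisive_clustering(X, X, min_size=5, max_depth=10)
# Map each point (as a tuple of its features) to the index of its cluster
label_map = {}
for i, cluster in enumerate(clusters):
    for point in cluster:
        label_map[tuple(point)] = i
labels = [label_map[tuple(x)] for x in X]
fig = plt.figure(1)
# Create 3D axes attached to the figure and set the viewing angle
ax = fig.add_subplot(projection='3d')
ax.view_init(elev=48, azim=134)
# Axes: petal width, sepal length, petal length
ax.scatter(X[:, 3], X[:, 0], X[:, 2], c=labels, cmap='tab10', edgecolor='k')
plt.show()
```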
This produces a 3D scatter plot in which different colors represent different clusters.