Implementing the clustering performance metrics DBI, DI, and CHI
Date: 2023-09-10 17:13:39
Below is a Python example that computes three clustering performance metrics: the Davies-Bouldin Index (DBI), the Dunn Index (DI), and the Calinski-Harabasz Index (CHI):
```python
from sklearn.cluster import KMeans
from sklearn.metrics import pairwise_distances
from sklearn.datasets import make_blobs
import numpy as np

# Generate a random dataset (the true blob labels are not needed here)
X, _ = make_blobs(n_samples=300, centers=4, cluster_std=0.6, random_state=0)

# Fit K-means and extract the cluster labels and centroids
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X)
labels = kmeans.labels_
centers = kmeans.cluster_centers_
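
# Mean distance from each cluster's points to its centroid (intra-cluster scatter),
# and the matrix of distances between centroids
avg_distances = np.zeros(len(centers))
for i, center in enumerate(centers):
    avg_distances[i] = np.mean(pairwise_distances(X[labels == i], [center]))
inter_distances = pairwise_distances(centers)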

# Compute DBI: for each cluster take its worst (largest) similarity ratio
# against any other cluster, then average over all clusters
dbi = 0.0
for i in range(len(centers)):
    max_ratio = -np.inf
    for j in range(len(centers)):
        if i != j:
            ratio = (avg_distances[i] + avg_distances[j]) / inter_distances[i][j]
            if ratio > max_ratio:
                max_ratio = ratio
    dbi += max_ratio
dbi /= len(centers)
print("DBI:", dbi)
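
# Compute DI (Dunn index): the smallest distance between points in different
# clusters divided by the largest cluster diameter (largest within-cluster distance)
point_distances = pairwise_distances(X)
same_cluster = labels[:, None] == labels[None, :]
min_separation = point_distances[~same_cluster].min()
max_diameter = point_distances[same_cluster].max()
di = min_separation / max_diameter
print("DI:", di)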
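
# Compute CHI: between-cluster dispersion over within-cluster dispersion,
# each divided by its degrees of freedom
n_samples, n_clusters = len(X), len(centers)
overall_mean = X.mean(axis=0)
between = 0.0
within = 0.0
for k in range(n_clusters):
    members = X[labels == k]
    cluster_mean = members.mean(axis=0)
    between += len(members) * np.sum((cluster_mean - overall_mean) ** 2)
    within += np.sum((members - cluster_mean) ** 2)
chi = (between / (n_clusters - 1)) / (within / (n_samples - n_clusters))
print("CHI:", chi)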
```
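Here, `make_blobs` generates the random dataset, `KMeans` performs the clustering, and `pairwise_distances` computes the distance matrices. The script prints the DBI, DI, and CHI values. A lower DBI indicates better-separated clusters, while higher DI and CHI values indicate better clustering.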
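As a sanity check on hand-written metric code, it can help to compare against library implementations. The sketch below assumes scikit-learn's `davies_bouldin_score` and `calinski_harabasz_score` from `sklearn.metrics`, run on the same synthetic data as the example above. To my knowledge scikit-learn does not ship a Dunn index, so `dunn_index` here is a hypothetical helper written for illustration. It uses one common definition (minimum distance between points of different clusters over maximum cluster diameter); other variants exist.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import (
    calinski_harabasz_score,
    davies_bouldin_score,
    pairwise_distances,
)


def dunn_index(X, labels):
    """Hypothetical helper (not part of scikit-learn).

    Returns the minimum distance between points in different clusters
    divided by the maximum distance between points in the same cluster.
    """
    distances = pairwise_distances(X)
    same_cluster = labels[:, None] == labels[None, :]
    return distances[~same_cluster].min() / distances[same_cluster].max()


# Same synthetic data and clustering setup as the example above
X, _ = make_blobs(n_samples=300, centers=4, cluster_std=0.6, random_state=0)
labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X)

dbi = davies_bouldin_score(X, labels)      # lower is better
chi = calinski_harabasz_score(X, labels)   # higher is better
di = dunn_index(X, labels)                 # higher is better

print("DBI:", dbi)
print("CHI:", chi)
print("DI:", di)
```

Note that `dunn_index` builds the full pairwise distance matrix, so its memory use grows quadratically with the number of samples. That is fine for 300 points but may need a chunked approach on large datasets.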