Code for evaluating clustering algorithms with cluster purity (accuracy)
Posted: 2024-05-07 12:17:48
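For reference, here is the standard definition of purity. Each predicted cluster is assigned to the ground-truth class that occurs most often in it. Purity is then the fraction of samples that belong to their cluster's majority class. The symbols are introduced here for illustration only: $\Omega = \{\omega_1, \dots, \omega_K\}$ is the set of predicted clusters, $C = \{c_1, \dots, c_J\}$ is the set of true classes, and $N$ is the number of samples.

$$
\mathrm{purity}(\Omega, C) = \frac{1}{N} \sum_{k=1}^{K} \max_{j} \, \lvert \omega_k \cap c_j \rvert
$$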
Below is a simple piece of code for evaluating a clustering algorithm, using cluster purity (accuracy) as the metric:
```python
import numpy as np
from collections import Counter


def purity_score(y_true, y_pred):
    """
    Compute clustering purity (i.e., accuracy).

    y_true : array-like, shape = [n_samples]
        Ground truth labels
    y_pred : array-like, shape = [n_samples]
        Predicted cluster labels
    """
    # Map arbitrary labels (not necessarily 0..k-1) to contiguous indices
    _, true_idx = np.unique(np.asarray(y_true), return_inverse=True)
    _, pred_idx = np.unique(np.asarray(y_pred), return_inverse=True)
    # Contingency matrix: rows = true classes, columns = predicted clusters
    contingency_matrix = np.zeros((true_idx.max() + 1, pred_idx.max() + 1))
    for (t, p), count in Counter(zip(true_idx, pred_idx)).items():
        contingency_matrix[t, p] = count
    # Each predicted cluster contributes the size of its majority true class
    matches = contingency_matrix.max(axis=0).sum()
    return matches / contingency_matrix.sum()


# Example data
y_true = [0, 0, 0, 1, 1, 1, 2, 2, 2]
y_pred = [0, 0, 1, 1, 1, 2, 2, 2, 2]
# Compute clustering purity
purity = purity_score(y_true, y_pred)
print("Purity Score: %.3f" % purity)
```
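In this code, the `purity_score` function computes clustering purity (i.e., accuracy). It takes the ground-truth labels `y_true` and the predicted labels `y_pred` as input and returns the purity. The code uses the `numpy` module to build the matrix and the `collections` module for counting. In the example data, the ground-truth labels are `[0, 0, 0, 1, 1, 1, 2, 2, 2]` and the predicted labels are `[0, 0, 1, 1, 1, 2, 2, 2, 2]`. The three predicted clusters have majority-class counts of 2, 2 and 3, so the output purity is 7/9 ≈ `0.778`.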
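As a cross-check, the minimal sketch below computes purity in a second way. Instead of building a matrix, it groups the true labels by predicted cluster and uses `collections.Counter` to find each cluster's majority class. The function name `purity` is a hypothetical helper. It is not part of the code above or of any library. The sketch reproduces 7/9 ≈ 0.778 on the example data and accepts non-integer labels. It also illustrates a known limitation: if every sample is put in its own cluster, purity is always 1.0, so purity alone can reward over-splitting.

```python
from collections import Counter


def purity(y_true, y_pred):
    """Hypothetical helper: purity via per-cluster majority counts."""
    # Group the true labels by predicted cluster label
    clusters = {}
    for t, p in zip(y_true, y_pred):
        clusters.setdefault(p, []).append(t)
    # Sum the size of the most common true class in each cluster
    matches = sum(Counter(members).most_common(1)[0][1]
                  for members in clusters.values())
    return matches / len(y_true)


y_true = [0, 0, 0, 1, 1, 1, 2, 2, 2]
y_pred = [0, 0, 1, 1, 1, 2, 2, 2, 2]
print("%.3f" % purity(y_true, y_pred))  # 7/9 -> 0.778

# Labels need not be integers starting at 0
print("%.3f" % purity(["a", "a", "b"], ["x", "x", "y"]))  # 1.000

# Caveat: one cluster per sample always gives purity 1.0
print("%.3f" % purity(y_true, list(range(len(y_true)))))  # 1.000
```

Grouping with a dictionary avoids any assumption that labels are contiguous integers, at the cost of not producing the full contingency matrix.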