Single-Link (MIN) Hierarchical Clustering: Python Code
Date: 2023-07-23 07:07:10
Below is example Python code for the single-link (Single Link) hierarchical clustering algorithm:
```python
import numpy as np


def single_linkage(X, k):
    """
    Single-linkage (MIN) agglomerative clustering.

    X: ndarray, shape (n_samples, n_features)
    k: int, the desired number of clusters
    Returns a list of clusters, each a list of sample indices.
    """
    n_samples = X.shape[0]
    if not 1 <= k <= n_samples:
        raise ValueError("k must be between 1 and n_samples")
    # Pairwise Euclidean distances between samples (symmetric matrix)
    dist_matrix = np.zeros((n_samples, n_samples))
    for i in range(n_samples):
        for j in range(i + 1, n_samples):
            dist_matrix[i, j] = np.linalg.norm(X[i] - X[j])
            dist_matrix[j, i] = dist_matrix[i, j]
    # Start with every sample in its own cluster
    clusters = [[i] for i in range(n_samples)]
    # Merge the two closest clusters until only k remain
    while len(clusters) > k:
        min_dist = np.inf
        merge_pair = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single link: distance between the closest pair of members
                dist = np.min(dist_matrix[np.ix_(clusters[i], clusters[j])])
                if dist < min_dist:
                    min_dist = dist
                    merge_pair = (i, j)
        # Merge the pair; i < j, so popping j leaves index i valid
        i, j = merge_pair
        clusters[i] = clusters[i] + clusters[j]
        clusters.pop(j)
        # dist_matrix is indexed by sample, not by cluster, so it is left
        # unchanged; cluster distances are recomputed from it on each pass.
    return clusters
```
Here, the input `X` is the dataset and `k` is the desired number of clusters. `dist_matrix` is the distance matrix, which stores the Euclidean distance between every pair of samples. `clusters` is a list in which each element is the list of sample indices belonging to one cluster. Each iteration finds the closest pair of clusters and merges them into a new cluster. Because `dist_matrix` is indexed by sample and the single-link distance between two clusters is recomputed from it with `np.min`, the matrix does not need to be updated after a merge. The function returns the final clusters.
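The merge loop described above rescans every pair of clusters on each pass. As an alternative sketch (not the code from this page), the same k-cluster result can be obtained, when all pairwise distances are distinct, by sorting the pairwise links once and merging along the shortest ones with a union-find structure, which is Kruskal's minimum-spanning-tree procedure stopped early. The function name `single_linkage_union_find` and the toy points are made up for illustration, and `math.dist` assumes Python 3.8 or later.

```python
import math
from itertools import combinations


def single_linkage_union_find(points, k):
    """Group `points` into k clusters by merging along the shortest links first."""
    n = len(points)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of points")

    parent = list(range(n))  # parent[i] == i means i is the root of its cluster

    def find(i):
        # Walk up to the root, shortening the path along the way
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise links, shortest first (Euclidean distance)
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )

    n_clusters = n
    for _, i, j in edges:
        if n_clusters == k:
            break
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_j] = root_i  # merge the two clusters
            n_clusters -= 1

    # Collect sample indices by the root of their cluster
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Hypothetical toy data: two tight pairs and one isolated point
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (10.0, 0.0)]
clusters = single_linkage_union_find(points, 3)
print(clusters)  # [[0, 1], [2, 3], [4]]
```

Sorting the links costs O(n² log n) once, instead of a full scan over cluster pairs per merge. When several links have exactly the same length, the merge order (and so the clusters) may differ from the matrix-based version.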