Python Implementation of Shortest-Distance (Single-Linkage) Hierarchical Clustering
Date: 2023-06-29 13:13:09
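The following is a Python implementation of agglomerative hierarchical clustering; the default `method='single'` is the shortest-distance (single-linkage) rule: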
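```python
import numpy as np


def euclidean_distance(x, y):
    """Compute the Euclidean distance between two vectors."""
    return np.sqrt(np.sum((x - y) ** 2))


def hierarchical_clustering(data, method='single', n_clusters=1):
    """
    Agglomerative hierarchical clustering.

    :param data: 2-D array, one sample per row
    :param method: linkage rule: 'single', 'complete', 'average' or 'centroid'
    :param n_clusters: stop merging once this many clusters remain
    :return: 1-D array with one label per sample; a label is the smallest
             sample index in that sample's cluster
    """
    if method not in ('single', 'complete', 'average', 'centroid'):
        raise ValueError("Invalid method")
    data = np.asarray(data, dtype=float)
    n = data.shape[0]
    # Initialize the pairwise distance matrix
    distances = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            distances[i, j] = euclidean_distance(data[i], data[j])
            distances[j, i] = distances[i, j]
    # The zero diagonal would otherwise always win the argmin below
    np.fill_diagonal(distances, np.inf)
    # Every sample starts in its own cluster
    clusters = np.arange(n)
    for _ in range(n - max(n_clusters, 1)):
        # Find the two closest clusters; argmin returns the first hit, so i < j
        i, j = np.unravel_index(np.argmin(distances), distances.shape)
        # Cluster sizes before merging, needed for the 'average' update
        size_i = np.sum(clusters == clusters[i])
        size_j = np.sum(clusters == clusters[j])
        # Merge cluster j into cluster i
        clusters[clusters == clusters[j]] = clusters[i]
        # Update row i so that it describes the merged cluster
        if method == 'single':
            distances[i, :] = np.minimum(distances[i, :], distances[j, :])
        elif method == 'complete':
            distances[i, :] = np.maximum(distances[i, :], distances[j, :])
        elif method == 'average':
            # Size-weighted mean of the two old rows
            weighted = size_i * distances[i, :] + size_j * distances[j, :]
            distances[i, :] = weighted / (size_i + size_j)
        else:  # 'centroid'
            # Distance from the merged centroid to every other cluster centroid
            centroid = data[clusters == clusters[i]].mean(axis=0)
            for m in np.unique(clusters):
                if m != clusters[i]:
                    other = data[clusters == m].mean(axis=0)
                    distances[i, m] = euclidean_distance(centroid, other)
        distances[:, i] = distances[i, :]
        # Retire row and column j, and keep the diagonal out of the argmin
        distances[i, i] = np.inf
        distances[j, :] = np.inf
        distances[:, j] = np.inf
    # Return the cluster label of every sample
    return clusters
```
Call `hierarchical_clustering(data, method, n_clusters)` to run the clustering. `data` is a 2-D array with one sample per row; `method` selects the linkage rule ('single', 'complete', 'average', or 'centroid'); `n_clusters` is the number of clusters at which merging stops (with the default of 1, every sample ends up in the same cluster). The function returns one label per sample, and samples that share a label belong to the same cluster. Each label is the smallest sample index in its cluster, not the index of a cluster center.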
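For small inputs, the result of the matrix-based function above can be sanity-checked against a direct reading of the single-linkage definition: the distance between two clusters is the smallest distance between any pair of their members. The block below is a minimal sketch under that assumption, using only the standard library (`math.dist` needs Python 3.8 or later). `single_linkage_naive` is a hypothetical helper name introduced here for illustration, and its `n_clusters` stopping parameter is an assumption of this sketch.

```python
import math


def single_linkage_naive(points, n_clusters=1):
    """Merge the two closest clusters until n_clusters remain (slow sketch)."""
    # Each cluster is a list of sample indices; start with singletons.
    clusters = [[i] for i in range(len(points))]

    def cluster_distance(a, b):
        # Single linkage: smallest distance over all cross-cluster pairs.
        return min(math.dist(points[p], points[q]) for p in a for q in b)

    while len(clusters) > max(n_clusters, 1):
        # Pick the pair of clusters with the smallest single-linkage distance.
        pairs = (
            (x, y)
            for x in range(len(clusters))
            for y in range(x + 1, len(clusters))
        )
        a, b = min(
            pairs,
            key=lambda pair: cluster_distance(clusters[pair[0]], clusters[pair[1]]),
        )
        # Merge cluster b into cluster a and drop b from the list.
        clusters[a].extend(clusters[b])
        del clusters[b]
    return [sorted(c) for c in clusters]


points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (10.0, 0.0)]
print(single_linkage_naive(points, n_clusters=3))  # [[0, 1], [2, 3], [4]]
```

The sketch recomputes every cluster-to-cluster distance on each pass, so it is far slower than updating a distance matrix in place, but it is easy to verify by hand and is therefore useful as a reference for tiny datasets.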