Implementing Agglomerative Hierarchical Clustering by Hand in Python
Date: 2023-06-29 19:14:51
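Agglomerative hierarchical clustering is a bottom-up clustering method: it starts with every data point as its own cluster, then repeatedly merges the two most similar clusters until a predetermined number of clusters remains. Below is a simple Python implementation: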
```python
import numpy as np


class AgglomerativeClustering:
    """Naive agglomerative clustering, written for clarity rather than speed."""

    def __init__(self, n_clusters=2, linkage='ward'):
        self.n_clusters = n_clusters
        self.linkage = linkage

    def fit(self, X):
        X = np.asarray(X, dtype=float)
        linkages = {
            'ward': self._ward_linkage,
            'single': self._single_linkage,
            'complete': self._complete_linkage,
        }
        if self.linkage not in linkages:
            raise ValueError('Linkage type must be one of ["ward", "single", "complete"]')
        linkage_fn = linkages[self.linkage]

        self.n_samples, self.n_features = X.shape
        # Compute the pairwise distance matrix between all samples
        self.distances_ = self._compute_distances(X)
        # Start with every sample in its own cluster (lists of sample indices)
        clusters = [[i] for i in range(self.n_samples)]

        # Merge clusters until only n_clusters remain
        while len(clusters) > self.n_clusters:
            # Find the pair of clusters with the smallest linkage distance
            best, best_pair = np.inf, None
            for a in range(len(clusters) - 1):
                for b in range(a + 1, len(clusters)):
                    d = linkage_fn(X, clusters[a], clusters[b])
                    if d < best:
                        best, best_pair = d, (a, b)
            # Merge cluster b into cluster a (b > a, so deleting b is safe)
            a, b = best_pair
            clusters[a] = clusters[a] + clusters[b]
            del clusters[b]

        # Update labels: consecutive integers 0 .. n_clusters - 1
        self.labels_ = np.empty(self.n_samples, dtype=int)
        for label, members in enumerate(clusters):
            self.labels_[members] = label
        self.n_labels_ = len(clusters)
        return self

    def _compute_distances(self, X):
        # Full (n_samples, n_samples) Euclidean distance matrix
        diff = X[:, None, :] - X[None, :, :]
        return np.sqrt((diff ** 2).sum(axis=-1))

    def _ward_linkage(self, X, cluster_i, cluster_j):
        # Increase in within-cluster sum of squares caused by the merge
        n_i, n_j = len(cluster_i), len(cluster_j)
        gap = X[cluster_i].mean(axis=0) - X[cluster_j].mean(axis=0)
        return n_i * n_j / (n_i + n_j) * np.dot(gap, gap)

    def _single_linkage(self, X, cluster_i, cluster_j):
        # Smallest distance between any member of i and any member of j
        return self.distances_[np.ix_(cluster_i, cluster_j)].min()

    def _complete_linkage(self, X, cluster_i, cluster_j):
        # Largest distance between any member of i and any member of j
        return self.distances_[np.ix_(cluster_i, cluster_j)].max()
```
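To make the three linkage options concrete, the following minimal sketch computes the single, complete, and Ward merge costs for two tiny hand-picked clusters. The function name `linkage_distances` and the sample points are assumptions chosen for illustration, and the Ward cost is taken here as the increase in within-cluster sum of squares, which is one common formulation.

```python
import numpy as np


def linkage_distances(A, B):
    """Return single, complete, and Ward merge costs between clusters A and B."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    # Euclidean distance between every point of A and every point of B
    pairwise = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)
    # Ward cost: size-weighted squared distance between the two centroids
    gap = A.mean(axis=0) - B.mean(axis=0)
    ward = len(A) * len(B) / (len(A) + len(B)) * float(gap @ gap)
    return {
        "single": float(pairwise.min()),
        "complete": float(pairwise.max()),
        "ward": ward,
    }


# Hypothetical toy clusters
A = [[0, 0], [0, 1]]
B = [[3, 0], [4, 0]]
result = linkage_distances(A, B)
print(result)
```

For these points the single-linkage distance is 3 (the closest pair), the complete-linkage distance is √17 ≈ 4.12 (the farthest pair), and the Ward cost is 12.5.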
Given a dataset X, you can cluster it by instantiating the AgglomerativeClustering class and calling its fit method, for example:
```python
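from sklearn.datasets import make_blobs
import matplotlib.pyplot as plt

# Generate 100 two-dimensional points scattered around 3 centers
X, y = make_blobs(n_samples=100, centers=3, n_features=2, random_state=42)

# Cluster with the AgglomerativeClustering class defined above
model = AgglomerativeClustering(n_clusters=3, linkage='ward')
model.fit(X)

# Color each point by its assigned cluster label
plt.scatter(X[:, 0], X[:, 1], c=model.labels_)
plt.show()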
```
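Here make_blobs generates a random dataset with 3 centers and 2 features, and the scatter plot visualizes the result, with each cluster drawn in a different color. Note that this code is only a simple implementation; real applications may need to modify and optimize it for their specific requirements.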
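As one possible direction for such optimization, the sketch below updates a precomputed distance matrix in place after each merge instead of recomputing linkage distances from the raw points. It relies on a known property of single and complete linkage: the distance from a merged cluster to any other cluster is the elementwise minimum (single) or maximum (complete) of the two old rows. The function name `agglomerate` and its parameters are hypothetical, and Ward linkage is deliberately left out of this sketch.

```python
import numpy as np


def agglomerate(D, n_clusters, linkage="single"):
    """Cluster from a precomputed distance matrix D (single or complete linkage)."""
    D = np.array(D, dtype=float)      # work on a copy
    n = D.shape[0]
    np.fill_diagonal(D, np.inf)       # a cluster is never merged with itself
    labels = np.arange(n)             # label = index of the cluster's representative row
    active = np.ones(n, dtype=bool)   # rows that still represent a live cluster
    combine = np.minimum if linkage == "single" else np.maximum

    for _ in range(n - n_clusters):
        # Closest pair among the clusters that are still active
        masked = np.where(active[:, None] & active[None, :], D, np.inf)
        i, j = np.unravel_index(np.argmin(masked), masked.shape)
        # Row i now represents the merged cluster; row j is retired
        merged = combine(D[i], D[j])
        D[i, :] = merged
        D[:, i] = merged
        D[i, i] = np.inf
        active[j] = False
        labels[labels == j] = i

    # Renumber labels to consecutive integers starting at 0
    return np.unique(labels, return_inverse=True)[1]


# Hypothetical one-dimensional example: two tight pairs and one outlier
points = np.array([0.0, 1.0, 10.0, 11.0, 20.0])
D = np.abs(points[:, None] - points[None, :])
labels_single = agglomerate(D, n_clusters=3, linkage="single")
labels_complete = agglomerate(D, n_clusters=3, linkage="complete")
print(labels_single, labels_complete)
```

Each merge here costs one vectorized pass over the matrix, so the Python-level loop runs only once per merge rather than once per cluster pair.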