DIANA algorithm example with code
Date: 2023-11-25 12:59:22
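DIANA (DIvisive ANAlysis) is a distance-based hierarchical clustering algorithm. It works top-down: all data points start in a single cluster, which is split repeatedly until every point forms its own cluster or the desired number of clusters is reached. The opposite, bottom-up strategy, in which every point starts as its own cluster and the closest clusters are merged step by step, is agglomerative clustering (for example AGNES).
The Python example below keeps the class name `DIANA`, but its `fit` method follows the bottom-up merging strategy: it starts from single-point clusters and merges the two clusters with the closest centroids until `k` clusters remain: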
```python
import numpy as np


def euclidean_distance(x1, x2):
    """Return the Euclidean distance between two vectors."""
    return np.sqrt(np.sum((x1 - x2) ** 2))


class DIANA:
    # The class name is kept from the original post. Note that fit() merges
    # clusters bottom-up (agglomerative); it does not split top-down.
    def __init__(self, k):
        self.k = k  # number of clusters to stop at

    def fit(self, X):
        n_samples, n_features = X.shape
        # Start with one cluster per data point
        clusters = [{"id": i, "centroid": X[i], "elements": [i]} for i in range(n_samples)]
        next_id = n_samples  # gives every merged cluster a unique id
        # Keep merging until only k clusters remain
        while len(clusters) > self.k:
            n = len(clusters)
            # Pairwise centroid distances; inf on the diagonal excludes self-pairs
            distances = np.full((n, n), np.inf)
            for i in range(n):
                for j in range(i + 1, n):
                    distances[i, j] = euclidean_distance(clusters[i]["centroid"], clusters[j]["centroid"])
                    distances[j, i] = distances[i, j]
            # Closest pair; argmin also handles zero distances between duplicate points
            cluster1_index, cluster2_index = np.unravel_index(np.argmin(distances), distances.shape)
            # Merge the pair; the new centroid is the midpoint of the two centroids
            # (not weighted by cluster size)
            new_cluster = {
                "id": next_id,
                "centroid": (clusters[cluster1_index]["centroid"] + clusters[cluster2_index]["centroid"]) / 2,
                "elements": clusters[cluster1_index]["elements"] + clusters[cluster2_index]["elements"],
            }
            next_id += 1
            # Delete the higher index first so the lower one stays valid
            for index in sorted((cluster1_index, cluster2_index), reverse=True):
                del clusters[index]
            clusters.append(new_cluster)
        # Assign a cluster label to every data point
        labels = np.zeros(n_samples)
        for i in range(len(clusters)):
            for j in clusters[i]["elements"]:
                labels[j] = i
        self.clusters = clusters
        self.labels = labels
```
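The class above merges clusters bottom-up. For comparison, a top-down splitting procedure in the spirit of DIANA can be sketched as follows. This is a minimal pure-Python sketch under stated assumptions, not a reference implementation: the function names (`avg_dist`, `diameter`, `split`, `diana_sketch`) and the toy points are made up for illustration, the cluster to split is the one with the largest diameter, and a point moves to the splinter group as long as it is on average closer to that group than to the points it would leave behind.

```python
import math


def avg_dist(p, group, points):
    """Mean distance from point index p to the indices in group (p itself excluded)."""
    others = [q for q in group if q != p]
    if not others:
        return 0.0
    return sum(math.dist(points[p], points[q]) for q in others) / len(others)


def diameter(cluster, points):
    """Largest pairwise distance inside a cluster (0 for a single point)."""
    return max((math.dist(points[a], points[b]) for a in cluster for b in cluster), default=0.0)


def split(cluster, points):
    """Split one cluster into (remaining, splinter) groups of point indices."""
    # Seed the splinter group with the point that is farthest from the others on average.
    seed = max(cluster, key=lambda p: avg_dist(p, cluster, points))
    splinter = [seed]
    rest = [p for p in cluster if p != seed]
    while len(rest) > 1:
        # Gain of moving p: positive when p is closer to the splinter group than to the rest.
        gains = {p: avg_dist(p, rest, points) - avg_dist(p, splinter, points) for p in rest}
        best = max(gains, key=gains.get)
        if gains[best] <= 0:
            break
        rest.remove(best)
        splinter.append(best)
    return rest, splinter


def diana_sketch(points, k):
    """Return up to k clusters as lists of point indices, splitting top-down."""
    clusters = [list(range(len(points)))]
    while len(clusters) < k:
        # Always split the cluster with the largest diameter.
        target = max(clusters, key=lambda c: diameter(c, points))
        if diameter(target, points) == 0:
            break  # every cluster holds a single point or only duplicates
        clusters.remove(target)
        clusters.extend(split(target, points))
    return clusters


# Hypothetical toy data: two well-separated groups of three points.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
clusters = diana_sketch(points, 2)
print(clusters)
```

On these toy points the two groups of three end up in separate clusters. The sketch recomputes distances at every step, so it is only suitable for small datasets.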
The following example uses the `DIANA` class defined above to cluster the Iris dataset:
```python
from sklearn.datasets import load_iris
iris = load_iris()
X = iris.data
diana = DIANA(k=3)
diana.fit(X)
print(diana.labels)
```
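The original post reports the following output. Note that the listing contains more values than Iris has samples (150), while `fit` produces exactly one label per sample, so treat it as illustrative rather than as an exact reproduction of a run: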
```
[1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.
1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.
1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 2. 2. 2. 2. 2. 2. 2. 2. 2.
2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2.
2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2.
2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
```
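Here, label 0 denotes the first cluster, label 1 the second, and label 2 the third. The labels are simply positions in `diana.clusters`; they are arbitrary and do not correspond to the Iris species.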
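Because cluster labels are arbitrary indices, relating them to known classes takes an extra step. One simple option is to cross-tabulate the two and look at the majority class in each cluster. The snippet below is a small sketch of that idea; the label lists and the variable names are hypothetical and are not taken from the Iris run above.

```python
from collections import Counter

# Hypothetical cluster labels and known classes for seven samples.
cluster_labels = [1, 1, 1, 0, 0, 2, 2]
true_classes = ["a", "a", "b", "b", "b", "c", "c"]

# Count how often each (cluster, class) pair occurs.
table = Counter(zip(cluster_labels, true_classes))

# For each cluster, record the class that occurs most often in it.
majority = {}
for cluster in sorted(set(cluster_labels)):
    counts = {cls: n for (c, cls), n in table.items() if c == cluster}
    majority[cluster] = max(counts, key=counts.get)

print(majority)
```

With real data, the same table shows how cleanly each cluster lines up with a class and which samples fall into a cluster dominated by another class.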