Building a K-Means++ Class
Date: 2023-06-16 13:02:55
Below is example code for a K-Means++ class written in Python:
```python
import numpy as np


class KMeansPP:
    def __init__(self, k):
        self.k = k

    def fit(self, X):
        n_samples, n_features = X.shape
        # Pick the first centroid uniformly at random from the samples
        centroids = [X[np.random.randint(n_samples)]]
        # Add the remaining k-1 centroids
        for i in range(1, self.k):
            distances = []
            for j in range(n_samples):
                # Squared distance from sample j to its nearest chosen centroid
                min_dist = np.inf
                for centroid in centroids:
                    dist = np.linalg.norm(X[j] - centroid) ** 2
                    min_dist = min(min_dist, dist)
                distances.append(min_dist)
            # Draw the next centroid with probability proportional to the
            # squared distance (the K-Means++ D^2 weighting)
            distances_sum = sum(distances)
            probabilities = [dist / distances_sum for dist in distances]
            new_centroid = X[np.random.choice(n_samples, p=probabilities)]
            centroids.append(new_centroid)
        self.centroids = centroids
        # Assign each sample to its nearest centroid (a single assignment
        # pass; the centroids are not updated afterwards)
        clusters = [[] for _ in range(self.k)]
        for sample in X:
            distances = [np.linalg.norm(sample - centroid) for centroid in self.centroids]
            closest_centroid_idx = np.argmin(distances)
            clusters[closest_centroid_idx].append(sample)
        self.clusters = clusters
```
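This class performs K-Means++ clustering, where `k` is the number of clusters and `X` is the input data with shape `(n_samples, n_features)`. In the `fit` method, it first picks one sample at random as the initial centroid, then draws each subsequent centroid at random, weighted by each sample's distance to its nearest already-chosen centroid (standard K-Means++ weights by the squared distance), and repeats this until `k` centroids have been chosen. It then assigns each sample to the cluster of its nearest centroid. Note that `fit` stops after this single assignment pass and does not iteratively update the centroids as full K-Means does. The `centroids` and `clusters` attributes store the chosen centroids and the resulting clusters, respectively.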
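Since the `fit` method above performs only one assignment pass, it may help to see how the seeding could be followed by the usual Lloyd iterations (assign, then recompute means, until the centroids stop moving). The following is a minimal sketch, not part of the original class: the function names `kmeans_pp_seed` and `lloyd`, the helper `_sq_dists`, and the `max_iter` and `tol` parameters are all assumptions made for illustration, and the seeding uses the squared-distance weighting of standard K-Means++.

```python
import numpy as np


def _sq_dists(X, C):
    """Pairwise squared Euclidean distances, shape (n_samples, n_centroids)."""
    return ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)


def kmeans_pp_seed(X, k, rng):
    """Pick k initial centroids from X using D^2-weighted sampling (hypothetical helper)."""
    n_samples = X.shape[0]
    centroids = [X[rng.integers(n_samples)]]
    # Squared distance from every sample to its nearest chosen centroid
    d2 = np.sum((X - centroids[0]) ** 2, axis=1)
    for _ in range(1, k):
        total = d2.sum()
        if total == 0:
            # Every sample coincides with a chosen centroid: fall back to uniform
            idx = rng.integers(n_samples)
        else:
            idx = rng.choice(n_samples, p=d2 / total)
        centroids.append(X[idx])
        d2 = np.minimum(d2, np.sum((X - X[idx]) ** 2, axis=1))
    return np.array(centroids)


def lloyd(X, centroids, max_iter=100, tol=1e-8):
    """Alternate assignment and mean-update steps until the centroids settle."""
    centroids = centroids.copy()
    for _ in range(max_iter):
        labels = _sq_dists(X, centroids).argmin(axis=1)
        new_centroids = centroids.copy()
        for c in range(len(centroids)):
            members = X[labels == c]
            if len(members) > 0:  # keep the old centroid if a cluster is empty
                new_centroids[c] = members.mean(axis=0)
        shift = np.linalg.norm(new_centroids - centroids)
        centroids = new_centroids
        if shift <= tol:
            break
    # Final assignment so the labels match the returned centroids
    labels = _sq_dists(X, centroids).argmin(axis=1)
    return centroids, labels


# Usage on two well-separated synthetic blobs (made-up data)
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(0.0, 0.1, size=(20, 2)),
    rng.normal(5.0, 0.1, size=(20, 2)),
])
init = kmeans_pp_seed(X, 2, rng)
centroids, labels = lloyd(X, init)
```

Tracking the running minimum `d2` avoids recomputing distances to every earlier centroid on each round, which is the main cost of the triple loop in the class above. Keeping an empty cluster's old centroid is only one possible policy; re-seeding it from a far-away sample is another common choice.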