Write a from-scratch k-means++ implementation in Python
Posted: 2024-03-23 19:41:26
Sure, here is a k-means++ implementation in Python:
```python
import numpy as np
from sklearn.datasets import make_blobs
import matplotlib.pyplot as plt
class KMeanspp:
    def __init__(self, k=3, max_iter=100):
        self.k = k
        self.max_iter = max_iter

    def fit(self, X):
        n_samples, n_features = X.shape
        centers = np.empty((self.k, n_features))

        # Step 1: Randomly select the first center
        random_index = np.random.randint(n_samples)
        centers[0] = X[random_index]

        # Step 2: Track each point's squared distance to its nearest chosen center
        closest_sq = np.sum((X - centers[0]) ** 2, axis=1)
        for i in range(1, self.k):
            # Step 3: Select the new center with probability proportional to
            # the squared distance to the nearest existing center
            prob = closest_sq / closest_sq.sum()
            random_index = np.random.choice(n_samples, p=prob)
            centers[i] = X[random_index]
            new_sq = np.sum((X - centers[i]) ** 2, axis=1)
            closest_sq = np.minimum(closest_sq, new_sq)

        # Step 4: Run standard k-means algorithm
        labels = np.zeros(n_samples, dtype=int)
        for _ in range(self.max_iter):
            # Assign every point to its nearest center
            distances = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
            labels = np.argmin(distances, axis=1)
            # Recompute each center; keep the old one if its cluster is empty
            new_centers = centers.copy()
            for i in range(self.k):
                members = X[labels == i]
                if len(members) > 0:
                    new_centers[i] = members.mean(axis=0)
            # Stop early once the centers no longer move
            if np.allclose(new_centers, centers):
                break
            centers = new_centers

        self.labels_ = labels
        self.cluster_centers_ = centers
        return self

# Example usage
X, y = make_blobs(n_samples=300, centers=4, cluster_std=0.60, random_state=0)
kmeans = KMeanspp(k=4, max_iter=300)
kmeans.fit(X)
# Plot the results
plt.scatter(X[:, 0], X[:, 1], c=kmeans.labels_)
plt.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1], s=200, marker='*', c='red')
plt.show()
```
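The code above implements a KMeanspp class whose constructor takes two parameters: k, the number of clusters, and max_iter, the maximum number of iterations. The fit method takes a dataset X and runs the k-means++ algorithm. It first picks a random point as the first cluster center, then picks the remaining centers one at a time, each drawn from a distance-weighted probability distribution so that points far from the already chosen centers are more likely to be selected. It then runs standard k-means for at most max_iter iterations. The resulting cluster labels and cluster centers are stored in labels_ and cluster_centers_, and fit returns the object itself. Finally, make_blobs generates a synthetic dataset for the demonstration, and the clusters and their centers are plotted with matplotlib.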
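To look at the seeding step on its own: k-means++ as it is usually described draws each new center with probability proportional to the squared distance from a point to its nearest already chosen center. The following is a minimal sketch of only that step, not a definitive implementation. The function name kmeanspp_init, its rng parameter, and the uniform fallback for an all-zero weight vector are assumptions made for illustration and do not appear in the answer above.

```python
import numpy as np


def kmeanspp_init(X, k, rng=None):
    """Pick k initial centers from X by squared-distance-weighted sampling.

    Hypothetical helper for illustration; rng may be None, an int seed,
    or a numpy Generator.
    """
    rng = np.random.default_rng(rng)
    n_samples, n_features = X.shape
    centers = np.empty((k, n_features), dtype=float)

    # First center: a uniformly random data point
    centers[0] = X[rng.integers(n_samples)]

    # Squared distance from every point to its nearest chosen center
    closest_sq = np.sum((X - centers[0]) ** 2, axis=1)

    for i in range(1, k):
        total = closest_sq.sum()
        if total == 0:
            # Assumption: if every point coincides with a chosen center,
            # fall back to uniform sampling instead of dividing by zero
            prob = np.full(n_samples, 1.0 / n_samples)
        else:
            prob = closest_sq / total
        centers[i] = X[rng.choice(n_samples, p=prob)]

        # Keep the smaller of the old and new squared distances per point
        new_sq = np.sum((X - centers[i]) ** 2, axis=1)
        closest_sq = np.minimum(closest_sq, new_sq)

    return centers


# Usage: two well-separated pairs of points, two centers requested
X_demo = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
demo_centers = kmeanspp_init(X_demo, k=2, rng=0)
print(demo_centers.shape)  # (2, 2)
```

Because a point that is already a center has squared distance zero, it cannot be drawn again while other points are still available, which is what spreads the initial centers out. Keeping a running minimum in closest_sq also avoids recomputing distances to every earlier center on each round.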