用Python写kmeans
时间: 2023-11-19 09:11:08 浏览: 27
以下是Python中k-means算法的实现代码:
```python
import numpy as np
class KMeans:
def __init__(self, k=3, max_iter=100, tol=1e-4):
self.k = k
self.max_iter = max_iter
self.tol = tol
def fit(self, X):
n_samples = X.shape[0]
self.centroids = X[np.random.choice(n_samples, self.k, replace=False)]
for i in range(self.max_iter):
clusters = [[] for _ in range(self.k)]
for sample in X:
distances = [np.linalg.norm(sample - centroid) for centroid in self.centroids]
closest_cluster = np.argmin(distances)
clusters[closest_cluster].append(sample)
prev_centroids = self.centroids.copy()
for idx, cluster in enumerate(clusters):
if len(cluster) > 0:
self.centroids[idx] = np.mean(cluster, axis=0)
diff = np.sum(np.abs(self.centroids - prev_centroids))
if diff < self.tol:
break
def predict(self, X):
distances = [np.linalg.norm(X - centroid, axis=1) for centroid in self.centroids]
closest_cluster = np.argmin(distances, axis=0)
return closest_cluster
```
该代码实现了一个KMeans类,可以用来训练和预测数据。在初始化时,可以指定要聚类的数量(k)、最大迭代次数(max_iter)和收敛阈值(tol)。在训练时,首先随机选择k个样本作为初始的质心,然后进行迭代,直到质心不再变化或达到最大迭代次数为止。每次迭代时,将每个样本分配到最近的质心所在的簇中,并更新每个簇的质心。在预测时,给定新数据点,将其分配到最近的质心所在的簇中,并返回簇的索引。