K-means clustering algorithm in Python
Date: 2023-08-28 12:07:52
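Below is a basic Python implementation of the k-means clustering algorithm. Suppose we have a dataset of n data points, each with d features. The algorithm partitions the dataset into k clusters by assigning each data point to the cluster whose centroid is closest to it (here, in Euclidean distance).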
```python
import numpy as np


class KMeans:
    def __init__(self, k=2, tol=0.001, max_iter=300):
        self.k = k                # number of clusters
        self.tol = tol            # convergence threshold on centroid movement
        self.max_iter = max_iter  # upper bound on the number of iterations

    def fit(self, X):
        X = np.asarray(X, dtype=float)
        # Initialize centroids with the first k data points
        self.centroids = {i: X[i].copy() for i in range(self.k)}

        # Run k-means algorithm
        for _ in range(self.max_iter):
            # Initialize empty clusters
            self.clusters = {j: [] for j in range(self.k)}

            # Assign each point to the cluster with the nearest centroid
            for x in X:
                distances = [np.linalg.norm(x - self.centroids[c]) for c in self.centroids]
                self.clusters[int(np.argmin(distances))].append(x)

            # Update each centroid to the mean of its cluster
            prev_centroids = dict(self.centroids)
            for c, points in self.clusters.items():
                if points:  # an empty cluster keeps its previous centroid
                    self.centroids[c] = np.mean(points, axis=0)

            # Check convergence: stop once no centroid has moved more than tol.
            # (Euclidean shift avoids the division by zero and sign cancellation
            # of a signed percentage change.)
            shifts = [np.linalg.norm(self.centroids[c] - prev_centroids[c])
                      for c in self.centroids]
            if max(shifts) <= self.tol:
                break

    def predict(self, X):
        # X is a single data point; return the index of its nearest centroid
        distances = [np.linalg.norm(X - self.centroids[c]) for c in self.centroids]
        return int(np.argmin(distances))
```
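In this implementation, `fit()` first initializes the k centroids with the first k data points, then assigns every data point to the cluster whose centroid is nearest. It then recomputes each centroid as the mean of its cluster, reassigns the points, and repeats until the change in the centroids falls within `tol` or `max_iter` iterations have run. Finally, the `predict()` method assigns a new data point to the cluster with the nearest centroid.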
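The assign, update, and convergence steps described above can also be written without the per-point Python loop. The following is only a sketch under stated assumptions: `kmeans_vectorized` is a hypothetical helper name, the toy data is made up for illustration, the centroids are seeded with the first k points as in the class above, and an empty cluster simply keeps its previous centroid.

```python
import numpy as np


def kmeans_vectorized(X, k, tol=1e-3, max_iter=300):
    """Hypothetical helper: the same iteration, using NumPy broadcasting."""
    X = np.asarray(X, dtype=float)
    # Assumption: seed the centroids with the first k points
    centroids = X[:k].copy()
    for _ in range(max_iter):
        # Pairwise distances, shape (n, k): one row per point, one column per centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each centroid as the mean of its members;
        # an empty cluster keeps its previous centroid
        new_centroids = centroids.copy()
        for c in range(k):
            members = X[labels == c]
            if len(members) > 0:
                new_centroids[c] = members.mean(axis=0)
        # Largest Euclidean movement of any centroid in this iteration
        shift = np.linalg.norm(new_centroids - centroids, axis=1).max()
        centroids = new_centroids
        if shift <= tol:
            break
    return centroids, labels


# Toy data (made up for illustration): two well-separated groups in 2-D
X = np.array([[0.0, 0.0], [10.0, 10.0], [0.0, 1.0],
              [1.0, 0.0], [10.0, 11.0], [11.0, 10.0]])
centroids, labels = kmeans_vectorized(X, k=2)
print(labels)
print(centroids)
```

Because the result depends on which points seed the centroids, taking the first k points is only reasonable when the data is not sorted by group; shuffling the rows first or using a smarter seeding scheme is a common alternative.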