Please write the K-means clustering algorithm in Python
Date: 2024-01-21 14:19:12 Views: 65
Sure, here is a K-means clustering algorithm written in Python:
```python
import random


class KMeans:
    def __init__(self, k, tolerance=0.0001, max_iterations=500):
        self.k = k
        self.tolerance = tolerance
        self.max_iterations = max_iterations

    def fit(self, data):
        # Sample k different data points (without replacement) as initial centroids.
        self.centroids = {i: list(p) for i, p in enumerate(random.sample(list(data), self.k))}

        for _ in range(self.max_iterations):
            # Assignment step: put each point in the cluster of its nearest centroid.
            self.clusters = {i: [] for i in range(self.k)}
            for point in data:
                self.clusters[self.predict(point)].append(point)

            prev_centroids = dict(self.centroids)

            # Update step: move each centroid to the mean of its cluster.
            # An empty cluster keeps its previous centroid.
            for cluster, points in self.clusters.items():
                if points:
                    self.centroids[cluster] = self.average(points)

            # Stop once no centroid has moved farther than the tolerance.
            if all(self.distance(self.centroids[c], prev_centroids[c]) <= self.tolerance
                   for c in self.centroids):
                break

    def predict(self, data):
        # Return the index of the centroid nearest to a single point.
        distances = [self.distance(data, self.centroids[c]) for c in self.centroids]
        return distances.index(min(distances))

    def distance(self, a, b):
        # Euclidean distance between two points of equal dimension.
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    def average(self, points):
        # Coordinate-wise mean of a non-empty list of points.
        return [sum(coords) / len(coords) for coords in zip(*points)]
```
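This is a fairly simple implementation of K-means clustering. The `fit` method trains the model, `predict` assigns a new data point to a cluster, `distance` computes the Euclidean distance between two points, and `average` computes the coordinate-wise mean of a set of points.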
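To make the two helper computations concrete, here is a small standalone sketch. The function names `euclidean_distance` and `mean_point` are illustrative only and are not part of the class above.

```python
def euclidean_distance(a, b):
    # Square root of the summed squared coordinate differences.
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def mean_point(points):
    # zip(*points) groups the values of each coordinate together.
    return [sum(coords) / len(coords) for coords in zip(*points)]


print(euclidean_distance([0, 0], [3, 4]))    # 5.0
print(mean_point([[1, 2], [3, 4], [5, 6]]))  # [3.0, 4.0]
```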
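The implementation first picks k initial cluster centroids at random from the data, then repeats the following steps:
1. Assign each data point to the cluster of its nearest centroid;
2. Recompute the centroid of each cluster;
3. Stop iterating once no centroid has moved by more than the configured `tolerance`.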
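The three steps above can be sketched as a single standalone function. This is a minimal sketch and not the class from the answer: the name `kmeans_iterate` is hypothetical, and the initial centroids are passed in explicitly (an assumption made here so the run is reproducible) instead of being chosen at random.

```python
def kmeans_iterate(points, centroids, tolerance=1e-4, max_iterations=500):
    """Run assignment/update rounds until no centroid moves more than tolerance."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    centroids = [list(c) for c in centroids]
    clusters = [[] for _ in centroids]
    for _ in range(max_iterations):
        # Step 1: assign every point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)

        # Step 2: recompute each centroid as the mean of its cluster
        # (an empty cluster keeps its old centroid).
        new_centroids = [
            [sum(c) / len(c) for c in zip(*cluster)] if cluster else old
            for cluster, old in zip(clusters, centroids)
        ]

        # Step 3: stop when the largest centroid movement is within tolerance.
        moved = max(dist(a, b) for a, b in zip(centroids, new_centroids))
        centroids = new_centroids
        if moved <= tolerance:
            break
    return centroids, clusters


points = [[1, 1], [1, 2], [2, 1], [8, 8], [9, 8], [8, 9]]
centroids, clusters = kmeans_iterate(points, centroids=[[1, 1], [8, 8]])
print(centroids)
print(clusters)
```

With these two well-separated groups, the centroids should settle near the mean of each group after a couple of rounds.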
Once training is finished, the `predict` method can be used to assign new data to a cluster.
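Prediction only needs the learned centroids. As a rough sketch with made-up centroid values (the helper `nearest_centroid` and the numbers are illustrative, not output from the code above), labeling a batch of new points could look like this:

```python
def nearest_centroid(point, centroids):
    # Index of the centroid with the smallest squared Euclidean distance;
    # the square root is skipped because it does not change the ordering.
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(centroids)), key=lambda i: sq_dist(point, centroids[i]))


centroids = [[1.0, 1.0], [8.0, 8.0]]  # hypothetical learned centroids
new_points = [[0, 2], [7, 9], [5, 5]]
labels = [nearest_centroid(p, centroids) for p in new_points]
print(labels)  # [0, 1, 1]
```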