K-means and K-means++ implementations in Python
Date: 2023-08-27 22:06:44
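K-means is a widely used clustering algorithm. K-means++ is an improved version of K-means that changes only how the initial cluster centers are chosen. Instead of picking k random samples, it spreads the initial centers out across the data. This typically speeds up convergence and yields a lower within-cluster sum of squared distances; in expectation, the result is within an O(log k) factor of the optimum. Below are Python implementations of K-means and K-means++.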
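For reference, here is a short summary of the standard formulation. K-means tries to minimize the within-cluster sum of squared distances $J$ over the centroids $\mu_1,\dots,\mu_k$. K-means++ seeding, as described by Arthur and Vassilvitskii (2007), picks the first centroid uniformly at random. It then picks each further centroid with probability proportional to $D(x)^2$, the squared distance from $x$ to the nearest centroid in $C$, the set of centroids chosen so far:

$$
J(\mu_1,\dots,\mu_k) = \sum_{i=1}^{n} \min_{1 \le j \le k} \lVert x_i - \mu_j \rVert^2
$$

$$
P(x_i) = \frac{D(x_i)^2}{\sum_{l=1}^{n} D(x_l)^2}, \qquad D(x) = \min_{\mu \in C} \lVert x - \mu \rVert
$$

Consider two candidate points with $D = 1$ and $D = 3$. They are chosen with probabilities $1/10$ and $9/10$. Sampling proportional to the plain distance would give only $1/4$ and $3/4$. Squaring therefore pushes the seeds more strongly toward points that are far from the existing centroids.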
K-means implementation:
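```python
import numpy as np

def kmeans(X, k, max_iter=100):
    """K-means (Lloyd's algorithm) with random initialization.

    Returns (centroids, labels): centroids has shape (k, n_features) and
    labels[i] is the index of the cluster that X[i] is assigned to.
    """
    X = np.asarray(X, dtype=float)
    n_samples = X.shape[0]
    # Initialize with k distinct samples chosen uniformly at random
    centroids = X[np.random.choice(n_samples, k, replace=False)]
    for _ in range(max_iter):
        # Assignment step: distances of shape (n_samples, k); nearest centroid wins
        distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = np.argmin(distances, axis=1)
        # Update step: move each centroid to the mean of its assigned samples
        new_centroids = centroids.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # an empty cluster keeps its previous centroid
                new_centroids[j] = members.mean(axis=0)
        # Converged once the centroids stop moving
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, labels
```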
K-means++ implementation:
```python
import numpy as np

def kmeans_pp(X, k, max_iter=100):
    """K-means with k-means++ seeding.

    Returns (centroids, labels): centroids has shape (k, n_features) and
    labels[i] is the index of the cluster that X[i] is assigned to.
    """
    X = np.asarray(X, dtype=float)
    n_samples = X.shape[0]
    # Choose the first centroid uniformly at random
    centroids = [X[np.random.randint(n_samples)]]
    # Choose each remaining centroid with probability proportional to D(x)^2,
    # the squared distance from x to its nearest already-chosen centroid
    for _ in range(1, k):
        diffs = X[:, None, :] - np.array(centroids)[None, :, :]
        d2 = np.min(np.sum(diffs ** 2, axis=2), axis=1)
        # Already-chosen points have d2 == 0, so they cannot be picked again
        idx = np.random.choice(n_samples, p=d2 / d2.sum())
        centroids.append(X[idx])
    centroids = np.array(centroids)
    # Run standard K-means (Lloyd's iterations) from these initial centroids
    for _ in range(max_iter):
        distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = np.argmin(distances, axis=1)
        new_centroids = centroids.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # an empty cluster keeps its previous centroid
                new_centroids[j] = members.mean(axis=0)
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, labels
```
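Both functions take the same parameters. X is the dataset, an array of shape (n_samples, n_features). k is the number of clusters, and max_iter is the maximum number of iterations. Each function returns the cluster centroids and the cluster label of every data point.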
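One way to check which of the two functions clustered better is to compare the K-means objective. This is the within-cluster sum of squared distances, often called inertia, computed from the returned centroids and labels. The helper `inertia` below is a hypothetical name introduced here for illustration. It is a minimal sketch that assumes `labels` holds integer cluster indices.

```python
import numpy as np

def inertia(X, centroids, labels):
    """Within-cluster sum of squared distances (the K-means objective).

    X: (n_samples, n_features); centroids: (k, n_features);
    labels: (n_samples,) integer cluster index of each sample.
    """
    X = np.asarray(X, dtype=float)
    centroids = np.asarray(centroids, dtype=float)
    # centroids[labels] gives each sample's assigned centroid, row by row
    return float(np.sum((X - centroids[labels]) ** 2))

# Tiny check: two clusters, each point at distance 1 from its centroid
X = np.array([[0, 0], [0, 2], [10, 0], [10, 2]])
centroids = np.array([[0, 1], [10, 1]])
labels = np.array([0, 0, 1, 1])
print(inertia(X, centroids, labels))  # 4.0
```

Both algorithms can converge to a local minimum, so a common practice is to run each several times with different random initializations. You then keep the run with the lowest inertia.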