Implementing k-means clustering by hand in Python
First, we need to generate some random data to cluster. Here we create 100 two-dimensional data points split into two clusters.
```python
import numpy as np
import matplotlib.pyplot as plt
np.random.seed(42)
# Generate random data: 100 2-D points, with the first 50 shifted to form a second cluster
X = np.random.randn(100, 2)
X[:50, 0] += 5
X[:50, 1] -= 5
# Plot the raw data as a scatter plot
plt.scatter(X[:, 0], X[:, 1])
plt.show()
```
![image-20211108142335908](https://i.loli.net/2021/11/08/2lWq3wFJrDmOZKj.png)
Next, we can implement the k-means algorithm by hand. The first step is to randomly initialize k cluster centers.
```python
def init_centroids(X, k):
    """Randomly initialize k cluster centers by sampling k points from X."""
    centroids = X.copy()
    np.random.shuffle(centroids)
    return centroids[:k]

k = 2
centroids = init_centroids(X, k)
print("Initial cluster centers:")
print(centroids)
```
Output:
```
Initial cluster centers:
[[ 3.27057996 -0.43435128]
[ 4.61238434 -6.41708773]]
```
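As an aside that is not part of the original walkthrough: sampling the initial centers uniformly at random can occasionally place two of them in the same blob. A common remedy is k-means++-style seeding, which picks each new center with probability proportional to its squared distance from the centers already chosen. Below is a minimal sketch; `init_centroids_pp` and its `seed` parameter are illustrative names, not part of the tutorial code.
```python
def init_centroids_pp(X, k, seed=0):
    """k-means++-style seeding (sketch): spread the initial centers apart."""
    rng = np.random.default_rng(seed)
    # First center: a uniformly random data point
    centers = [X[rng.integers(len(X))]]
    for _ in range(k - 1):
        # Squared distance from every point to its nearest chosen center
        d2 = ((X - np.array(centers)[:, np.newaxis]) ** 2).sum(axis=2).min(axis=0)
        # Sample the next center with probability proportional to that distance
        centers.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    return np.array(centers)
```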
Then we iterate. In each iteration, we assign every data point to its nearest cluster center and then update the position of each center.
```python
def assign_clusters(X, centroids):
    """Assign each data point to its nearest cluster center."""
    distances = np.sqrt(((X - centroids[:, np.newaxis])**2).sum(axis=2))
    return np.argmin(distances, axis=0)

def update_centroids(X, clusters, k):
    """Move each cluster center to the mean of the points assigned to it."""
    centroids = np.zeros((k, X.shape[1]))
    for i in range(k):
        # Assumes every cluster keeps at least one point; an empty cluster would yield NaN here
        centroids[i] = X[clusters == i].mean(axis=0)
    return centroids

def k_means(X, k, max_iter=100):
    """Run k-means, stopping early once the centers stop moving."""
    centroids = init_centroids(X, k)
    for i in range(max_iter):
        clusters = assign_clusters(X, centroids)
        new_centroids = update_centroids(X, clusters, k)
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, clusters

centroids, clusters = k_means(X, k)
print("Final cluster centers:")
print(centroids)
```
Output:
```
Final cluster centers:
[[ 4.98515524 -4.99587453]
[ 2.94840081 -0.33198444]]
```
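As a quick sanity check that is not in the original post, we can compute the within-cluster sum of squares, i.e. the objective k-means minimizes; `inertia` is just an illustrative variable name.
```python
# Total squared distance from each point to its assigned cluster center
inertia = sum(((X[clusters == i] - centroids[i]) ** 2).sum() for i in range(k))
print("Within-cluster sum of squares:", inertia)
```
A lower value indicates tighter clusters; rerunning with different random initializations and keeping the lowest-inertia result is common practice.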
Finally, we can visualize the clustering result.
```python
# Color the points by cluster assignment; mark the centers with blue crosses
colors = ['r', 'g']
plt.scatter(X[:, 0], X[:, 1], c=[colors[i] for i in clusters])
plt.scatter(centroids[:, 0], centroids[:, 1], marker='x', s=200, linewidths=3, color='b')
plt.show()
```
![image-20211108142358881](https://i.loli.net/2021/11/08/3OoJh6T8yL91w7B.png)
The complete code is as follows:
```python
import numpy as np
import matplotlib.pyplot as plt
np.random.seed(42)
# Generate random data: 100 2-D points, with the first 50 shifted to form a second cluster
X = np.random.randn(100, 2)
X[:50, 0] += 5
X[:50, 1] -= 5

# Plot the raw data as a scatter plot
plt.scatter(X[:, 0], X[:, 1])
plt.show()

def init_centroids(X, k):
    """Randomly initialize k cluster centers by sampling k points from X."""
    centroids = X.copy()
    np.random.shuffle(centroids)
    return centroids[:k]

def assign_clusters(X, centroids):
    """Assign each data point to its nearest cluster center."""
    distances = np.sqrt(((X - centroids[:, np.newaxis])**2).sum(axis=2))
    return np.argmin(distances, axis=0)

def update_centroids(X, clusters, k):
    """Move each cluster center to the mean of the points assigned to it."""
    centroids = np.zeros((k, X.shape[1]))
    for i in range(k):
        # Assumes every cluster keeps at least one point; an empty cluster would yield NaN here
        centroids[i] = X[clusters == i].mean(axis=0)
    return centroids

def k_means(X, k, max_iter=100):
    """Run k-means, stopping early once the centers stop moving."""
    centroids = init_centroids(X, k)
    for i in range(max_iter):
        clusters = assign_clusters(X, centroids)
        new_centroids = update_centroids(X, clusters, k)
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, clusters

k = 2
centroids, clusters = k_means(X, k)
print("Final cluster centers:")
print(centroids)

# Color the points by cluster assignment; mark the centers with blue crosses
colors = ['r', 'g']
plt.scatter(X[:, 0], X[:, 1], c=[colors[i] for i in clusters])
plt.scatter(centroids[:, 0], centroids[:, 1], marker='x', s=200, linewidths=3, color='b')
plt.show()
```
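If scikit-learn happens to be installed, the hand-rolled implementation can be cross-checked against `sklearn.cluster.KMeans`. This is only a verification sketch: the fitted centers should land close to ours, although their order may differ and the exact values can vary slightly because the initializations are not identical.
```python
from sklearn.cluster import KMeans

# Fit scikit-learn's k-means on the same data and compare the resulting centers
km = KMeans(n_clusters=2, n_init=10, random_state=42).fit(X)
print("scikit-learn cluster centers:")
print(km.cluster_centers_)
```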