Hand-writing k-means in Python for image clustering (without calling a k-means function)
Posted: 2023-07-02 08:07:18
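K-means is a common clustering algorithm that partitions data points into a predetermined number of clusters. Here we write k-means by hand and use it to cluster the pixels of an image by color.
First, we load the image and convert it into vectors. We can read the image with Python's Pillow library and use numpy to turn it into an array with one RGB vector per pixel.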
```python
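from PIL import Image
import numpy as np

image_path = "image.jpg"
k = 4  # number of clusters

# Load the image as RGB so every pixel has exactly 3 channels
# (grayscale or RGBA input would otherwise break the reshape below)
image = Image.open(image_path).convert("RGB")
image_array = np.array(image)

# Flatten to an (n_pixels, 3) array; use floats so later distance
# calculations do not wrap around in uint8 arithmetic
image_vector = image_array.reshape(-1, 3).astype(np.float64)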
```
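To make the flattening step concrete, here is a small sketch on a synthetic 2×3 RGB array. The array `demo_image` and its values are made up for illustration and stand in for a real photo. It shows that `reshape(-1, 3)` produces one row per pixel and that reshaping back recovers the original layout.

```python
import numpy as np

# Hypothetical 2x3 RGB image (height=2, width=3), values chosen arbitrarily.
demo_image = np.arange(2 * 3 * 3, dtype=np.uint8).reshape(2, 3, 3)

# One row per pixel, one column per channel (R, G, B).
demo_vector = demo_image.reshape(-1, 3)
print(demo_vector.shape)  # (6, 3)

# The pixel at (row 1, col 2) ends up at flat index 1 * width + 2.
flat_index = 1 * demo_image.shape[1] + 2
print(np.array_equal(demo_vector[flat_index], demo_image[1, 2]))  # True

# Reshaping back restores the height x width x channel layout.
restored = demo_vector.reshape(demo_image.shape)
```

The same row-major ordering is what later allows the flat cluster labels to be reshaped back into the image's height and width.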
Next, we initialize the K cluster centers by randomly choosing K data points. To make the result reproducible across runs, we set numpy's random seed.
```python
np.random.seed(42)

# Initialize K cluster centers by sampling K distinct pixel indices
indices = np.random.choice(len(image_vector), size=k, replace=False)
cluster_centers = image_vector[indices].astype(np.float64)
```
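Sampling distinct pixel indices does not guarantee distinct colors: in an image with large flat regions, two sampled pixels can have the same RGB value, which gives two identical centers and leaves one cluster empty. One possible workaround, sketched below, is to sample from the distinct colors instead. The helper name `init_centers_from_unique` and the use of `np.random.default_rng` are assumptions made for this illustration, not part of the original code.

```python
import numpy as np

def init_centers_from_unique(data, k, seed=42):
    """Pick k distinct rows of `data` as initial centers (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    unique_rows = np.unique(data, axis=0)  # drop duplicate colors
    if len(unique_rows) < k:
        raise ValueError("fewer distinct points than clusters")
    chosen = rng.choice(len(unique_rows), size=k, replace=False)
    return unique_rows[chosen].astype(np.float64)

# Synthetic pixels: many duplicates of only three distinct colors.
demo_pixels = np.array([[0, 0, 0]] * 50 + [[255, 0, 0]] * 30 + [[0, 0, 255]] * 20)
demo_centers = init_centers_from_unique(demo_pixels, k=3)
```

Note that `np.unique(..., axis=0)` sorts all rows, so this costs extra time on large images.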
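Next, we assign each data point to its nearest cluster center. We compute the Euclidean distance between every data point and every cluster center, then assign each point to the center with the smallest distance.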
```python
def assign_clusters(data, centers):
    # Work in floats: uint8 pixel values would wrap around on subtraction
    data = np.asarray(data, dtype=np.float64)
    centers = np.asarray(centers, dtype=np.float64)
    # Euclidean distance from every center to every point, shape (k, n)
    distances = np.sqrt(np.sum((data - centers[:, np.newaxis])**2, axis=2))
    # Assign each data point to the closest cluster center
    clusters = np.argmin(distances, axis=0)
    return clusters

clusters = assign_clusters(image_vector, cluster_centers)
```
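The broadcasting in the distance calculation is easier to follow on a tiny example. The sketch below uses four made-up points and two made-up centers (all `demo_` names are illustrative) and prints the intermediate shapes. It also shows why the pixel values are best converted to floats first: subtracting `uint8` arrays silently wraps around instead of producing a negative number.

```python
import numpy as np

# Four synthetic "pixels" and two made-up centers (illustrative values only).
demo_data = np.array(
    [[0, 0, 0], [10, 0, 0], [250, 250, 250], [255, 255, 255]], dtype=np.float64
)
demo_centers = np.array([[5, 0, 0], [250, 250, 250]], dtype=np.float64)

# centers[:, np.newaxis] has shape (k, 1, 3); subtracting it from data (n, 3)
# broadcasts to (k, n, 3): one difference vector per (center, point) pair.
diff = demo_data - demo_centers[:, np.newaxis]
demo_distances = np.sqrt(np.sum(diff ** 2, axis=2))  # shape (k, n)

# argmin over axis 0 picks, for each point, the index of the nearest center.
demo_clusters = np.argmin(demo_distances, axis=0)
print(diff.shape, demo_distances.shape)  # (2, 4, 3) (2, 4)
print(demo_clusters)  # [0 0 1 1]

# uint8 arithmetic wraps modulo 256: 5 - 10 becomes 251, not -5.
wrapped = np.array([5], dtype=np.uint8) - np.array([10], dtype=np.uint8)
print(wrapped)  # [251]
```

The intermediate `(k, n, 3)` array grows with both the number of clusters and the number of pixels, so memory use can become noticeable for large images.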
Now we update the position of each cluster center, moving it to the mean of the data points assigned to it.
```python
def update_centers(data, clusters, old_centers):
    # Move each center to the mean of its assigned points; an empty
    # cluster keeps its old center instead of becoming NaN
    centers = np.array(old_centers, dtype=np.float64)
    for i in range(len(centers)):
        if np.any(clusters == i):
            centers[i] = data[clusters == i].mean(axis=0)
    return centers

cluster_centers = update_centers(image_vector, clusters, cluster_centers)
```
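The averaging step can also be written without a Python loop over clusters. The sketch below is one possible variant under stated assumptions, not the method used above: the name `update_centers_vectorized` and its `old_centers` argument are introduced here for illustration, and leaving an empty cluster at its previous position is just one common convention.

```python
import numpy as np

def update_centers_vectorized(data, clusters, old_centers):
    """Per-cluster means via bincount and add.at (hypothetical helper)."""
    k = len(old_centers)
    counts = np.bincount(clusters, minlength=k)      # points per cluster
    sums = np.zeros((k, data.shape[1]), dtype=np.float64)
    np.add.at(sums, clusters, data)                  # per-cluster coordinate sums
    new_centers = np.array(old_centers, dtype=np.float64)
    nonempty = counts > 0
    # Divide only where a cluster has members; empty clusters keep the old center.
    new_centers[nonempty] = sums[nonempty] / counts[nonempty, np.newaxis]
    return new_centers

# Three synthetic points; cluster 1 receives no points.
demo_data = np.array([[0, 0, 0], [10, 0, 0], [200, 200, 200]], dtype=np.float64)
demo_clusters = np.array([0, 0, 2])
demo_old = np.array([[1, 1, 1], [50, 50, 50], [210, 210, 210]], dtype=np.float64)
demo_new = update_centers_vectorized(demo_data, demo_clusters, demo_old)
print(demo_new)  # rows: [5, 0, 0], [50, 50, 50], [200, 200, 200]
```

Here cluster 0 moves to the mean of its two points, cluster 1 is empty and stays where it was, and cluster 2 moves onto its single point.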
Finally, we can visualize the clustering result by painting every pixel with the color of its cluster center.
```python
# Reshape the cluster assignments to match the original image shape
cluster_assignments = clusters.reshape(image_array.shape[:2])
# Replace each pixel with its cluster center's color; indexing the (k, 3)
# center array with the (height, width) label array yields (height, width, 3)
clustered_image = cluster_centers[cluster_assignments]
# Round to valid 8-bit values, convert to a Pillow Image object and save it
clustered_image = Image.fromarray(np.clip(np.rint(clustered_image), 0, 255).astype(np.uint8))
clustered_image.save("clustered_image.jpg")
```
The complete code, which repeats the assignment and update steps for 10 iterations, is as follows:
```python
from PIL import Image
import numpy as np

image_path = "image.jpg"
k = 4  # number of clusters

# Load the image as RGB and convert to a numpy array
image = Image.open(image_path).convert("RGB")
image_array = np.array(image)
# Flatten to an (n_pixels, 3) float array (floats avoid uint8 wrap-around)
image_vector = image_array.reshape(-1, 3).astype(np.float64)

np.random.seed(42)
# Initialize K cluster centers by sampling K distinct pixel indices
indices = np.random.choice(len(image_vector), size=k, replace=False)
cluster_centers = image_vector[indices]


def assign_clusters(data, centers):
    # Euclidean distance from every center to every point, shape (k, n)
    distances = np.sqrt(np.sum((data - centers[:, np.newaxis])**2, axis=2))
    # Assign each data point to the closest cluster center
    return np.argmin(distances, axis=0)


def update_centers(data, clusters, old_centers):
    # Move each center to the mean of its assigned points; an empty
    # cluster keeps its old center instead of becoming NaN
    centers = np.array(old_centers, dtype=np.float64)
    for i in range(len(centers)):
        if np.any(clusters == i):
            centers[i] = data[clusters == i].mean(axis=0)
    return centers


# Alternate the assignment and update steps for a fixed number of iterations
for _ in range(10):
    clusters = assign_clusters(image_vector, cluster_centers)
    cluster_centers = update_centers(image_vector, clusters, cluster_centers)

# Reshape the cluster assignments to match the original image shape
cluster_assignments = clusters.reshape(image_array.shape[:2])
# Replace each pixel with the color of its cluster center
clustered_image = cluster_centers[cluster_assignments]
# Round to valid 8-bit values, convert to a Pillow Image object and save it
clustered_image = Image.fromarray(np.clip(np.rint(clustered_image), 0, 255).astype(np.uint8))
clustered_image.save("clustered_image.jpg")
```
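The loop above always runs exactly 10 iterations. A common alternative is to stop once the centers barely move. The sketch below illustrates that idea on synthetic data; the function name `kmeans` and the parameters `max_iter`, `tol` and `seed` are assumptions chosen for this example, and the two Gaussian blobs stand in for real pixel data.

```python
import numpy as np

def kmeans(data, k, max_iter=100, tol=1e-4, seed=42):
    """K-means that stops when no center moves more than `tol` (a sketch)."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=np.float64)
    centers = data[rng.choice(len(data), size=k, replace=False)]

    def nearest(points, current_centers):
        # Distance matrix of shape (k, n), then index of the closest center.
        dist = np.linalg.norm(points - current_centers[:, np.newaxis], axis=2)
        return np.argmin(dist, axis=0)

    n_iter = 0
    for n_iter in range(1, max_iter + 1):
        labels = nearest(data, centers)
        new_centers = centers.copy()
        for i in range(k):
            members = data[labels == i]
            if len(members) > 0:  # an empty cluster keeps its old center
                new_centers[i] = members.mean(axis=0)
        # Largest distance any single center moved in this iteration.
        shift = np.linalg.norm(new_centers - centers, axis=1).max()
        centers = new_centers
        if shift <= tol:
            break
    # Recompute labels so they match the final centers.
    return centers, nearest(data, centers), n_iter

# Two well-separated synthetic blobs in 3-D "color" space.
demo_rng = np.random.default_rng(0)
blob_a = demo_rng.normal(loc=20.0, scale=2.0, size=(100, 3))
blob_b = demo_rng.normal(loc=200.0, scale=2.0, size=(100, 3))
demo_data = np.vstack([blob_a, blob_b])
demo_centers, demo_labels, demo_iters = kmeans(demo_data, k=2)
```

The tolerance is measured in the same units as the data, so for 0 to 255 pixel values a much looser threshold than `1e-4` may be sufficient.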