Optimizing Code and Adding Comments
Date: 2023-08-11 08:12:59
Sure, I'll try to optimize the code and add comments.
Here is an example:
```python
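# Import the required library
import numpy as np


def euclidean_distance(x1, x2):
    """Return the Euclidean distance between two vectors."""
    return np.sqrt(np.sum((x1 - x2) ** 2))


class KMeans:
    """A minimal K-Means clustering implementation."""

    def __init__(self, K=3, max_iters=100, plot_steps=False):
        self.K = K                    # number of clusters
        self.max_iters = max_iters    # upper bound on iterations
        self.plot_steps = plot_steps  # if True, plot after every assignment step
        # Centroids are filled in by predict()
        self.centroids = []

    def initialize_centroids(self, X):
        # Pick K distinct random samples as the initial centroids
        indices = np.random.choice(X.shape[0], self.K, replace=False)
        return X[indices].astype(float)

    def closest_centroid(self, sample, centroids):
        # Compute the distance from the sample to every centroid
        # and return the index of the nearest one
        distances = [euclidean_distance(sample, point) for point in centroids]
        return np.argmin(distances)

    def create_clusters(self, X, centroids):
        # Assign every sample to its nearest centroid;
        # each cluster is stored as a list of sample indices
        clusters = [[] for _ in range(self.K)]
        for sample_index, sample in enumerate(X):
            centroid_index = self.closest_centroid(sample, centroids)
            clusters[centroid_index].append(sample_index)
        return clusters

    def calculate_centroids(self, X, clusters):
        # Recompute each centroid as the mean of the samples in its cluster
        centroids = np.zeros((self.K, X.shape[1]))
        for cluster_index, cluster in enumerate(clusters):
            if cluster:
                centroids[cluster_index] = np.mean(X[cluster], axis=0)
            else:
                # An empty cluster has no mean (it would be NaN),
                # so keep the previous centroid
                centroids[cluster_index] = self.centroids[cluster_index]
        return centroids

    def plot(self, X, clusters, centroids):
        # The original code calls self.plot() without defining it. This is an
        # assumed minimal version that scatters the first two features.
        import matplotlib.pyplot as plt
        for cluster in clusters:
            points = X[cluster]
            plt.scatter(points[:, 0], points[:, 1])
        plt.scatter(centroids[:, 0], centroids[:, 1], marker="x", color="black")
        plt.show()

    def predict(self, X):
        # Run the K-Means algorithm and return the cluster index of each sample
        self.centroids = self.initialize_centroids(X)
        for _ in range(self.max_iters):
            clusters = self.create_clusters(X, self.centroids)
            if self.plot_steps:
                self.plot(X, clusters, self.centroids)
            old_centroids = self.centroids
            self.centroids = self.calculate_centroids(X, clusters)
            # Stop once the centroids no longer move
            if np.allclose(self.centroids, old_centroids):
                break
        return self.get_cluster_labels(clusters, X)

    def get_cluster_labels(self, clusters, X):
        # Convert the per-cluster index lists into one integer label per sample
        labels = np.zeros(X.shape[0], dtype=int)
        for cluster_index, cluster in enumerate(clusters):
            for sample_index in cluster:
                labels[sample_index] = cluster_index
        return labels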
```
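In the code above, I added comments explaining what each function does and which variables it uses. I also wrapped the common computations in their own functions, which makes the code easier to read and maintain.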
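One further optimization worth considering is to replace the per-sample Python loop in the assignment step with NumPy broadcasting. The sketch below is not part of the answer above. The function names `assign_labels` and `update_centroids` are hypothetical, and it assumes `X` is a 2-D float array of shape `(n_samples, n_features)`.

```python
import numpy as np


def assign_labels(X, centroids):
    """Return, for each row of X, the index of its nearest centroid."""
    # Broadcasting: (n, 1, d) - (1, K, d) gives an (n, K, d) array of differences
    diffs = X[:, np.newaxis, :] - centroids[np.newaxis, :, :]
    # Squared distances are enough for argmin, so the square root is skipped
    sq_dists = np.sum(diffs ** 2, axis=2)
    return np.argmin(sq_dists, axis=1)


def update_centroids(X, labels, old_centroids):
    """Recompute each centroid as the mean of the samples assigned to it."""
    new_centroids = old_centroids.astype(float).copy()
    for k in range(len(old_centroids)):
        members = X[labels == k]
        if len(members) > 0:  # an empty cluster keeps its old centroid
            new_centroids[k] = members.mean(axis=0)
    return new_centroids


# Tiny example: two well-separated groups of points
X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
centroids = np.array([[0.0, 0.0], [10.0, 10.0]])

labels = assign_labels(X, centroids)
centroids = update_centroids(X, labels, centroids)
print(labels)  # [0 0 1 1]
```

The trade-off is memory: the intermediate array has `n_samples * K * n_features` elements, so for very large inputs the assignment may need to be done in batches.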