Implement the K-Means algorithm and apply it to my iris dataset file iris.txt; provide complete, runnable Python code
Posted: 2024-02-27 07:52:56 Views: 79
Sure. Below is Python code that implements K-Means and applies it to the iris dataset:
```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Load the data: space-separated, no header row; the last column is assumed to be the class label
data = pd.read_csv("iris.txt", header=None, delimiter=" ")
X = data.iloc[:, :-1].values


# K-Means implementation
class KMeans:
    def __init__(self, n_clusters=3, max_iter=300):
        self.n_clusters = n_clusters  # number of clusters
        self.max_iter = max_iter      # maximum number of iterations

    def fit(self, X):
        # Initialize the centroids with the first n_clusters samples
        self.centroids = {}
        for i in range(self.n_clusters):
            self.centroids[i] = X[i]

        for _ in range(self.max_iter):
            # Clustering result: cluster index -> list of samples
            self.classes = {}
            for j in range(self.n_clusters):
                self.classes[j] = []

            # Compute distances and assign every sample to its nearest centroid
            for x in X:
                distances = [np.linalg.norm(x - self.centroids[c]) for c in self.centroids]
                classification = distances.index(min(distances))
                self.classes[classification].append(x)

            # Compute the new centroids; an empty cluster keeps its previous centroid
            prev_centroids = dict(self.centroids)
            for c in self.classes:
                if self.classes[c]:
                    self.centroids[c] = np.average(self.classes[c], axis=0)

            # Converged once no centroid has moved
            is_converged = True
            for c in self.centroids:
                if not np.allclose(self.centroids[c], prev_centroids[c]):
                    is_converged = False
                    break
            if is_converged:
                break

    def predict(self, X):
        # Return an array with the nearest-centroid index for each row of X
        # (a single 1-D sample is treated as one row)
        X = np.atleast_2d(X)
        distances = np.array([np.linalg.norm(X - self.centroids[c], axis=1) for c in self.centroids])
        return distances.argmin(axis=0)


# Run K-Means
kmeans = KMeans(n_clusters=3)
kmeans.fit(X)

# Plot the clustering result on the first two features
colors = ['r', 'g', 'b']
for classification in kmeans.classes:
    points = np.array(kmeans.classes[classification])
    if len(points):
        plt.scatter(points[:, 0], points[:, 1], color=colors[classification], s=30)
for centroid in kmeans.centroids:
    plt.scatter(kmeans.centroids[centroid][0], kmeans.centroids[centroid][1], marker='*', color='k', s=100)
plt.show()
```
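In this example, the iris dataset is partitioned into 3 clusters. Note that the file is space-separated, so `delimiter=" "` is passed to `pd.read_csv`. After K-Means finishes, the result is plotted on the first two features, with each cluster in its own color and the centroids marked as black stars.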
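For a quick check without `iris.txt`, the assign-and-update loop can also be sketched with NumPy broadcasting. This is only a minimal sketch under stated assumptions: `kmeans_vectorized` and its `init_centroids` parameter are hypothetical names introduced here, and the data is a synthetic stand-in (three well-separated blobs), not the iris file. The initial centroids are passed in explicitly because K-Means is sensitive to initialization; if the rows of the input happen to be sorted by class, taking the first few rows as centroids would start every centroid inside the same class.

```python
import numpy as np


def kmeans_vectorized(X, init_centroids, max_iter=300):
    """Hypothetical helper: K-Means using broadcasting instead of per-sample loops."""
    centroids = np.array(init_centroids, dtype=float)  # copy of the starting centroids
    n_clusters = len(centroids)
    for _ in range(max_iter):
        # Pairwise Euclidean distances, shape (n_samples, n_clusters)
        distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Mean of each cluster; an empty cluster keeps its previous centroid
        new_centroids = centroids.copy()
        for c in range(n_clusters):
            members = X[labels == c]
            if len(members):
                new_centroids[c] = members.mean(axis=0)
        # Stop once the centroids no longer move
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, labels


# Synthetic stand-in for the real data: three 2-D blobs of 50 points each
rng = np.random.default_rng(42)
blob_centers = np.array([[0.0, 0.0], [10.0, 10.0], [-10.0, 10.0]])
X_demo = np.vstack([center + rng.normal(scale=0.5, size=(50, 2)) for center in blob_centers])

# Assumption for the demo: start from one sample taken from each blob
centroids_demo, labels_demo = kmeans_vectorized(X_demo, X_demo[[0, 50, 100]])
print(np.bincount(labels_demo, minlength=3))  # prints [50 50 50]
```

On real data the true grouping is unknown, so a common workaround is to run the algorithm from several different starting points and keep the run with the smallest total within-cluster distance.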