Reading a CSV file in Python, clustering it with the k-means algorithm, and visualizing the result
Posted: 2023-06-30 22:08:48
Sure, here is a simple implementation, step by step:
1. Read the CSV file
```python
import pandas as pd
df = pd.read_csv("data.csv")  # read the CSV file into a DataFrame
```
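To try the steps without a dataset of your own, a sample file can be generated first. The sketch below is only an illustration: the blob positions, the sample size, and the column names `x` and `y` (chosen to match the plotting step) are assumptions, and it overwrites any existing `data.csv` in the working directory. It then reads the file back and prints a few quick checks that are worth running before clustering.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: three Gaussian blobs in two columns named "x" and "y"
rng = np.random.default_rng(0)
blob_centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
points = np.vstack([c + rng.normal(scale=0.8, size=(50, 2)) for c in blob_centers])

sample = pd.DataFrame(points, columns=["x", "y"])
# index=False keeps the row index out of the file (this overwrites data.csv)
sample.to_csv("data.csv", index=False)

# Read the file back and inspect it before clustering
df = pd.read_csv("data.csv")
print(df.shape)         # number of rows and columns
print(df.dtypes)        # k-means needs numeric columns
print(df.isna().sum())  # missing values must be handled before scaling
```

If the checks show non-numeric columns or missing values, those need to be dropped, encoded, or filled before the data is passed to the scaler.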
2. Preprocess the data
```python
from sklearn.preprocessing import StandardScaler
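# Standardize the two columns that are plotted later ("x" and "y");
# StandardScaler needs numeric input, so any other columns are left out here
features = df[["x", "y"]]
scaler = StandardScaler()
data_scaled = scaler.fit_transform(features)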
```
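Standardization matters for k-means because the algorithm works on Euclidean distances, so a column with a large numeric range would otherwise dominate the result. The following sketch, using made-up numbers, checks that `StandardScaler` is equivalent to subtracting each column's mean and dividing by its standard deviation, and that `inverse_transform` recovers the original values.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Made-up data whose two columns live on very different scales
toy = pd.DataFrame({"x": [1.0, 2.0, 3.0, 4.0],
                    "y": [100.0, 300.0, 500.0, 700.0]})

scaler = StandardScaler()
scaled = scaler.fit_transform(toy)

# The same computation by hand: (value - column mean) / column standard deviation
values = toy.to_numpy()
manual = (values - values.mean(axis=0)) / values.std(axis=0)

print(np.allclose(scaled, manual))
print(scaled.mean(axis=0), scaled.std(axis=0))

# inverse_transform maps standardized values back to the original units
restored = scaler.inverse_transform(scaled)
print(np.allclose(restored, values))
```

The inverse mapping is what lets cluster centers computed on standardized data be drawn on a plot of the original, unscaled values.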
3. Train the model
```python
from sklearn.cluster import KMeans
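# Set the number of clusters k; n_init and random_state make the run reproducible
kmeans = KMeans(n_clusters=3, init='k-means++', n_init=10, random_state=42)
# Fit the model on the standardized data
kmeans.fit(data_scaled)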
```
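The value `n_clusters=3` is fixed by hand in the step above. When the right number of clusters is not known in advance, one common approach is to fit the model for several values of k and compare the inertia (within-cluster sum of squares) and the silhouette score. The sketch below does this on synthetic data; the blob layout and the range of k are assumptions made only for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the standardized data: three well-separated blobs
rng = np.random.default_rng(0)
blob_centers = np.array([[0.0, 0.0], [8.0, 8.0], [0.0, 8.0]])
points = np.vstack([c + rng.normal(scale=0.7, size=(60, 2)) for c in blob_centers])
data_scaled = StandardScaler().fit_transform(points)

inertias = {}
scores = {}
for k in range(2, 7):
    model = KMeans(n_clusters=k, init="k-means++", n_init=10, random_state=0)
    labels = model.fit_predict(data_scaled)
    inertias[k] = model.inertia_                         # lower means tighter clusters
    scores[k] = silhouette_score(data_scaled, labels)    # closer to 1 is better
    print(k, round(inertias[k], 2), round(scores[k], 3))

# Pick the k with the highest silhouette score
best_k = max(scores, key=scores.get)
print("best k by silhouette:", best_k)
```

Inertia always falls as k grows, so it is usually read by looking for an "elbow" rather than a minimum. On real data the two criteria may disagree, and the final choice often depends on what the clusters are used for.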
4. Visualize the result
```python
import matplotlib.pyplot as plt
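# Attach the cluster labels and split the rows by cluster
df["cluster"] = kmeans.labels_
df0 = df[df.cluster == 0]
df1 = df[df.cluster == 1]
df2 = df[df.cluster == 2]
# Draw each cluster as a scatter plot in its own color
plt.scatter(df0["x"], df0["y"], color="red")
plt.scatter(df1["x"], df1["y"], color="green")
plt.scatter(df2["x"], df2["y"], color="blue")
# The centroids live in the standardized space, so map them back to the
# original units before drawing them over the unscaled points
# (assumes "x" and "y" are the first two scaled columns)
centers = scaler.inverse_transform(kmeans.cluster_centers_)
plt.scatter(centers[:, 0], centers[:, 1], color="black", marker="*", s=200)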
plt.show()
```
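The plot above hard-codes three clusters and three colors. A variant that works for any k can loop over the groups instead. The sketch below is one way to do it, assuming columns named `x` and `y`; the synthetic data, `k = 4`, and the output file name `clusters.png` are placeholders. It selects a non-interactive backend and saves the figure so that it also runs on a machine without a display.

```python
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove to use plt.show() instead
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Synthetic data with four blobs (placeholder for the real CSV contents)
rng = np.random.default_rng(1)
blob_centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0], [10.0, 10.0]])
points = np.vstack([c + rng.normal(scale=0.8, size=(40, 2)) for c in blob_centers])
df = pd.DataFrame(points, columns=["x", "y"])

k = 4
scaler = StandardScaler()
data_scaled = scaler.fit_transform(df[["x", "y"]])
kmeans = KMeans(n_clusters=k, init="k-means++", n_init=10, random_state=0)
kmeans.fit(data_scaled)
df["cluster"] = kmeans.labels_

# Centroids are computed on standardized data; convert them to original units
centers = scaler.inverse_transform(kmeans.cluster_centers_)

fig, ax = plt.subplots()
for label, group in df.groupby("cluster"):
    # Each call picks the next color from matplotlib's default color cycle
    ax.scatter(group["x"], group["y"], label=f"cluster {label}")
ax.scatter(centers[:, 0], centers[:, 1], color="black", marker="*", s=200,
           label="centroids")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.legend()

out_path = Path("clusters.png")  # hypothetical output file name
fig.savefig(out_path, dpi=150)
```

Grouping by the `cluster` column avoids one variable per cluster, and the legend makes it clear which marker belongs to which label.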
The complete code is as follows:
```python
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
import matplotlib.pyplot as plt
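# Read the CSV file
df = pd.read_csv("data.csv")
# Standardize the two columns that are plotted later ("x" and "y")
features = df[["x", "y"]]
scaler = StandardScaler()
data_scaled = scaler.fit_transform(features)
# Set the number of clusters k; n_init and random_state make the run reproducible
kmeans = KMeans(n_clusters=3, init='k-means++', n_init=10, random_state=42)
# Fit the model on the standardized data
kmeans.fit(data_scaled)
# Attach the cluster labels and split the rows by cluster
df["cluster"] = kmeans.labels_
df0 = df[df.cluster == 0]
df1 = df[df.cluster == 1]
df2 = df[df.cluster == 2]
# Draw each cluster as a scatter plot in its own color
plt.scatter(df0["x"], df0["y"], color="red")
plt.scatter(df1["x"], df1["y"], color="green")
plt.scatter(df2["x"], df2["y"], color="blue")
# Map the centroids back to the original units before drawing them
centers = scaler.inverse_transform(kmeans.cluster_centers_)
plt.scatter(centers[:, 0], centers[:, 1], color="black", marker="*", s=200)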
plt.show()
```
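Once the model is fitted, it can also assign new rows to the existing clusters. The important detail is that new data has to go through the same fitted scaler (`transform`, not `fit_transform`) before `predict` is called. The sketch below shows this on synthetic data; the blob positions and the new points are invented for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Synthetic training data: three blobs of 40 points each, in blob order
rng = np.random.default_rng(2)
blob_centers = np.array([[0.0, 0.0], [10.0, 0.0], [5.0, 10.0]])
points = np.vstack([c + rng.normal(scale=0.5, size=(40, 2)) for c in blob_centers])
df = pd.DataFrame(points, columns=["x", "y"])

scaler = StandardScaler()
kmeans = KMeans(n_clusters=3, init="k-means++", n_init=10, random_state=0)
kmeans.fit(scaler.fit_transform(df[["x", "y"]]))

# New rows, one near each blob; the column names must match the training data
new_points = pd.DataFrame({"x": [0.2, 9.8, 5.1], "y": [-0.1, 0.3, 9.7]})

# Reuse the fitted scaler, then assign each row to its nearest centroid
new_labels = kmeans.predict(scaler.transform(new_points))
print(new_labels)
```

The label numbers themselves are arbitrary and can change between runs with a different `random_state`; only the grouping is meaningful.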