Testing on the Wine dataset again
Date: 2024-12-10 14:37:10
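To test on the Wine dataset, we can use the K-means clustering algorithm again. The Wine dataset describes wines from three classes, and every sample has 13 numeric features. We load the data first and then follow steps similar to before:
1. Import the required libraries and load the dataset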
```python
import pandas as pd
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score  # used in the evaluation step
import seaborn as sns
import matplotlib.pyplot as plt
wine_data = load_wine()
df_wine = pd.DataFrame(data=wine_data.data, columns=wine_data.feature_names)
target = wine_data.target
```
2. Split the data into training and test sets
```python
X_train, X_test, y_train, y_test = train_test_split(df_wine, target, test_size=0.2, random_state=42)
```
3. Preprocess the data
```python
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
```
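Standardization matters here because K-means relies on Euclidean distance, so a feature with a large numeric range would otherwise dominate the clustering. The following is a minimal sketch of a sanity check on the scaler, assuming the same split parameters as above; the variable names `raw_std`, `train_mean`, `train_std` and `test_mean` are introduced only for illustration.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)
X_train, X_test, _, _ = train_test_split(X, y, test_size=0.2, random_state=42)

# Fit the scaler on the training split only, then reuse it for the test split
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Raw features sit on very different scales
raw_std = X_train.std(axis=0)
print("raw std, min and max over features:", raw_std.min(), raw_std.max())

# After scaling, every training column has mean ~0 and standard deviation ~1
train_mean = X_train_scaled.mean(axis=0)
train_std = X_train_scaled.std(axis=0)
print("largest |mean| on training data:", np.abs(train_mean).max())
print("std range on training data:", train_std.min(), train_std.max())

# The test split uses the training statistics, so its means are only close to 0
test_mean = X_test_scaled.mean(axis=0)
print("largest |mean| on test data:", np.abs(test_mean).max())
```

The test-set means are not exactly zero, which is expected: the scaler never sees the test data during fitting.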
4. Choose K (again assuming 3 clusters), then create and fit the KMeans model
```python
n_clusters = 3
# Set n_init explicitly: its default differs between scikit-learn versions
kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=42)
kmeans.fit(X_train_scaled)
```
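The step above fixes K at 3 because the dataset is known to have three classes. When that is not known in advance, one common heuristic is to compare the silhouette coefficient across several candidate values. The sketch below is an illustration only: it scans K from 2 to 6 on the full standardized dataset rather than the training split, and the names `scores` and `best_k` are hypothetical.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_wine
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Standardize all 178 samples (no train/test split in this illustration)
X = StandardScaler().fit_transform(load_wine().data)

scores = {}
for k in range(2, 7):
    # Fit K-means for this candidate K and score the resulting partition
    labels = KMeans(n_clusters=k, n_init=10, random_state=42).fit_predict(X)
    scores[k] = silhouette_score(X, labels)
    print(f"k={k}: silhouette={scores[k]:.3f}")

# The candidate with the highest silhouette coefficient
best_k = max(scores, key=scores.get)
print(f"best k by silhouette: {best_k}")
```

The silhouette coefficient is only one heuristic; an elbow plot of `inertia_` or domain knowledge can point to a different K.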
5. Predict and evaluate
```python
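# Cluster assignments for both splits
labels_train = kmeans.predict(X_train_scaled)
labels_test = kmeans.predict(X_test_scaled)
# Visualize the clusters. pairplot's hue must name a column of the plotted
# DataFrame, so attach the training labels to the training rows first
df_plot = X_train.copy()
df_plot["cluster"] = labels_train
sns.pairplot(df_plot, hue="cluster", diag_kind="kde")
plt.show()
# Compute the silhouette coefficient on the training and test sets
train_score = silhouette_score(X_train_scaled, labels_train)
test_score = silhouette_score(X_test_scaled, labels_test)
print(f"Training silhouette score: {train_score}")
print(f"Test silhouette score: {test_score}")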
```
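By inspecting the scatter plots and the silhouette coefficients, you can judge how well K-means separates the data. The silhouette coefficient ranges from -1 to 1, and higher values mean tighter, better-separated clusters. A test score far below the training score suggests the clusters fit the training data too closely and do not carry over to new samples, while low scores on both sets suggest that K or the feature set is a poor match for the data.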
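The walkthrough creates `y_train` and `y_test` but never uses them. Since the true classes are known for this dataset, one optional extra check is to compare the clusters against them. The sketch below uses the adjusted Rand index (`adjusted_rand_score`), a metric that is not part of the original steps and is added here as an assumption about what a useful follow-up looks like. It ignores how cluster IDs are numbered, so no label matching is needed.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_wine
from sklearn.metrics import adjusted_rand_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Same pipeline as in the steps above: scale, then fit K-means with K=3
scaler = StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42).fit(X_train_scaled)

# Adjusted Rand index: 1.0 means perfect agreement with the true classes,
# values near 0 mean agreement no better than chance
ari_train = adjusted_rand_score(y_train, kmeans.predict(X_train_scaled))
ari_test = adjusted_rand_score(y_test, kmeans.predict(X_test_scaled))
print(f"Training ARI: {ari_train:.3f}")
print(f"Test ARI: {ari_test:.3f}")
```

Unlike the silhouette coefficient, this check needs ground-truth labels, so it only applies to benchmark datasets such as Wine.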