Classification code for the wine dataset `wine.csv` using scikit-learn
Date: 2024-10-28 18:09:01
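Classifying the `wine.csv` dataset with scikit-learn (sklearn) usually involves the following steps:
First, import the necessary libraries: `pandas` for data handling, `numpy` for numerical computation, and the scikit-learn modules for preprocessing, model selection, modelling, and evaluation. The same block also reads the CSV file into a DataFrame: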
```python
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report

# Load the data from the CSV file
data = pd.read_csv('wine.csv')
```
Next, take a first look at the data and separate the features from the target:
```python
# Show the first few rows
print(data.head())
# Split into features (X) and target (y);
# this assumes the label column is named 'class'
X = data.drop('class', axis=1)
y = data['class']
```
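Before modelling, it usually helps to check the table's size, missing values, and class balance. The sketch below is only illustrative: because `wine.csv` itself is not available here, it uses scikit-learn's bundled wine data (`load_wine`) as a stand-in and renames its label column to `class` to mirror the code above. Both choices are assumptions, and the real file may have different columns.

```python
from sklearn.datasets import load_wine

# Stand-in for pd.read_csv('wine.csv') (assumption): scikit-learn's bundled
# wine data, with the label column renamed to 'class' to match the text.
data = load_wine(as_frame=True).frame.rename(columns={"target": "class"})

# Number of rows and columns (features plus the label column)
print(data.shape)

# Missing values per column; a non-zero count would call for imputation
missing_per_column = data.isna().sum()
print(missing_per_column)

# Samples per class, to see whether the classes are roughly balanced
class_counts = data["class"].value_counts().sort_index()
print(class_counts)
```

If the class counts turn out to be very uneven, accuracy alone becomes a weak metric and the per-class figures in the classification report deserve more attention.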
Next, split the data into a training set and a test set (here 80% / 20%):
```python
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
```
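With a small dataset, a purely random split can leave the class proportions in the test set somewhat different from those in the training set. One common option is the `stratify` argument of `train_test_split`. The sketch below again uses `load_wine` as a stand-in for the CSV file (an assumption) and compares the class shares in both parts.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split

# Stand-in data (assumption): the bundled wine dataset instead of wine.csv
X, y = load_wine(return_X_y=True, as_frame=True)

# stratify=y keeps the class proportions similar in both parts
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Share of each class in the training and test sets
train_share = y_train.value_counts(normalize=True).sort_index()
test_share = y_test.value_counts(normalize=True).sort_index()
print(train_share)
print(test_share)
```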
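Standardize the data so that all features are on a similar scale. The scaler is fitted on the training set only and then applied to the test set, so that no test-set statistics leak into training. Tree-based models such as the random forest used in the next step do not strictly need scaling, but the step is harmless and matters for scale-sensitive models: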
```python
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
```
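To see what the scaler does, the following sketch checks the per-feature mean and standard deviation after scaling. It uses `load_wine` as a stand-in for the CSV file (an assumption), and the `*_scaled` variable names are introduced here for illustration only.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Stand-in data (assumption): the bundled wine dataset instead of wine.csv
X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learn mean/std on training data
X_test_scaled = scaler.transform(X_test)        # reuse the training mean/std

# Training features end up with mean ~0 and standard deviation ~1
train_means = X_train_scaled.mean(axis=0)
train_stds = X_train_scaled.std(axis=0)

# Test features are only approximately centred, because the statistics
# come from the training set
test_means = X_test_scaled.mean(axis=0)

print(np.round(train_means, 3))
print(np.round(train_stds, 3))
print(np.round(test_means, 3))
```

The small non-zero means on the test set are expected and are not a sign of an error.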
Choose a model; a random forest classifier is used here as an example:
```python
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
```
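A single train/test split gives only one estimate of performance. As a complementary check, the sketch below wraps the scaler and the classifier in a `Pipeline` and runs 5-fold cross-validation, so the scaler is refitted inside each fold. This is one possible variant rather than part of the original walkthrough, and it again assumes `load_wine` as a stand-in for the CSV file.

```python
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in data (assumption): the bundled wine dataset instead of wine.csv
X, y = load_wine(return_X_y=True)

# Scaling and the classifier are chained, so each fold fits its own scaler
pipeline = make_pipeline(
    StandardScaler(),
    RandomForestClassifier(n_estimators=100, random_state=42),
)

# Accuracy on each of the 5 folds
scores = cross_val_score(pipeline, X, y, cv=5)
print(scores)
print(f"Mean accuracy: {scores.mean():.3f} (std {scores.std():.3f})")
```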
Finally, make predictions on the test set and evaluate the model:
```python
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
report = classification_report(y_test, y_pred)
print(f"Accuracy: {accuracy}")
print("\nClassification Report:\n", report)
```
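A confusion matrix shows which classes are mistaken for which, something accuracy alone does not reveal. The sketch below is a minimal illustration under the same assumption as before (`load_wine` as a stand-in for the CSV file, with labels 0, 1, 2). Scaling is omitted for brevity, since random forests are insensitive to feature scale.

```python
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

# Stand-in data (assumption): the bundled wine dataset instead of wine.csv
X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_test, y_pred, labels=[0, 1, 2])
print(cm)

# The diagonal holds the correct predictions, so accuracy is trace / total
accuracy_from_cm = cm.trace() / cm.sum()
print(f"Accuracy from confusion matrix: {accuracy_from_cm:.3f}")
```

Off-diagonal entries point to the class pairs the model confuses, which is a natural place to start when looking for improvements.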