Dataset: scikit-learn's built-in `load_wine` data; (2) train and test a model using gradient boosting
Time: 2024-12-18 09:28:12
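The `load_wine` dataset is built into scikit-learn and is mainly used for classification tasks. It contains 13 chemical measurements for each of 178 wines, grouped into three classes; the classes correspond to three grape cultivars, not quality grades. It is a convenient way for beginners to see how a supervised learning algorithm is applied to a real problem.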
To train and test a model on this dataset with Gradient Boosting (GB), here gradient-boosted decision trees, you can follow these steps:
1. Import the required libraries:
```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score, classification_report
```
2. Load and inspect the data:
```python
wine = datasets.load_wine()
X = wine.data
y = wine.target
print(f"Features: {X.shape[1]}, classes: {len(np.unique(y))}")
```
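Before training, it can help to check how the samples are distributed across the three classes. The following is a minimal sketch of one way to do that; the variable name `class_counts` is introduced here for illustration and is not part of the original answer.

```python
import numpy as np
from sklearn.datasets import load_wine

wine = load_wine()
X, y = wine.data, wine.target

# Number of samples per class, indexed by class label 0, 1, 2.
class_counts = np.bincount(y)

print(f"Samples: {X.shape[0]}, features: {X.shape[1]}")
for name, count in zip(wine.target_names, class_counts):
    print(f"{name}: {count} samples")
print("Feature names:", wine.feature_names)
```

The classes are moderately imbalanced (59, 71 and 48 samples), which is worth keeping in mind when reading per-class metrics later.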
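3. Preprocess the data, for example by standardizing it (optional: tree-based gradient boosting splits on feature thresholds, so it is largely insensitive to feature scaling):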
```python
# Optional for tree-based models: rescale each feature to zero mean and unit variance.
scaler = StandardScaler()
X = scaler.fit_transform(X)
```
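Fitting the scaler on the full dataset before splitting lets test-set statistics influence preprocessing. If you do want to keep the scaling step, one common alternative (not the approach used in this answer) is to wrap the scaler and the classifier in a pipeline so that the scaler is fitted on the training data only. A minimal sketch, where the name `pipe` is an assumption made for this example:

```python
from sklearn.datasets import load_wine
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)

# Split first, so the scaler never sees the test samples while fitting.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# fit() fits the scaler on X_train only, then trains the classifier on the scaled data.
pipe = make_pipeline(StandardScaler(), GradientBoostingClassifier(random_state=42))
pipe.fit(X_train, y_train)

# score() applies the training-set scaling to X_test before predicting.
print(f"Pipeline test accuracy: {pipe.score(X_test, y_test):.3f}")
```

For gradient-boosted trees the difference is usually negligible, but the same pattern matters for scale-sensitive models.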
4. Split the data into training and test sets:
```python
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
```
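With only 178 samples, a purely random split can leave the class proportions in the test set somewhat different from those in the full dataset. As an optional variation on the split above, passing `stratify=y` keeps the proportions roughly equal in both parts. A small sketch:

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split

X, y = load_wine(return_X_y=True)

# stratify=y preserves the class proportions in both the training and the test part.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

print("Train class counts:", np.bincount(y_train))
print("Test class counts: ", np.bincount(y_test))
```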
5. Train the Gradient Boosting classifier:
```python
gb_model = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1, random_state=42)
gb_model.fit(X_train, y_train)
```
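Once the model is fitted, you may want to see how accuracy changes as trees are added and which features the model relies on. The sketch below uses `staged_predict` and `feature_importances_` from `GradientBoostingClassifier`; the names `staged_acc` and `order` are introduced for illustration. It is meant for inspection only: choosing `n_estimators` from test-set accuracy would bias the final evaluation.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

wine = load_wine()
X_train, X_test, y_train, y_test = train_test_split(
    wine.data, wine.target, test_size=0.2, random_state=42
)

gb_model = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1, random_state=42)
gb_model.fit(X_train, y_train)

# staged_predict yields the predictions after each boosting stage.
staged_acc = [
    accuracy_score(y_test, y_stage) for y_stage in gb_model.staged_predict(X_test)
]
for n_trees in (1, 10, 50, 100):
    print(f"{n_trees:3d} stages: test accuracy {staged_acc[n_trees - 1]:.3f}")

# Impurity-based importances, normalized to sum to 1; list the five largest.
order = np.argsort(gb_model.feature_importances_)[::-1]
for idx in order[:5]:
    print(f"{wine.feature_names[idx]}: {gb_model.feature_importances_[idx]:.3f}")
```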
6. Predict and evaluate model performance:
```python
y_pred = gb_model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
report = classification_report(y_test, y_pred, target_names=wine.target_names)
print(f"Accuracy: {accuracy*100:.2f}%")
print("Classification Report:")
print(report)
```
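The test set here holds only 36 samples, so a single accuracy figure can swing noticeably with the random split. As a complementary check (an addition to the steps above, not part of the original answer), stratified k-fold cross-validation gives a more stable estimate. A minimal sketch, assuming 5 folds:

```python
from sklearn.datasets import load_wine
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

X, y = load_wine(return_X_y=True)

model = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1, random_state=42)

# 5 stratified folds: each fold keeps roughly the same class proportions as the full data.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")

print("Fold accuracies:", [f"{s:.3f}" for s in scores])
print(f"Mean accuracy: {scores.mean():.3f} (std {scores.std():.3f})")
```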