1.11.2. Random forests and other randomized tree ensembles: classifying the load_wine dataset with a random forest (hyperparameter tuning)
Date: 2024-10-06 21:05:34
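Random forests are an ensemble learning method that builds many decision trees and combines their predictions to improve stability and accuracy; scikit-learn's classifier averages the trees' predicted class probabilities. Each tree is trained independently on a bootstrap sample of the training data, and only a random subset of the features is considered at each split. This randomness makes the trees more diverse and reduces the risk of overfitting.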
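To make the averaging step concrete, the following minimal sketch fits a small forest and checks that averaging the per-tree class probabilities by hand reproduces `predict_proba` of the forest. The forest size (25 trees), the seed, and the variable names are illustrative assumptions, not values from the original walkthrough.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier

X, y = load_wine(return_X_y=True)

# A small forest is enough for the illustration (25 trees is an arbitrary choice).
forest = RandomForestClassifier(n_estimators=25, random_state=0).fit(X, y)

# Collect each tree's class probabilities: shape (n_trees, n_samples, n_classes).
tree_probas = np.stack([tree.predict_proba(X) for tree in forest.estimators_])

# Average over the trees by hand.
manual_proba = tree_probas.mean(axis=0)

# Compare the hand-computed average with the forest's own output.
averaging_matches = np.allclose(manual_proba, forest.predict_proba(X))
print("Manual average matches forest.predict_proba:", averaging_matches)
```

The predicted class is then the one with the highest averaged probability, which is why a single noisy tree has limited influence on the final decision.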
Classifying the `load_wine` dataset with a random forest usually involves the following steps:
1. Import the required libraries:
```python
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split
```
2. Load the data and split it into training and test sets:
```python
wine = load_wine()
X = wine.data
y = wine.target
# Hold out 30% of the samples as a test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
```
3. Initialize the random forest classifier and define the hyperparameter grid:
```python
param_grid = {
    'n_estimators': [50, 100, 200],   # number of trees
    'max_depth': [None, 10, 20],      # maximum depth of each tree (None = unlimited)
    'min_samples_split': [2, 5, 10],  # minimum samples required to split a node
    'min_samples_leaf': [1, 2, 4],    # minimum samples required at a leaf
}
rf = RandomForestClassifier(random_state=42)
grid_search = GridSearchCV(rf, param_grid, cv=5)  # 5-fold cross-validation
```
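Before launching the search it can help to estimate its cost. The sketch below counts the candidate combinations in the grid above with `ParameterGrid` and multiplies by the number of folds; the variable names `n_candidates`, `n_folds`, and `total_fits` are illustrative.

```python
from sklearn.model_selection import ParameterGrid

# Same grid as in the step above.
param_grid = {
    'n_estimators': [50, 100, 200],
    'max_depth': [None, 10, 20],
    'min_samples_split': [2, 5, 10],
    'min_samples_leaf': [1, 2, 4],
}

n_candidates = len(ParameterGrid(param_grid))  # every combination of the listed values
n_folds = 5                                    # matches cv=5 in GridSearchCV
total_fits = n_candidates * n_folds            # cross-validation fits, excluding the final refit

print(f"{n_candidates} candidates x {n_folds} folds = {total_fits} fits")
```

With the default `refit=True`, `GridSearchCV` fits the best configuration once more on the full training set after the search. If the search is too slow, passing `n_jobs=-1` to `GridSearchCV` runs the fits in parallel.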
4. Train and evaluate the model:
```python
grid_search.fit(X_train, y_train)
best_rf = grid_search.best_estimator_
y_pred = best_rf.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Best parameters: {grid_search.best_params_}")
print(f"Accuracy: {accuracy}")
```
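Here the grid search tunes the hyperparameters by selecting the configuration with the best mean cross-validated accuracy on the training set. The test set is used only afterwards, to estimate how the selected model performs on unseen data.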
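The difference between the cross-validated score used for selection and the held-out test score can be inspected directly. The sketch below is a self-contained variant of the walkthrough that assumes a deliberately reduced grid (`small_grid`, a hypothetical name) so that it runs quickly; it is not the full grid from step 3.

```python
import pandas as pd
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Reduced grid (an assumption for speed): 4 candidates instead of 81.
small_grid = {"n_estimators": [50, 100], "max_depth": [None, 10]}
search = GridSearchCV(RandomForestClassifier(random_state=42), small_grid, cv=5)
search.fit(X_train, y_train)

# Mean cross-validated accuracy of the best candidate (training data only).
cv_score = search.best_score_
# Accuracy of the refitted best model on the held-out test set.
test_score = search.score(X_test, y_test)
print(f"Best CV accuracy: {cv_score:.3f}, test accuracy: {test_score:.3f}")

# Per-candidate results, best rank first.
results = pd.DataFrame(search.cv_results_)[
    ["params", "mean_test_score", "std_test_score", "rank_test_score"]
].sort_values("rank_test_score")
print(results.to_string(index=False))
```

Note that `mean_test_score` in `cv_results_` refers to the validation folds inside cross-validation, not to `X_test`.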