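GBDT Hyperparameter Tuning Example in Python
Date: 2023-09-06 11:07:04
The following is a worked example of tuning GBDT hyperparameters with grid search, implemented in Python: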
1. Import the required libraries and load the dataset
```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import GridSearchCV, train_test_split
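# Load the dataset; the last column is assumed to be the target
data = pd.read_csv('data.csv')
X = data.iloc[:, :-1]  # all columns except the last: features
y = data.iloc[:, -1]   # last column: target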
```
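The loading step assumes the target is the last column of `data.csv`. Here is a minimal sketch of what the two `iloc` slices select. It uses a small made-up DataFrame, and the column names and values are hypothetical, not taken from the original dataset:

```python
import pandas as pd

# Hypothetical stand-in for data.csv: two feature columns, target last.
data = pd.DataFrame({
    "feature_a": [1.0, 2.0, 3.0],
    "feature_b": [10.0, 20.0, 30.0],
    "target": [0.5, 1.5, 2.5],
})

X = data.iloc[:, :-1]  # every column except the last -> feature DataFrame
y = data.iloc[:, -1]   # last column only -> target Series

print(list(X.columns))
print(y.name)
```

If the target sits in a different position in your file, selecting it by column name is less fragile than relying on column order.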
2. Split the data into training and test sets
```python
# Hold out 20% of the rows as a test set; random_state fixes the shuffle
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
```
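To make the effect of `test_size` and `random_state` concrete, here is a small sketch on toy arrays. The data is hypothetical and only there so the snippet runs on its own:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 50 samples, 2 features.
X = np.arange(100).reshape(50, 2)
y = np.arange(50)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Repeating the call with the same seed reproduces the same split.
_, X_test_again, _, _ = train_test_split(X, y, test_size=0.2, random_state=0)

print(X_train.shape, X_test.shape)
print(np.array_equal(X_test, X_test_again))
```

With 50 rows and `test_size=0.2`, 10 rows go to the test set and 40 to the training set.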
3. Define the GBDT model
```python
# Define the GBDT model with default hyperparameters
gbdt = GradientBoostingRegressor()
```
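Since the model is created without arguments, it helps to know the defaults the grid search will be varying. A short sketch that reads them from the estimator:

```python
from sklearn.ensemble import GradientBoostingRegressor

gbdt = GradientBoostingRegressor()
params = gbdt.get_params()

# The three hyperparameters tuned in this example, at their default values.
for name in ("n_estimators", "max_depth", "learning_rate"):
    print(name, "=", params[name])
```

The defaults (100 trees, depth 3, learning rate 0.1) all appear in the parameter grid used in this example, so the untuned model is one of the candidates the search evaluates.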
4. Set the parameter ranges
```python
# Candidate values for each hyperparameter
param_grid = {
    'n_estimators': [50, 100, 150],
    'max_depth': [3, 6, 9],
    'learning_rate': [0.1, 0.05, 0.01],
}
```
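Grid search trains one model per parameter combination per fold, so the cost grows quickly with the grid. A rough sketch of the count for this grid, using only the standard library and the `cv=5` value from step 5:

```python
from itertools import product

param_grid = {
    "n_estimators": [50, 100, 150],
    "max_depth": [3, 6, 9],
    "learning_rate": [0.1, 0.05, 0.01],
}
cv_folds = 5  # same value as cv=5 in the grid search step

# One dict per combination: one value chosen for each parameter.
combinations = [
    dict(zip(param_grid, values)) for values in product(*param_grid.values())
]

n_combinations = len(combinations)
total_fits = n_combinations * cv_folds  # cross-validation fits, excluding the final refit
print(n_combinations, total_fits)
```

That is 27 combinations and 135 cross-validation fits, plus one final refit of the best combination on the full training set.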
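5. Run the grid search with cross-validation using GridSearchCV
```python
# Grid search with 5-fold cross-validation. scoring is set explicitly:
# the default for a regressor is R^2, which cannot be converted to an RMSE.
gbdt_grid = GridSearchCV(estimator=gbdt, param_grid=param_grid,
                         scoring='neg_mean_squared_error', cv=5)
gbdt_grid.fit(X_train, y_train)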
```
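What `best_score_` means depends on the `scoring` argument. With no `scoring`, GridSearchCV falls back to the estimator's own `score` method, which for a regressor is R², so its square root is not an RMSE. The sketch below shows one way to get an RMSE-style number, on synthetic data and with a reduced grid. Both are assumptions made only to keep the example small and self-contained:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic data, used only so the sketch runs on its own.
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

small_grid = {"n_estimators": [50, 100], "max_depth": [2, 3]}  # reduced grid for speed

search = GridSearchCV(
    estimator=GradientBoostingRegressor(random_state=0),
    param_grid=small_grid,
    scoring="neg_mean_squared_error",  # best_score_ becomes minus the mean MSE over folds
    cv=5,
)
search.fit(X, y)

cv_mse = -search.best_score_  # flip the sign back to a plain MSE
cv_rmse = np.sqrt(cv_mse)     # square root of the mean cross-validated MSE
print(search.best_params_, cv_rmse)
```

scikit-learn negates error metrics so that higher is always better, which is why the sign has to be flipped. The `neg_root_mean_squared_error` scorer is an alternative that averages per-fold RMSE directly. It gives a slightly different number from the square root of the mean MSE.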
6. Print the best parameters and retrieve the best model
```python
# Print the best parameters and keep the refitted best model
print("Best parameters found: ", gbdt_grid.best_params_)
print("Best RMSE found: ", np.sqrt(np.abs(gbdt_grid.best_score_)))
gbdt_best = gbdt_grid.best_estimator_
```
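Beyond the single best combination, `cv_results_` holds the scores of every candidate, which helps show how close the runners-up were. This is a minimal sketch, again on synthetic data and with a small hypothetical grid so that it runs standalone:

```python
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic data, used only so the sketch runs on its own.
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

search = GridSearchCV(
    GradientBoostingRegressor(random_state=0),
    param_grid={"learning_rate": [0.1, 0.05], "max_depth": [2, 3]},
    scoring="neg_mean_squared_error",
    cv=3,
)
search.fit(X, y)

# One row per parameter combination, best-ranked first.
results = pd.DataFrame(search.cv_results_)
columns = ["params", "mean_test_score", "std_test_score", "rank_test_score"]
ranking = results[columns].sort_values("rank_test_score")
print(ranking.to_string(index=False))
```

If the top few rows differ by less than their `std_test_score`, the choice between them is within cross-validation noise, and the simpler or cheaper setting may be preferable.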
7. Evaluate the model on the test set
```python
# Evaluate the model on the test set
y_pred = gbdt_best.predict(X_test)
rmse = np.sqrt(mean_squared_error(y_test, y_pred))
print("RMSE on test set: ", rmse)
```
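The RMSE reported here is the square root of the mean squared difference between the true and predicted values. A tiny worked example with made-up numbers shows that `np.sqrt(mean_squared_error(...))` matches the formula written out by hand:

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Made-up values for illustration only.
y_true = np.array([3.0, 5.0, 2.0, 7.0])
y_pred = np.array([2.0, 5.0, 4.0, 8.0])

# Errors are 1, 0, -2, -1; squared: 1, 0, 4, 1; mean: 1.5.
rmse_manual = np.sqrt(np.mean((y_true - y_pred) ** 2))
rmse_sklearn = np.sqrt(mean_squared_error(y_true, y_pred))
print(rmse_manual, rmse_sklearn)
```

Both values equal the square root of 1.5, about 1.2247. RMSE is in the same units as the target, so it can be read directly as a typical prediction error.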
The complete code is as follows:
```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import GridSearchCV, train_test_split

# Load the dataset; the last column is assumed to be the target
data = pd.read_csv('data.csv')
X = data.iloc[:, :-1]
y = data.iloc[:, -1]

# Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Define the GBDT model
gbdt = GradientBoostingRegressor()

# Candidate values for each hyperparameter
param_grid = {
    'n_estimators': [50, 100, 150],
    'max_depth': [3, 6, 9],
    'learning_rate': [0.1, 0.05, 0.01],
}

# Grid search with 5-fold cross-validation. scoring is set explicitly:
# the default for a regressor is R^2, which cannot be converted to an RMSE.
gbdt_grid = GridSearchCV(estimator=gbdt, param_grid=param_grid,
                         scoring='neg_mean_squared_error', cv=5)
gbdt_grid.fit(X_train, y_train)

# Print the best parameters and keep the refitted best model
print("Best parameters found: ", gbdt_grid.best_params_)
print("Best RMSE found: ", np.sqrt(np.abs(gbdt_grid.best_score_)))
gbdt_best = gbdt_grid.best_estimator_

# Evaluate the model on the test set
y_pred = gbdt_best.predict(X_test)
rmse = np.sqrt(mean_squared_error(y_test, y_pred))
print("RMSE on test set: ", rmse)
```