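Code for predicting a binary classification task with a random forest, tuning parameters with cross-validation and using test-set AUC as the evaluation metric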
Time: 2024-03-03 18:50:26 Views: 97
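Sure. Below is code that uses a random forest for a binary classification task, tunes the hyperparameters with cross-validation, and uses test-set AUC as the evaluation metric: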
```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import roc_auc_score
# Load the training and test sets
train_data = pd.read_csv('train.csv')
test_data = pd.read_csv('test.csv')
# Split each set into features and labels
X_train = train_data.drop('label', axis=1)
y_train = train_data['label']
X_test = test_data.drop('label', axis=1)
y_test = test_data['label']
# Build the random forest and tune its hyperparameters with cross-validation
param_grid = {
    'n_estimators': [50, 100, 150],
    'max_depth': [5, 10, 15],
    'min_samples_split': [2, 5, 10],
    'min_samples_leaf': [1, 2, 4],
}
rf = RandomForestClassifier(random_state=42, n_jobs=-1)
grid_search = GridSearchCV(rf, param_grid, cv=5, scoring='roc_auc')
grid_search.fit(X_train, y_train)
# Print the best parameters
print(grid_search.best_params_)
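# Retrain with the best parameters and score the test set
rf = RandomForestClassifier(random_state=42, n_jobs=-1, **grid_search.best_params_)
rf.fit(X_train, y_train)
# AUC ranks samples by score, so use the positive-class probability
# from predict_proba rather than the hard 0/1 labels from predict
y_score = rf.predict_proba(X_test)[:, 1]
# Print the test-set AUC
print("Test-set AUC:", roc_auc_score(y_test, y_score))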
```
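This code first reads the training and test sets and splits each into features and labels. It then runs `GridSearchCV` with 5-fold cross-validation, scored by ROC AUC, over a grid of `n_estimators`, `max_depth`, `min_samples_split`, and `min_samples_leaf` (three values each, so 81 combinations). Finally, it prints the best parameters, retrains a model with them, predicts on the test set, and prints the test-set AUC.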
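To try the same workflow without `train.csv` and `test.csv`, here is a minimal self-contained sketch. It rests on several assumptions that are not in the original answer: the data is synthetic (generated with `make_classification`), the grid is cut down to two parameters so it runs quickly, and the names `best_rf`, `y_score`, `y_label`, `auc_from_scores`, and `auc_from_labels` are illustrative. It shows two points: with the default `refit=True`, `GridSearchCV` already exposes a model retrained on the full training set as `best_estimator_`, and the AUC computed from predicted probabilities can differ from the AUC computed from hard class labels.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for train.csv / test.csv (assumption: not the original data)
X, y = make_classification(
    n_samples=400, n_features=10, n_informative=5, random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

# Deliberately small grid so the sketch runs quickly
param_grid = {"n_estimators": [50, 100], "max_depth": [5, 10]}
grid_search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid,
    cv=5,
    scoring="roc_auc",
)
grid_search.fit(X_train, y_train)

# refit=True (the default) has already retrained the best model on all of X_train
best_rf = grid_search.best_estimator_

# Column 1 of predict_proba is the probability of the positive class (classes_[1])
y_score = best_rf.predict_proba(X_test)[:, 1]
# Hard 0/1 labels, kept only to compare the two ways of computing AUC
y_label = best_rf.predict(X_test)

auc_from_scores = roc_auc_score(y_test, y_score)
auc_from_labels = roc_auc_score(y_test, y_label)

print("Best parameters:", grid_search.best_params_)
print("Mean CV AUC of best parameters:", grid_search.best_score_)
print("Test AUC from probabilities:", auc_from_scores)
print("Test AUC from hard labels:", auc_from_labels)
```

Hard labels collapse every sample to 0 or 1, so the ROC curve built from them has a single interior point. Probabilities keep the full ranking of the samples, which is what AUC is meant to measure.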