Running randomized search optimization on five classifiers: random forest, SVM, logistic regression, Bagging, and XGBoost
Posted: 2023-10-09 07:07:08
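Sure. You can use the RandomizedSearchCV class from sklearn to run a randomized hyperparameter search. The steps are as follows:
1. Import the required libraries and the dataset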
```python
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import BaggingClassifier
from xgboost import XGBClassifier
from sklearn.model_selection import RandomizedSearchCV
from scipy.stats import uniform
iris = load_iris()
X = iris.data
y = iris.target
```
2. Define the classifiers and the hyperparameter search spaces
```python
# Define the classifiers
clf_rf = RandomForestClassifier()
clf_svm = SVC()
clf_lr = LogisticRegression()
clf_bag = BaggingClassifier()
clf_xgb = XGBClassifier()
# Define the hyperparameter search spaces
param_dist_rf = {'n_estimators': [10, 50, 100, 200, 500],
'max_depth': [1, 3, 5, 7, 9, None],
'min_samples_split': [2, 5, 10],
'min_samples_leaf': [1, 2, 4],
'max_features': ['sqrt', 'log2', None],
'bootstrap': [True, False],
'criterion': ['gini', 'entropy']}
# C must be strictly positive; uniform(loc, scale) samples from [loc, loc + scale]
param_dist_svm = {'C': uniform(0.01, 10),
'kernel': ['linear', 'poly', 'rbf', 'sigmoid'],
'degree': [1, 2, 3, 4, 5],
'gamma': ['scale', 'auto', 0.001, 0.01, 0.1, 1, 10, 100]}
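# Split by penalty so only solver/penalty pairs that LogisticRegression accepts are sampled.
# The unpenalized option ('none', spelled None since scikit-learn 1.2) is left out here.
param_dist_lr = [
    {'C': uniform(0.01, 10), 'penalty': ['l2'], 'fit_intercept': [True, False],
     'solver': ['newton-cg', 'lbfgs', 'liblinear', 'sag', 'saga']},
    {'C': uniform(0.01, 10), 'penalty': ['l1'], 'fit_intercept': [True, False],
     'solver': ['liblinear', 'saga']},
    {'C': uniform(0.01, 10), 'penalty': ['elasticnet'], 'l1_ratio': uniform(0, 1),
     'fit_intercept': [True, False], 'solver': ['saga']}]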
param_dist_bag = {'n_estimators': [10, 50, 100, 200, 500],
'max_samples': [0.1, 0.5, 1.0],
'max_features': [0.1, 0.5, 1.0],
'bootstrap': [True, False]}
param_dist_xgb = {'max_depth': [3, 5, 7, 9],
'learning_rate': [0.01, 0.1, 0.3, 0.5],
'n_estimators': [50, 100, 200, 500],
'min_child_weight': [1, 3, 5],
'gamma': [0, 0.1, 0.2, 0.3],
'subsample': [0.5, 0.7, 1.0],
'colsample_bytree': [0.5, 0.7, 1.0]}
```
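To see what a search space like the ones above actually produces, it can help to draw a few candidates by hand. The sketch below uses sklearn's ParameterSampler, which is the sampler RandomizedSearchCV relies on internally. The `demo_space` dictionary is a small illustrative space made up for this example, not one of the spaces defined above.

```python
from scipy.stats import uniform
from sklearn.model_selection import ParameterSampler

# Hypothetical demo space: one continuous distribution and one discrete list.
# Lists are sampled uniformly; scipy distributions are sampled via .rvs().
demo_space = {'C': uniform(0.01, 10), 'kernel': ['linear', 'rbf']}

# Draw 5 candidate settings; random_state makes the draw repeatable.
candidates = list(ParameterSampler(demo_space, n_iter=5, random_state=0))
for params in candidates:
    print(params)
```

Each printed dictionary is one setting that a randomized search would fit and score with cross-validation.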
3. Run a randomized search for each classifier
```python
# Randomized search: up to 100 sampled candidates per model, 5-fold CV, all CPU cores
search_rf = RandomizedSearchCV(clf_rf, param_distributions=param_dist_rf, n_iter=100, cv=5, n_jobs=-1)
search_rf.fit(X, y)
search_svm = RandomizedSearchCV(clf_svm, param_distributions=param_dist_svm, n_iter=100, cv=5, n_jobs=-1)
search_svm.fit(X, y)
search_lr = RandomizedSearchCV(clf_lr, param_distributions=param_dist_lr, n_iter=100, cv=5, n_jobs=-1)
search_lr.fit(X, y)
search_bag = RandomizedSearchCV(clf_bag, param_distributions=param_dist_bag, n_iter=100, cv=5, n_jobs=-1)
search_bag.fit(X, y)
search_xgb = RandomizedSearchCV(clf_xgb, param_distributions=param_dist_xgb, n_iter=100, cv=5, n_jobs=-1)
search_xgb.fit(X, y)
```
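The five calls above differ only in the estimator and its search space, so they can also be written as a loop. The following is a minimal sketch of that pattern, not a replacement for the code above: it covers only two of the models, uses deliberately small search spaces and a low n_iter so it finishes quickly, and the names `searches` and `results` are made up for the example.

```python
from scipy.stats import uniform
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Hypothetical registry of (estimator, search space) pairs.
# The remaining classifiers could be added in the same way.
searches = {
    'Random Forest': (RandomForestClassifier(random_state=0),
                      {'n_estimators': [10, 50, 100], 'max_depth': [3, 5, None]}),
    'SVM': (SVC(), {'C': uniform(0.01, 10), 'kernel': ['linear', 'rbf']}),
}

results = {}
for name, (clf, space) in searches.items():
    # random_state fixes which candidates are sampled, so reruns are comparable
    search = RandomizedSearchCV(clf, param_distributions=space,
                                n_iter=5, cv=5, random_state=0)
    search.fit(X, y)
    results[name] = search
    print(name, search.best_params_, round(search.best_score_, 3))
```

Keeping the fitted search objects in a dictionary also makes the reporting step a single loop over `results`.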
4. Print the best parameters and score for each classifier
```python
# Print the best parameters and the best mean cross-validated score for each classifier
print('Random Forest - Best Params:', search_rf.best_params_)
print('Random Forest - Best Score:', search_rf.best_score_)
print('SVM - Best Params:', search_svm.best_params_)
print('SVM - Best Score:', search_svm.best_score_)
print('Logistic Regression - Best Params:', search_lr.best_params_)
print('Logistic Regression - Best Score:', search_lr.best_score_)
print('Bagging - Best Params:', search_bag.best_params_)
print('Bagging - Best Score:', search_bag.best_score_)
print('XGBoost - Best Params:', search_xgb.best_params_)
print('XGBoost - Best Score:', search_xgb.best_score_)
```
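Besides best_params_ and best_score_, a fitted search also keeps the score of every sampled candidate in cv_results_. The sketch below loads that into a pandas DataFrame to look at the top few candidates. It fits its own small SVM search so that it runs on its own; the search space and the names `cv_table` and `top` are illustrative assumptions.

```python
import pandas as pd
from scipy.stats import uniform
from sklearn.datasets import load_iris
from sklearn.model_selection import RandomizedSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Small illustrative search (8 candidates, 5-fold CV)
search = RandomizedSearchCV(SVC(),
                            {'C': uniform(0.01, 10), 'kernel': ['linear', 'rbf']},
                            n_iter=8, cv=5, random_state=0)
search.fit(X, y)

# One row per sampled candidate; rank_test_score == 1 marks the best one(s)
cv_table = pd.DataFrame(search.cv_results_)
top = (cv_table.sort_values('rank_test_score')
       [['params', 'mean_test_score', 'std_test_score']]
       .head(3))
print(top)
```

Comparing mean_test_score against std_test_score shows whether the best candidate is clearly ahead or within noise of the runners-up.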
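This runs a separate randomized search for each of the five classifiers (random forest, SVM, logistic regression, Bagging, and XGBoost) and gives the best parameters and best mean cross-validated score for each one.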
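Note that best_score_ is computed on the same data that was used to pick the hyperparameters, so it tends to be slightly optimistic. One common way to get a more independent estimate is to hold out a test set before searching. The sketch below shows this for a random forest only. The 70/30 split, the small search space, and the variable names are assumptions made for the example.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV, train_test_split

X, y = load_iris(return_X_y=True)

# Keep 30% of the data aside; the search never sees it
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

search = RandomizedSearchCV(RandomForestClassifier(random_state=0),
                            {'n_estimators': [10, 50, 100],
                             'max_depth': [3, 5, None]},
                            n_iter=5, cv=5, random_state=0)
search.fit(X_train, y_train)

# With the default refit=True, score() uses the best model refit on all of X_train
test_accuracy = search.score(X_test, y_test)
print('CV score:', round(search.best_score_, 3))
print('Held-out accuracy:', round(test_accuracy, 3))
```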