Python code that grid-search-optimizes five classifiers (random forest, SVM, logistic regression, Bagging, XGBoost) and then combines them in groups of two, three, four, and five as ensemble classifiers for binary classification
Date: 2024-01-10 21:02:55
Below is Python code that grid-search-optimizes the five classifiers and then combines them for ensemble classification.
First, import the required libraries and load the dataset:
```python
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.ensemble import RandomForestClassifier, BaggingClassifier, VotingClassifier
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression
from xgboost import XGBClassifier
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score
data = load_breast_cancer()
X = data.data
y = data.target
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=42)
```
Next, run a grid search for each classifier:
```python
# 随机森林
rf = RandomForestClassifier(random_state=42)
rf_params = {'n_estimators': [50, 100, 200], 'max_depth': [5, 10, 20]}
rf_gs = GridSearchCV(rf, rf_params, cv=5)
rf_gs.fit(X_train, y_train)
# SVM
svc = SVC(random_state=42)
svc_params = {'C': [0.1, 1, 10], 'kernel': ['linear', 'rbf']}
svc_gs = GridSearchCV(svc, svc_params, cv=5)
svc_gs.fit(X_train, y_train)
# 逻辑回归
lr = LogisticRegression(max_iter=1000, random_state=42)  # raise max_iter to avoid convergence warnings
lr_params = {'C': [0.1, 1, 10]}
lr_gs = GridSearchCV(lr, lr_params, cv=5)
lr_gs.fit(X_train, y_train)
# Bagging
bag = BaggingClassifier(random_state=42)
bag_params = {'n_estimators': [10, 20, 50]}
bag_gs = GridSearchCV(bag, bag_params, cv=5)
bag_gs.fit(X_train, y_train)
# XGBoost
xgb = XGBClassifier(random_state=42)
xgb_params = {'n_estimators': [50, 100, 200], 'max_depth': [5, 10, 20]}
xgb_gs = GridSearchCV(xgb, xgb_params, cv=5)
xgb_gs.fit(X_train, y_train)
```
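After fitting, each `GridSearchCV` object exposes the winning hyper-parameters via `best_params_` and the corresponding mean cross-validation accuracy via `best_score_`. A minimal sketch, reproduced end-to-end on just the logistic-regression grid for brevity (the same pattern applies to `rf_gs`, `svc_gs`, `bag_gs`, and `xgb_gs`):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split

data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, stratify=data.target, random_state=42)

# Same grid as above; max_iter raised so the solver converges
lr_gs = GridSearchCV(LogisticRegression(max_iter=5000, random_state=42),
                     {'C': [0.1, 1, 10]}, cv=5)
lr_gs.fit(X_train, y_train)

# best_params_ is the winning grid point, best_score_ its mean CV accuracy
print(lr_gs.best_params_)
print(round(lr_gs.best_score_, 3))
```

Inspecting these values before building the ensembles is a quick sanity check that the grids actually cover a reasonable region.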
Next, we build the two-, three-, four-, and five-model voting ensembles and compute each one's accuracy on the test set:
```python
# Retrieve the best estimator found by each grid search
clf1 = svc_gs.best_estimator_
clf2 = lr_gs.best_estimator_
clf3 = bag_gs.best_estimator_
clf4 = xgb_gs.best_estimator_
clf5 = rf_gs.best_estimator_
voting_clf12 = VotingClassifier(estimators=[('svc', clf1), ('lr', clf2)], voting='hard')
voting_clf123 = VotingClassifier(estimators=[('svc', clf1), ('lr', clf2), ('bag', clf3)], voting='hard')
voting_clf1234 = VotingClassifier(estimators=[('svc', clf1), ('lr', clf2), ('bag', clf3), ('xgb', clf4)], voting='hard')
voting_clf12345 = VotingClassifier(estimators=[('svc', clf1), ('lr', clf2), ('bag', clf3), ('xgb', clf4), ('rf', clf5)], voting='hard')
voting_clf12.fit(X_train, y_train)
voting_clf123.fit(X_train, y_train)
voting_clf1234.fit(X_train, y_train)
voting_clf12345.fit(X_train, y_train)
print("Accuracy (two classifiers):", accuracy_score(y_test, voting_clf12.predict(X_test)))
print("Accuracy (three classifiers):", accuracy_score(y_test, voting_clf123.predict(X_test)))
print("Accuracy (four classifiers):", accuracy_score(y_test, voting_clf1234.predict(X_test)))
print("Accuracy (five classifiers):", accuracy_score(y_test, voting_clf12345.predict(X_test)))
```
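As an aside, `voting='hard'` takes a majority vote over predicted labels. If you would rather average predicted probabilities (`voting='soft'`), `SVC` must be constructed with `probability=True` so that it exposes `predict_proba`. A minimal sketch with two of the five models (untuned defaults here, purely for illustration):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, stratify=data.target, random_state=42)

# probability=True enables predict_proba, which soft voting averages
soft = VotingClassifier(
    estimators=[('svc', SVC(probability=True, random_state=42)),
                ('lr', LogisticRegression(max_iter=5000, random_state=42))],
    voting='soft')
soft.fit(X_train, y_train)
acc = accuracy_score(y_test, soft.predict(X_test))
print(f"soft-voting accuracy: {acc:.3f}")
```

Soft voting often helps when the base models produce well-calibrated probabilities, but it is slower for SVC because `probability=True` triggers an internal cross-validation step.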
The printed results are:
```
Accuracy (two classifiers): 0.9370629370629371
Accuracy (three classifiers): 0.958041958041958
Accuracy (four classifiers): 0.972027972027972
Accuracy (five classifiers): 0.972027972027972
```
In this example, the ensemble's test accuracy improves (or holds steady) as more classifiers are added, though this is not guaranteed in general.
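Note that the code above builds only one ensemble of each size, whereas the question asks for *every* two-, three-, four-, and five-model combination. Enumerating all size-k subsets is straightforward with `itertools.combinations`. A hedged sketch, shown with four untuned base models to keep it short (in practice you would substitute the `*_gs.best_estimator_` objects and include the tuned XGBoost model):

```python
from itertools import combinations
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, stratify=data.target, random_state=42)

# Named base models (defaults here; swap in the tuned best_estimator_ objects)
base = [('rf', RandomForestClassifier(random_state=42)),
        ('svc', SVC(random_state=42)),
        ('lr', LogisticRegression(max_iter=5000, random_state=42)),
        ('bag', BaggingClassifier(random_state=42))]

results = {}
for k in range(2, len(base) + 1):
    for combo in combinations(base, k):  # every size-k subset of the models
        vc = VotingClassifier(estimators=list(combo), voting='hard')
        vc.fit(X_train, y_train)
        names = '+'.join(name for name, _ in combo)
        results[names] = accuracy_score(y_test, vc.predict(X_test))

# Report all combinations, best first
for names, acc in sorted(results.items(), key=lambda kv: -kv[1]):
    print(f"{names}: {acc:.3f}")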