Implementing a Voting Ensemble Model in Python
Time: 2024-04-07 20:09:23
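A voting ensemble model produces its final prediction by taking a vote over the predictions of several individual models. In Python, this can be done with the VotingClassifier class from the sklearn library.
First, import the required libraries and load the dataset: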
```python
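# Dataset, train/test split, base classifiers, ensemble wrapper, and metric
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier  # required by the KNN model
from sklearn.ensemble import VotingClassifier
from sklearn.metrics import accuracy_score

# Load the iris dataset: 150 samples, 4 features, 3 classes
iris = load_iris()
X = iris.data
y = iris.target

# Hold out 20% of the samples for testing; fix the seed for reproducibility
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)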
```
Next, define three individual models (a decision tree, a logistic regression model, and a KNN model) and combine them into a voting ensemble:
```python
# Three base classifiers of different types
tree_clf = DecisionTreeClassifier(max_depth=3, random_state=42)
log_clf = LogisticRegression(random_state=42)
knn_clf = KNeighborsClassifier(n_neighbors=4)

# Combine them; each entry in estimators is a (name, model) tuple
voting_clf = VotingClassifier(
    estimators=[('tree', tree_clf), ('log', log_clf), ('knn', knn_clf)],
    voting='hard'
)
```
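Here, the estimators parameter is a list in which each element is a tuple containing a model's name and the corresponding model object. The voting parameter sets the voting scheme and can be 'hard' (hard voting) or 'soft' (soft voting). Hard voting takes the mode of the class labels predicted by the individual models, so the class with the most votes becomes the final prediction. Soft voting averages the class probabilities predicted by each model (a weighted average when the weights parameter is given) and picks the class with the highest averaged probability; it requires every base model to support predict_proba.
Finally, train the voting ensemble on the training set and evaluate its performance on the test set: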
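To make the difference between the two voting schemes concrete, here is a minimal NumPy sketch. The probability values are made up for illustration and do not come from the models above; the code only mirrors the idea behind hard and soft voting, not the internals of VotingClassifier.

```python
import numpy as np

# Hypothetical class probabilities from three classifiers for ONE sample
# (rows: classifiers, columns: classes 0, 1, 2). Values are illustrative only.
probas = np.array([
    [0.45, 0.55, 0.00],   # classifier A favors class 1
    [0.40, 0.60, 0.00],   # classifier B favors class 1
    [0.90, 0.10, 0.00],   # classifier C strongly favors class 0
])

# Hard voting: each classifier casts one vote for its most probable class,
# and the most frequent label (the mode) wins.
votes = probas.argmax(axis=1)
hard_winner = int(np.bincount(votes, minlength=probas.shape[1]).argmax())

# Soft voting: average the probabilities per class, then take the argmax.
soft_winner = int(probas.mean(axis=0).argmax())

# Weighted soft voting: classifier C counts twice as much as A and B.
weighted_winner = int(np.average(probas, axis=0, weights=[1, 1, 2]).argmax())

print(hard_winner, soft_winner, weighted_winner)  # prints: 1 0 0
```

In this made-up case the two schemes disagree: two of three classifiers vote for class 1, but the third is confident enough in class 0 to win once probabilities are averaged.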
```python
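# Fit every base model on the training set
voting_clf.fit(X_train, y_train)
# Predict by majority vote and measure accuracy on the held-out test set
y_pred = voting_clf.predict(X_test)
print('Accuracy:', accuracy_score(y_test, y_pred))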
```
The complete code is as follows:
```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier  # required by the KNN model
from sklearn.ensemble import VotingClassifier
from sklearn.metrics import accuracy_score

# Load the iris dataset and hold out 20% of it for testing
iris = load_iris()
X = iris.data
y = iris.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Three base classifiers of different types
tree_clf = DecisionTreeClassifier(max_depth=3, random_state=42)
log_clf = LogisticRegression(random_state=42)
knn_clf = KNeighborsClassifier(n_neighbors=4)

# Hard-voting ensemble: the majority class label wins
voting_clf = VotingClassifier(
    estimators=[('tree', tree_clf), ('log', log_clf), ('knn', knn_clf)],
    voting='hard'
)

# Train the ensemble and evaluate it on the test set
voting_clf.fit(X_train, y_train)
y_pred = voting_clf.predict(X_test)
print('Accuracy:', accuracy_score(y_test, y_pred))
```
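As a possible follow-up to the evaluation step, the sketch below compares each base model with both the hard-voting and soft-voting ensembles on the same split. The `scores` dictionary and the loop structure are additions for illustration, and `max_iter=1000` is an assumption made here to help logistic regression converge on unscaled features; neither is part of the original code.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Same three base models as above; max_iter=1000 is an assumption of this sketch.
estimators = [
    ('tree', DecisionTreeClassifier(max_depth=3, random_state=42)),
    ('log', LogisticRegression(max_iter=1000, random_state=42)),
    ('knn', KNeighborsClassifier(n_neighbors=4)),
]

scores = {}  # illustrative container: model name -> test accuracy

# Accuracy of each base model on its own
for name, clf in estimators:
    clf.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, clf.predict(X_test))

# Accuracy of the ensemble under both voting schemes.
# VotingClassifier fits clones, so the models fitted above are not reused.
for voting in ('hard', 'soft'):
    ensemble = VotingClassifier(estimators=estimators, voting=voting)
    ensemble.fit(X_train, y_train)
    scores[voting] = accuracy_score(y_test, ensemble.predict(X_test))

for name, acc in scores.items():
    print(f'{name}: {acc:.3f}')
```

Soft voting works here because all three base models implement predict_proba. On a dataset as small and easy as iris the scores may be nearly identical, so the comparison is more informative on harder problems.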