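Using the heart data with scikit-learn, train heart-disease prediction models with a decision tree and with a random forest, making appropriate use of parameter tuning. Paste the code, the classification results, and a visualization of the decision tree model into the blank space.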
Time: 2024-10-13 18:03:22
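First, install the necessary libraries, including `pandas`, `matplotlib`, and `scikit-learn` (imported as `sklearn`). Note that scikit-learn does not bundle a heart-disease dataset, so the data has to be loaded from an external source such as a CSV copy of the UCI heart-disease data. With the data in hand, the decision tree and random forest models can be built with the following steps: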
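```python
# Import the required libraries
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.tree import DecisionTreeClassifier, plot_tree
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix

# Load the heart-disease data.
# NOTE: scikit-learn has no load_heart() function. This assumes a local
# copy of the data saved as "heart.csv" with a binary label column named
# "target"; adjust the path and the column name to match your file.
heart = pd.read_csv("heart.csv")
X = heart.drop(columns="target")
y = heart["target"]
feature_names = list(X.columns)
class_names = [str(c) for c in sorted(y.unique())]

# Preprocessing (missing values, encoding, feature selection, ...) is
# assumed to be done already. Tree-based models do not need feature scaling.

# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Grid search for the decision tree
param_grid_tree = {
    "criterion": ["gini", "entropy"],
    "max_depth": list(range(1, 11)),
    "min_samples_split": [2, 5, 10],
    "min_samples_leaf": [1, 2, 4],
}
grid_tree = GridSearchCV(
    DecisionTreeClassifier(random_state=42), param_grid_tree, cv=5
)
grid_tree.fit(X_train, y_train)

# Grid search for the random forest (n_jobs=-1 uses all CPU cores)
param_grid_forest = {
    "n_estimators": [50, 100, 200],
    "max_depth": list(range(1, 11)),
    "min_samples_split": [2, 5, 10],
    "min_samples_leaf": [1, 2, 4],
}
grid_forest = GridSearchCV(
    RandomForestClassifier(random_state=42), param_grid_forest, cv=5, n_jobs=-1
)
grid_forest.fit(X_train, y_train)

# GridSearchCV refits the best parameter combination on the full training set
tree_model = grid_tree.best_estimator_
forest_model = grid_forest.best_estimator_
print("Best decision tree parameters:", grid_tree.best_params_)
print("Best random forest parameters:", grid_forest.best_params_)

# Predict on the held-out test set
y_pred_tree = tree_model.predict(X_test)
y_pred_forest = forest_model.predict(X_test)

# Evaluate
print("Decision Tree Model:")
print(classification_report(y_test, y_pred_tree))
print(confusion_matrix(y_test, y_pred_tree))
print("\nRandom Forest Model:")
print(classification_report(y_test, y_pred_forest))
print(confusion_matrix(y_test, y_pred_forest))

# Visualize the decision tree. export_graphviz is a function in sklearn.tree,
# not a method of the model; plot_tree is used here because it needs only
# matplotlib and no Graphviz installation.
plt.figure(figsize=(20, 10))
plot_tree(
    tree_model,
    feature_names=feature_names,
    class_names=class_names,
    filled=True,
    rounded=True,
)
plt.title("Decision Tree Visualization")
plt.savefig("decision_tree.png", dpi=150, bbox_inches="tight")
plt.show()

# scikit-learn has no single plot for a whole random forest. The individual
# trees are available in forest_model.estimators_ and can each be drawn with
# plot_tree; third-party libraries such as plotly or eli5 are another option.
```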
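A random forest cannot be drawn as one tree, but it can still be inspected through its impurity-based feature importances and through its individual member trees. The following is a minimal sketch of that idea, not part of the original answer. It assumes synthetic data from `make_classification` as a stand-in for the heart data, and the feature names `feature_0` ... `feature_5` are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import export_text

# Synthetic stand-in data (an assumption, not the heart dataset).
X, y = make_classification(
    n_samples=300, n_features=6, n_informative=3, random_state=42
)
feature_names = [f"feature_{i}" for i in range(X.shape[1])]  # hypothetical names

forest = RandomForestClassifier(n_estimators=50, max_depth=4, random_state=42)
forest.fit(X, y)

# Impurity-based importances, averaged over all trees in the forest.
importances = forest.feature_importances_
for i in np.argsort(importances)[::-1]:
    print(f"{feature_names[i]}: {importances[i]:.3f}")

# Each member of the forest is an ordinary decision tree, so it can be
# inspected on its own. Here the top levels of the first tree are printed.
first_tree_text = export_text(
    forest.estimators_[0], feature_names=feature_names, max_depth=2
)
print(first_tree_text)
```

With a fitted model from a grid search, the same attributes are available on `best_estimator_`. Impurity-based importances can be biased toward features with many distinct values, so they are best read as a rough ranking rather than a precise measure.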