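Modeling and classifying the Iris dataset with a decision tree: visualizing the tree with Graphviz and evaluating it with accuracy, ROC curve, F1 score, confusion matrix, recall, and precision, with code and explanations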
Date: 2024-09-23 14:13:00
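Classifying the Iris dataset with a decision tree is a common machine learning exercise: the model learns a sequence of threshold rules on the feature values (sepal and petal length and width) and uses them to assign each flower to one of three species. Below is a short example that uses Python's scikit-learn library together with Graphviz to visualize the tree and compute the key performance metrics: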
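First, make sure the required libraries are installed. Note that the `graphviz` Python package only wraps the Graphviz tools; rendering a tree to a file also requires the Graphviz system binaries (the `dot` executable) to be installed and on the PATH: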
```bash
pip install scikit-learn graphviz pandas matplotlib seaborn
```
Then import the required modules and load the data:
```python
import numpy as np
import pandas as pd
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, plot_tree
from sklearn.metrics import accuracy_score, roc_auc_score, f1_score, confusion_matrix, recall_score, precision_score
# Load the Iris dataset (150 samples, 4 features, 3 classes)
iris = datasets.load_iris()
X = iris.data
y = iris.target
```
Next, split the data into a training set (70%) and a test set (30%):
```python
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
```
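The split above is purely random. As a variation (not what the original example does), passing `stratify=y` keeps the class proportions identical in both subsets, which can make the per-class metrics a little more stable on a small dataset like this one. A minimal sketch, assuming the same `test_size` and `random_state`:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# stratify=y preserves the 50/50/50 class balance of Iris in both subsets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# Count how many samples of each class ended up in each subset
train_counts = np.bincount(y_train)
test_counts = np.bincount(y_test)
print("Train class counts:", train_counts)
print("Test class counts:", test_counts)
```

With 150 samples and `test_size=0.3`, each class contributes 35 training and 15 test samples.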
Create and train the decision tree model:
```python
clf = DecisionTreeClassifier(random_state=42)
clf.fit(X_train, y_train)
```
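Before drawing the tree, it can help to look at the learned rules as plain text. The sketch below is an addition to the original example: it repeats the same split and training step so that it runs on its own, then uses scikit-learn's `export_text` to print the rules, along with the depth and leaf count of the fitted tree.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=42
)
clf = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)

# Each indented line is one threshold test; leaf lines show the predicted class index
rules = export_text(clf, feature_names=list(iris.feature_names))
print(rules)
print("Depth:", clf.get_depth(), "Leaves:", clf.get_n_leaves())
```

Each path from the root to a leaf reads as one if/else rule, which is the same structure the Graphviz diagram shows graphically.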
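Now we can visualize the decision tree by exporting it to Graphviz's DOT format with `export_graphviz` and rendering it with the `graphviz` package: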
```python
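import graphviz
from sklearn.tree import export_graphviz

# Export the fitted tree as DOT source; out_file=None returns it as a string
dot_data = export_graphviz(
    clf, out_file=None,
    feature_names=iris.feature_names,
    class_names=list(iris.target_names),
    filled=True, rounded=True,  # color nodes by majority class, round the boxes
    special_characters=True,
)
graph = graphviz.Source(dot_data)
# Saves the DOT source as "iris_decision_tree" and renders "iris_decision_tree.pdf"
graph.render("iris_decision_tree")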
```
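If the Graphviz binaries are not available, scikit-learn's `plot_tree` draws a similar diagram using only matplotlib. The sketch below is an alternative to the Graphviz route rather than part of it; the output file name `iris_decision_tree_mpl.png` is an arbitrary choice, and the `Agg` backend is selected only so the script also runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to use plt.show()
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, plot_tree

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=42
)
clf = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)

fig, ax = plt.subplots(figsize=(12, 8))
# plot_tree returns the matplotlib annotation objects it drew, one box per node
annotations = plot_tree(
    clf,
    feature_names=list(iris.feature_names),
    class_names=list(iris.target_names),
    filled=True,
    rounded=True,
    ax=ax,
)
fig.savefig("iris_decision_tree_mpl.png", dpi=150, bbox_inches="tight")
```

Each node box shows the split condition, the Gini impurity, the number of samples reaching the node, the per-class counts, and the majority class.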
Evaluate the model on the test set:
```python
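y_pred = clf.predict(X_test)
y_proba = clf.predict_proba(X_test)  # shape (n_samples, 3): one column per class

accuracy = accuracy_score(y_test, y_pred)
# Iris has three classes, so roc_auc_score needs the full probability matrix
# and a multiclass strategy; "ovr" averages the one-vs-rest AUC of each class.
roc_auc = roc_auc_score(y_test, y_proba, multi_class="ovr")
# average="macro" computes the metric per class and takes the unweighted mean
f1 = f1_score(y_test, y_pred, average="macro")
recall = recall_score(y_test, y_pred, average="macro")
precision = precision_score(y_test, y_pred, average="macro")
# Rows are true classes, columns are predicted classes
conf_mat = confusion_matrix(y_test, y_pred)

print(f"Accuracy: {accuracy}")
print(f"ROC AUC (one-vs-rest, macro): {roc_auc}")
print(f"F1 Score: {f1}")
print("Confusion Matrix:\n", conf_mat)
print(f"Recall: {recall}")
print(f"Precision: {precision}")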
```
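The metrics above summarize the ROC behaviour in a single AUC number. To see the ROC curves themselves, one common approach for a multiclass problem is to binarize the labels and draw one one-vs-rest curve per class. The following is a minimal sketch under that assumption; it retrains the same model so that it runs on its own, and the file name `iris_roc_ovr.png` is arbitrary.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to use plt.show()
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.metrics import auc, roc_curve
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import label_binarize
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=42
)
clf = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)

y_score = clf.predict_proba(X_test)                      # (n_samples, 3)
y_test_bin = label_binarize(y_test, classes=clf.classes_)  # one 0/1 column per class

fig, ax = plt.subplots(figsize=(6, 5))
per_class_auc = {}
for i, name in enumerate(iris.target_names):
    # Treat class i as positive and the other two classes as negative
    fpr, tpr, _ = roc_curve(y_test_bin[:, i], y_score[:, i])
    per_class_auc[str(name)] = auc(fpr, tpr)
    ax.plot(fpr, tpr, label=f"{name} (AUC = {per_class_auc[str(name)]:.3f})")

ax.plot([0, 1], [0, 1], linestyle="--", color="gray", label="chance")
ax.set_xlabel("False positive rate")
ax.set_ylabel("True positive rate")
ax.set_title("One-vs-rest ROC curves, decision tree on Iris")
ax.legend(loc="lower right")
fig.savefig("iris_roc_ovr.png", dpi=150, bbox_inches="tight")
```

Because an unpruned tree mostly outputs probabilities of exactly 0 or 1, each curve has only a few distinct points and looks like a step rather than a smooth arc.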
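This example shows how to classify the Iris data with a decision tree and how to generate a diagram of the fitted tree with Graphviz. It also computes the usual evaluation metrics: accuracy, ROC AUC, F1 score, confusion matrix, recall, and precision.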