Using the iris dataset, build a binary logistic regression model, plot the ROC curve, and compute the AUC.
Time: 2024-10-21 10:10:38
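First, import the required Python libraries (pandas, numpy, scikit-learn, and so on), then load the Iris dataset from scikit-learn's `datasets` module. In this example, we assume the goal is to distinguish between two species, Setosa and Versicolor.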
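Iris contains three species, so a Setosa vs. Versicolor problem needs the Virginica samples removed before training. The following is a minimal sketch of that selection step; the names `mask`, `X_binary`, and `y_binary` are illustrative and not part of the original answer.

```python
import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()

# Species are encoded as integers 0, 1, 2 in iris.target
print(iris.target_names)         # species names, in label order
print(np.bincount(iris.target))  # number of samples per species

# Illustrative names: keep only Setosa (0) and Versicolor (1)
mask = iris.target < 2
X_binary = iris.data[mask][:, [2, 3]]  # petal length and petal width
y_binary = iris.target[mask]           # 0 = Setosa, 1 = Versicolor

print(X_binary.shape, np.unique(y_binary))
```

With this encoding, Versicolor (label 1) is the positive class, which is the column that `predict_proba(...)[:, 1]` returns after fitting.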
```python
import pandas as pd
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve, auc, confusion_matrix
import matplotlib.pyplot as plt
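# Load the data
iris = load_iris()
mask = iris.target < 2  # keep only Setosa (0) and Versicolor (1); drop Virginica (2)
X = iris.data[mask][:, [2, 3]]  # use petal length and petal width as features
y = iris.target[mask]  # binary target: 0 = Setosa, 1 = Versicolor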
# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Train the logistic regression model
log_reg = LogisticRegression()
log_reg.fit(X_train, y_train)
# Predict and get the probability of the positive class
y_pred_proba = log_reg.predict_proba(X_test)[:, 1]
# Plot the ROC curve
fpr, tpr, _ = roc_curve(y_test, y_pred_proba)
auc_score = auc(fpr, tpr)
plt.plot(fpr, tpr, label=f"Logistic Regression (AUC = {auc_score:.2f})")
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate")
plt.title("Receiver Operating Characteristic (ROC) Curve")
plt.legend(loc="lower right")
plt.show()
# Print the confusion matrix
conf_mat = confusion_matrix(y_test, log_reg.predict(X_test))
print("Confusion Matrix:")
print(conf_mat)
```