Using Python to apply logistic regression to the breast cancer dataset: prediction, analysis, and plotting
Posted: 2024-02-03 09:14:07
Sure, here are the steps:
1. Import the required libraries and load the dataset
```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, accuracy_score
data = load_breast_cancer()
X = pd.DataFrame(data.data, columns=data.feature_names)
y = pd.Series(data.target)
```
2. Split the dataset into a training set and a test set
```python
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
```
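The split above is purely random. As a minimal sketch, assuming you want the benign/malignant proportions preserved in both subsets, `train_test_split` accepts a `stratify` argument. The `_s` suffixed variable names are illustrative only and are not used elsewhere in this walkthrough.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

# Load features and labels as NumPy arrays
X, y = load_breast_cancer(return_X_y=True)

# stratify=y keeps the class proportions (roughly) equal in both subsets
X_train_s, X_test_s, y_train_s, y_test_s = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# The mean of a 0/1 label vector is the fraction of class 1 (benign)
train_ratio = y_train_s.mean()
test_ratio = y_test_s.mean()
print("Benign fraction (train):", train_ratio)
print("Benign fraction (test):", test_ratio)
```

Stratification matters more when the classes are strongly imbalanced; here it mainly makes the evaluation less sensitive to the random seed.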
3. Train the model and make predictions
```python
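# The default max_iter=100 typically triggers a ConvergenceWarning on these
# unscaled features, so allow the solver more iterations
lr = LogisticRegression(max_iter=10000)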
lr.fit(X_train, y_train)
y_pred = lr.predict(X_test)
```
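The features in this dataset span very different numeric ranges, which slows down the solver. One common alternative, sketched here as an assumption rather than as part of the original steps, is to standardize the features inside a pipeline before fitting. The names `scaled_lr` and `scaled_acc` are hypothetical.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# The scaler is fitted on the training data only, then reused on the test data,
# so no information from the test set leaks into preprocessing
scaled_lr = make_pipeline(StandardScaler(), LogisticRegression())
scaled_lr.fit(X_train, y_train)

# Pipeline.score returns the mean accuracy of the final estimator
scaled_acc = scaled_lr.score(X_test, y_test)
print("Accuracy with standardized features:", scaled_acc)
```

Wrapping the scaler and the classifier in one pipeline also means a single `fit` / `predict` call handles both steps.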
4. Evaluate model performance
```python
conf_mat = confusion_matrix(y_test, y_pred)
print("Confusion matrix:\n", conf_mat)
acc_score = accuracy_score(y_test, y_pred)
print("Accuracy score:", acc_score)
```
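Accuracy alone can hide how well each class is detected. The following sketch, which goes beyond the metrics used above, prints per-class precision and recall and computes the recall for the malignant class. It assumes the label encoding of `load_breast_cancer`, where class 0 is malignant and class 1 is benign; `model`, `report`, `cm`, and `malignant_recall` are illustrative names.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix, recall_score
from sklearn.model_selection import train_test_split

data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.3, random_state=42
)

model = LogisticRegression(max_iter=10000)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Per-class precision, recall, and F1; target_names is ['malignant', 'benign']
report = classification_report(y_test, y_pred, target_names=data.target_names)
print(report)

# Recall for the malignant class (label 0): the share of true malignant
# cases that the model actually flags as malignant
malignant_recall = recall_score(y_test, y_pred, pos_label=0)
cm = confusion_matrix(y_test, y_pred)
print("Malignant recall:", malignant_recall)
```

The malignant recall equals `cm[0, 0] / cm[0].sum()`, that is, the first row of the confusion matrix normalized by its total.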
5. Plot the confusion matrix
```python
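plt.imshow(conf_mat, cmap="Blues")
# Rows of conf_mat are true labels (y axis); columns are predictions (x axis),
# so the text for cell [i, j] goes at x=j, y=i
for i in range(conf_mat.shape[0]):
    for j in range(conf_mat.shape[1]):
        plt.text(j, i, str(conf_mat[i, j]), ha="center", va="center")
plt.xlabel("Predicted label")
plt.ylabel("True label")
# In load_breast_cancer, class 0 is malignant and class 1 is benign
plt.xticks([0, 1], ["Malignant", "Benign"])
plt.yticks([0, 1], ["Malignant", "Benign"])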
plt.title("Confusion matrix")
plt.colorbar()
plt.show()
```
6. Plot the ROC curve and compute the AUC
```python
from sklearn.metrics import roc_curve, roc_auc_score
# Column 1 holds the predicted probability of class 1 (benign in this dataset)
y_pred_proba = lr.predict_proba(X_test)[:,1]
fpr, tpr, thresholds = roc_curve(y_test, y_pred_proba)
plt.plot(fpr, tpr, label="ROC curve")
plt.plot([0,1], [0,1], "k--", label="Random guess")
plt.xlabel("FPR")
plt.ylabel("TPR")
plt.title("ROC curve")
plt.legend()
plt.show()
auc_score = roc_auc_score(y_test, y_pred_proba)
print("AUC score:", auc_score)
```
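The `thresholds` array returned by `roc_curve` is not used in the step above. One possible use, shown here as a sketch and not as part of the original procedure, is to pick a decision threshold by maximizing Youden's J statistic (TPR minus FPR). The standardizing pipeline and the names `model`, `j_scores`, `best_threshold`, and `y_pred_custom` are assumptions made for this example.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(X_train, y_train)

# Probability of class 1 (benign), which roc_curve treats as the positive class
proba_benign = model.predict_proba(X_test)[:, 1]
fpr, tpr, thresholds = roc_curve(y_test, proba_benign)

# Youden's J = TPR - FPR; its maximum marks the point farthest above the diagonal
j_scores = tpr - fpr
best_idx = int(np.argmax(j_scores))
best_threshold = thresholds[best_idx]
print("Threshold maximizing TPR - FPR:", best_threshold)

# Apply the chosen threshold instead of the default 0.5
y_pred_custom = (proba_benign >= best_threshold).astype(int)
```

Choosing the threshold on the test set, as done here for brevity, gives an optimistic estimate; in practice a separate validation split or cross-validation would be the safer choice.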
This completes the process of applying logistic regression to the breast cancer dataset, from prediction through analysis and plotting.