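Experiment 2: Complete the breast cancer prediction problem 5.6 on p. 124 of the textbook. Use the logistic regression implementation in sklearn to build a predictive model, then compute the model's accuracy, recall, and AUC, and plot the ROC curve. The data is loaded with `breast_cancer = load_breast_cancer()`.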
Time: 2024-10-09 18:09:07
In Python, the experiment can be completed with the following steps:
1. Import the required libraries and load the dataset:
```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, recall_score, roc_auc_score, roc_curve
import matplotlib.pyplot as plt
# Load the breast cancer dataset
breast_cancer = load_breast_cancer()
X = breast_cancer.data
y = breast_cancer.target
```
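Before modeling, it can help to check what the loaded object contains. The following is a minimal sketch that prints the feature matrix shape, the class names, and the number of samples per label. It relies only on attributes of the object returned by `load_breast_cancer()`; the variable `counts` is an illustrative name, not part of the original steps.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer

breast_cancer = load_breast_cancer()
X, y = breast_cancer.data, breast_cancer.target

# Number of samples and number of numeric features
print("X shape:", X.shape)

# target_names[i] is the class name for label i (0 = malignant, 1 = benign)
print("Classes:", list(breast_cancer.target_names))

# Samples per label; the two classes are not equally represented
counts = np.bincount(y)
print("Counts per label:", dict(enumerate(counts.tolist())))
```

Note that label 1 means benign in this dataset, which matters when interpreting "positive class" metrics such as recall.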
2. Split the data into training and test sets:
```python
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
```
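Because the two classes are not equally frequent, one optional variation is a stratified split, which keeps the class ratio roughly the same in both subsets. The sketch below assumes the same `test_size` and `random_state` as the step above and adds `stratify=y`; this is a suggestion, not something the exercise requires.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)

# stratify=y asks train_test_split to preserve the class ratio in both subsets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

print("Train size:", len(y_train), "Test size:", len(y_test))
# The mean of a 0/1 label vector is the fraction of label 1 (benign)
print("Benign fraction (train):", round(float(np.mean(y_train)), 3))
print("Benign fraction (test): ", round(float(np.mean(y_test)), 3))
```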
3. Build and train the logistic regression model:
```python
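# The features are unscaled, so the default iteration limit is usually too low
# for the solver to converge; raise max_iter to avoid a ConvergenceWarning
logreg = LogisticRegression(max_iter=10000)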
logreg.fit(X_train, y_train)
```
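The features in this dataset have very different ranges, which tends to slow the solver down. A common alternative, sketched here as an assumption rather than as the textbook's method, is to standardize the features inside a pipeline before fitting. The name `scaled_logreg` is illustrative.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# The scaler is fitted on the training data only, then applied to the test data
scaled_logreg = make_pipeline(StandardScaler(), LogisticRegression())
scaled_logreg.fit(X_train, y_train)

# score() returns the mean accuracy on the given data
test_accuracy = scaled_logreg.score(X_test, y_test)
print("Test accuracy with scaling:", test_accuracy)
```

The pipeline exposes the same `predict` and `predict_proba` methods as the plain model, so the later steps work unchanged if it is used in place of `logreg`.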
4. Make predictions and compute the metrics:
```python
y_pred = logreg.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
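recall = recall_score(y_test, y_pred)  # Recall for label 1 (benign in this dataset); pass pos_label=0 for recall on malignant cases
roc_auc = roc_auc_score(y_test, logreg.predict_proba(X_test)[:, 1])  # For binary classification, column 1 of predict_proba is the probability of label 1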
print("Accuracy:", accuracy)
print("Recall:", recall)
print("AUC:", roc_auc)
```
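To make these metrics concrete, here is a hand computation on ten made-up labels (hypothetical data, not taken from the dataset). It derives accuracy and per-class recall from the confusion counts, and also shows that the support-weighted average of the per-class recalls equals accuracy, which is why a weighted average does not give a separate recall figure.

```python
# Hypothetical labels: 1 = positive class, 0 = negative class
y_true = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 1, 1, 0, 0, 0, 0, 1]

pairs = list(zip(y_true, y_pred))
tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives

accuracy = (tp + tn) / len(pairs)
recall_pos = tp / (tp + fn)  # share of actual positives that were found
recall_neg = tn / (tn + fp)  # share of actual negatives that were found

# Per-class recalls weighted by the number of true samples in each class
weighted_recall = ((tp + fn) * recall_pos + (tn + fp) * recall_neg) / len(pairs)

print("Accuracy:", round(accuracy, 3))
print("Recall (positive class):", round(recall_pos, 3))
print("Recall (negative class):", round(recall_neg, 3))
print("Weighted recall:", round(weighted_recall, 3))
```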
5. Plot the ROC curve:
```python
y_scores = logreg.predict_proba(X_test)[:, 1]
fpr, tpr, _ = roc_curve(y_test, y_scores)
plt.figure(figsize=(8, 6))
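# Draw the curve from the fpr/tpr values computed above, plus a chance-level diagonal
plt.plot(fpr, tpr, label=f"Logistic regression (AUC = {roc_auc_score(y_test, y_scores):.3f})")
plt.plot([0, 1], [0, 1], linestyle="--", label="Chance level")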
plt.title('Receiver Operating Characteristic (ROC)')
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.legend(loc="lower right")
plt.show()
```
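For intuition about what the curve and the AUC represent, the sketch below builds ROC points by hand for six made-up scores (hypothetical data) and integrates them with the trapezoidal rule. It is meant as an illustration of the idea, not as a replacement for `roc_curve` and `roc_auc_score`.

```python
# Hypothetical scores for six samples (1 = positive, 0 = negative)
y_true = [1, 1, 0, 1, 0, 0]
y_score = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

n_pos = sum(y_true)
n_neg = len(y_true) - n_pos

# Sweep the threshold from high to low; each distinct score is one threshold.
# A sample is predicted positive when its score is >= the threshold.
points = [(0.0, 0.0)]
for threshold in sorted(set(y_score), reverse=True):
    tp = sum(1 for t, s in zip(y_true, y_score) if s >= threshold and t == 1)
    fp = sum(1 for t, s in zip(y_true, y_score) if s >= threshold and t == 0)
    points.append((fp / n_neg, tp / n_pos))  # (false positive rate, true positive rate)

# Area under the piecewise-linear curve (trapezoidal rule)
auc = sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))

print("ROC points:", [(round(x, 3), round(y, 3)) for x, y in points])
print("AUC:", round(auc, 3))
```

Here 8 of the 9 positive/negative pairs are ranked correctly, so the AUC is 8/9, which matches the interpretation of AUC as the probability that a randomly chosen positive scores higher than a randomly chosen negative.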
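With these steps, problem 5.6 on p. 124 of the textbook is complete: the model's accuracy, recall, and AUC have been computed, and the ROC curve has been plotted to assess its performance.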