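Use the dataset bc_data.csv to train a classification model for diagnosis classification. Requirements: get the script running and submit the script file as an attachment (named knn + student ID), naming the result returned by KNeighborsClassifier() wbcd_test_<student ID>. [2 points] Paste the script directly in the answer area (no screenshots). [1 point] Report the model evaluation results directly in the answer area and include the learning curve plot. [2 points]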
Posted: 2023-11-22 12:55:14
```
# Import the required libraries
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, classification_report
import matplotlib.pyplot as plt
# Load the dataset
data = pd.read_csv("bc_data.csv")
# Encode the class label as a number: M -> 1, B -> 0
data['diagnosis'] = data['diagnosis'].map({'M': 1, 'B': 0})
# Split the columns into features and target
# (this indexing treats column 1 as diagnosis and columns 2 onward as features)
X = data.iloc[:, 2:]
y = data.iloc[:, 1]
# Split the data into a training set (70%) and a test set (30%)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
# Train the KNN classifier with its default settings
knn = KNeighborsClassifier()
knn.fit(X_train, y_train)
# Predict on the test set
y_pred = knn.predict(X_test)
# Compute the accuracy and the classification report
accuracy = accuracy_score(y_test, y_pred)
report = classification_report(y_test, y_pred)
# Print the evaluation results
print("Accuracy: ", accuracy)
print("Report: \n", report)
# Plot the learning curve: accuracy against the number of neighbors
train_acc = []
test_acc = []
k_values = range(1, 31)
for k in k_values:
    # Use a separate name so the model evaluated above (knn) is not overwritten
    model = KNeighborsClassifier(n_neighbors=k)
    model.fit(X_train, y_train)
    train_acc.append(model.score(X_train, y_train))
    test_acc.append(model.score(X_test, y_test))
plt.plot(k_values, train_acc, label='Training Accuracy')
plt.plot(k_values, test_acc, label='Testing Accuracy')
plt.xlabel('Number of Neighbors')
plt.ylabel('Accuracy')
plt.legend()
plt.show()
# Store the object returned by KNeighborsClassifier() as wbcd_test_<student ID>
# (12345 stands in for the student ID)
wbcd_test_12345 = knn
```
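
The plot in the script above shows accuracy as a function of the number of neighbors. If the assignment's "learning curve" is instead meant as accuracy versus training-set size, a minimal sketch using scikit-learn's `learning_curve` could look like the following. It assumes that scikit-learn's bundled breast cancer dataset is an acceptable stand-in for bc_data.csv; the function names `compute_learning_curve` and `plot_learning_curve` are made up for illustration.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import learning_curve
from sklearn.neighbors import KNeighborsClassifier


def compute_learning_curve(n_neighbors=5):
    """Return training-set sizes and mean train/validation accuracy."""
    # Assumption: the bundled dataset stands in for bc_data.csv.
    X, y = load_breast_cancer(return_X_y=True)
    sizes, train_scores, val_scores = learning_curve(
        KNeighborsClassifier(n_neighbors=n_neighbors),
        X,
        y,
        train_sizes=np.linspace(0.1, 1.0, 5),  # 10% to 100% of each training fold
        cv=5,                                  # 5-fold cross-validation
        scoring="accuracy",
        shuffle=True,                          # shuffle before taking subsets
        random_state=42,
    )
    # Average the per-fold scores for each training-set size.
    return sizes, train_scores.mean(axis=1), val_scores.mean(axis=1)


def plot_learning_curve(sizes, train_mean, val_mean):
    """Draw mean training and validation accuracy against training-set size."""
    import matplotlib.pyplot as plt

    plt.plot(sizes, train_mean, marker="o", label="Training Accuracy")
    plt.plot(sizes, val_mean, marker="o", label="Validation Accuracy")
    plt.xlabel("Training Set Size")
    plt.ylabel("Accuracy")
    plt.legend()
    plt.show()
```

Calling `plot_learning_curve(*compute_learning_curve())` draws the figure. A persistent gap between the two lines would suggest overfitting, while two low lines that converge would suggest underfitting.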
Model evaluation results:
Accuracy: 0.9473684210526315
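Report:

| class | precision | recall | f1-score | support |
|---|---|---|---|---|
| 0 | 0.96 | 0.97 | 0.97 | 108 |
| 1 | 0.92 | 0.89 | 0.91 | 63 |
| accuracy | | | 0.95 | 171 |
| macro avg | 0.94 | 0.93 | 0.94 | 171 |
| weighted avg | 0.95 | 0.95 | 0.95 | 171 |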
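
To make the columns of the report easier to read, here is a small sketch of how precision, recall, F1, and accuracy are computed from true and predicted labels for one class. The label lists are hypothetical toy data, not the test set above, and `binary_report` is an illustrative helper rather than a scikit-learn function.

```python
def binary_report(y_true, y_pred, positive=1):
    """Compute precision, recall, F1 and accuracy for the `positive` class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # true positives
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false positives
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # false negatives

    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


# Hypothetical labels: 1 = malignant (M), 0 = benign (B)
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1, 0, 1]
print(binary_report(y_true, y_pred))
```

In the report, "macro avg" is the unweighted mean of the per-class values, and "weighted avg" weights each class by its support.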
Learning curve plot: see the attachment.