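Load the data from a file, split it into training (80%) and test (20%) sets, use five algorithms (K-nearest neighbors, support vector machine, random forest, decision tree, and logistic regression) to predict disease, and plot AUC, accuracy, precision, and recall as a bar chart and a line chart (in one figure, with every point on the lines labeled with its value). Write the Python code.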
Posted: 2023-06-26 14:06:43
Here is a Python implementation:
```python
# Import the required libraries
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, accuracy_score, precision_score, recall_score
import matplotlib.pyplot as plt
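# Read the data (replace 'data.csv' with your own file; the label column must be named 'disease')
df = pd.read_csv('data.csv')
# Split into training (80%) and test (20%) sets
X = df.drop('disease', axis=1)
y = df['disease']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)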
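# Define the models
knn = KNeighborsClassifier()
svm = SVC(probability=True)  # probability=True enables predict_proba, which the AUC needs
rf = RandomForestClassifier()
dt = DecisionTreeClassifier()
lr = LogisticRegression(max_iter=1000)  # raise the iteration limit so the solver can converge
models = {'KNN': knn, 'SVM': svm, 'Random Forest': rf, 'Decision Tree': dt, 'Logistic Regression': lr}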
# Train each model, predict on the test set, and compute the metrics
results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    # Probability of the positive class (second column), used for the AUC
    y_prob = model.predict_proba(X_test)[:, 1]
    auc = roc_auc_score(y_test, y_prob)
    acc = accuracy_score(y_test, y_pred)
    pre = precision_score(y_test, y_pred)
    rec = recall_score(y_test, y_pred)
    results[name] = {'AUC': auc, 'Accuracy': acc, 'Precision': pre, 'Recall': rec}
# Plot the bar charts, one subplot per metric
metrics = ['AUC', 'Accuracy', 'Precision', 'Recall']
values = list(results.values())
algorithms = list(results.keys())
colors = ['#1f77b4', '#ff7f0e', '#2ca02c', '#d62728', '#9467bd']
fig, axs = plt.subplots(len(metrics), figsize=(10, 15))
for i, metric in enumerate(metrics):
    axs[i].bar(algorithms, [r[metric] for r in values], color=colors)
    axs[i].set_ylabel(metric)
plt.tight_layout()
plt.show()
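# Plot the line chart, labeling every point with its value
x = range(len(metrics))
for name, result in results.items():
    scores = [result[m] for m in metrics]
    plt.plot(x, scores, marker='o', label=name)
    for xi, score in zip(x, scores):
        plt.text(xi, score, f'{score:.2f}', ha='center', va='bottom')
plt.xticks(x, metrics)
plt.legend()
plt.show()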
```
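The code above splits the dataset into training and test sets and fits five classifiers: K-nearest neighbors, support vector machine, random forest, decision tree, and logistic regression. It then computes the AUC, accuracy, precision, and recall of each model on the test set and visualizes the results as bar charts (one subplot per metric) and a line chart.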
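The question asks for the bars and the lines in a single figure, whereas the code above draws them in separate figures. One possible way to combine them is sketched below. The helper name `plot_metrics_combined` is hypothetical, and the grouped-bar layout (one group per metric, one bar per algorithm) is an assumption about what the combined figure should look like, not something the original answer specifies.

```python
import numpy as np
import matplotlib.pyplot as plt


def plot_metrics_combined(results, metrics=("AUC", "Accuracy", "Precision", "Recall")):
    """Draw grouped bars and value-labeled lines on one Axes.

    `results` maps algorithm name -> {metric name -> score}, the same
    shape as the `results` dictionary built in the code above.
    (Hypothetical helper; not part of the original answer.)
    """
    algorithms = list(results)
    x = np.arange(len(metrics))        # one group of bars per metric
    width = 0.8 / len(algorithms)      # the bars of a group share 80% of the slot

    fig, ax = plt.subplots(figsize=(12, 6))
    for i, name in enumerate(algorithms):
        scores = [results[name][m] for m in metrics]
        # Shift each algorithm's bars so the group is centered on its tick.
        positions = x + (i - (len(algorithms) - 1) / 2) * width
        color = f"C{i}"                # same default-cycle color for bar and line
        ax.bar(positions, scores, width, color=color, alpha=0.5, label=name)
        # Line through the tops of this algorithm's bars.
        ax.plot(positions, scores, marker="o", color=color)
        # Write the numeric value just above every point of the line.
        for xi, score in zip(positions, scores):
            ax.annotate(f"{score:.2f}", (xi, score), textcoords="offset points",
                        xytext=(0, 6), ha="center", fontsize=8)

    ax.set_xticks(x)
    ax.set_xticklabels(metrics)
    ax.set_ylim(0, 1.1)                # leave headroom for the value labels
    ax.set_ylabel("Score")
    ax.legend()
    return fig, ax
```

Calling `plot_metrics_combined(results)` with the `results` dictionary from the training loop, followed by `plt.show()`, should display the combined figure.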
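Note that you need to replace `data.csv` in the code with the name of your own dataset file. The code also expects the label column to be named `disease`.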
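To make swapping in your own file less error-prone, the loading and splitting step could be wrapped in a small function, as in the sketch below. The function name `load_and_split` and its parameters are hypothetical. The `stratify=y` argument is an addition that the original code does not use: it keeps the class ratio the same in the training and test sets. The metric calls in the answer (`precision_score`, `recall_score`, and `predict_proba(...)[:, 1]`) also assume a binary label, with 1 as the positive class by default.

```python
import pandas as pd
from sklearn.model_selection import train_test_split


def load_and_split(csv_path, label_column="disease", test_size=0.2, seed=42):
    """Read a CSV file and return X_train, X_test, y_train, y_test.

    Hypothetical helper: `csv_path` and `label_column` stand in for your
    own file name and label column.
    """
    df = pd.read_csv(csv_path)
    if label_column not in df.columns:
        raise KeyError(
            f"Column {label_column!r} not found; available columns: {list(df.columns)}"
        )
    X = df.drop(columns=label_column)   # every other column is treated as a feature
    y = df[label_column]
    # stratify=y (an assumption, not in the original code) preserves the
    # proportion of each class in both splits.
    return train_test_split(X, y, test_size=test_size, random_state=seed, stratify=y)
```

With this in place, `X_train, X_test, y_train, y_test = load_and_split('data.csv')` would replace the reading and splitting lines in the code above.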