Exercise: compare the performance of logistic regression and naive Bayes classifiers on the Pima Indians Diabetes dataset. diabetes = pd.read_csv('./work/diabetes.csv')
Posted: 2024-10-15 17:11:15
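Logistic regression and naive Bayes are both commonly used for binary classification tasks such as predicting whether a patient has diabetes. The Pima Indians Diabetes dataset records physiological measurements for each patient, such as age and blood glucose level, which serve as the input features; the target variable is the diagnosis, stored in the `Outcome` column.
First, load the dataset, then train and evaluate both models on the same train/test split: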
```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score, confusion_matrix

# Load the data
diabetes = pd.read_csv('./work/diabetes.csv')

# Preprocessing (e.g. imputing missing values, encoding categorical variables)
# would go here; this example uses the raw columns as they are.
X = diabetes.drop('Outcome', axis=1)  # features
y = diabetes['Outcome']               # target variable

# Split into training and test sets (80% / 20%)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Train the logistic regression model.
# max_iter is raised from the default of 100 because the solver may not
# converge within 100 iterations on unscaled features.
lr_classifier = LogisticRegression(max_iter=1000)
lr_classifier.fit(X_train, y_train)

# Predict with logistic regression
y_pred_lr = lr_classifier.predict(X_test)

# Train the Gaussian naive Bayes model
nb_classifier = GaussianNB()
nb_classifier.fit(X_train, y_train)

# Predict with naive Bayes
y_pred_nb = nb_classifier.predict(X_test)

# Compute performance metrics
accuracy_lr = accuracy_score(y_test, y_pred_lr)
cm_lr = confusion_matrix(y_test, y_pred_lr)
accuracy_nb = accuracy_score(y_test, y_pred_nb)
cm_nb = confusion_matrix(y_test, y_pred_nb)

# Compare the two models
print(f"Logistic Regression Accuracy: {accuracy_lr:.3f}, Confusion Matrix: \n{cm_lr}")
print(f"Naive Bayes Accuracy: {accuracy_nb:.3f}, Confusion Matrix: \n{cm_nb}")
```
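The preprocessing comment in the code above mentions imputing missing values, but no imputation is actually performed. In this dataset, zeros in some measurement columns are commonly treated as missing readings, since a glucose level or BMI of 0 is not physiologically plausible. The following is a minimal sketch of one way to handle that, not the only one. The helper `impute_zero_readings` and the column list are assumptions for illustration: the column names follow the widely distributed CSV version of the dataset and may not match your file.

```python
import numpy as np
import pandas as pd

# Assumed column names; adjust to match the header of your CSV.
ZERO_AS_MISSING = ["Glucose", "BloodPressure", "SkinThickness", "Insulin", "BMI"]


def impute_zero_readings(df, columns=ZERO_AS_MISSING):
    """Return a copy of df where zeros in `columns` are replaced by the column median.

    Columns that are not present in df are skipped.
    """
    out = df.copy()
    for col in columns:
        if col in out.columns:
            values = out[col].replace(0, np.nan)       # treat 0 as "not measured"
            out[col] = values.fillna(values.median())  # median ignores NaN
    return out


# Tiny hand-made frame, only to show the effect (not real patient data).
toy = pd.DataFrame({
    "Glucose": [148, 0, 183, 89],
    "BMI": [33.6, 26.6, 0.0, 28.1],
    "Outcome": [1, 0, 1, 0],
})
print(impute_zero_readings(toy))
```

To use it with the code above, call `diabetes = impute_zero_readings(diabetes)` before building `X` and `y`. Note that `Outcome` is left untouched, because a 0 there is a valid label. Strictly speaking, the medians should be computed on the training split only, so that no information from the test set leaks into training.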
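A single 80/20 split gives an accuracy estimate that can shift noticeably with the random seed, so a comparison based on it alone may be misleading. As a hedged sketch, the two classifiers can instead be compared with stratified k-fold cross-validation. The function name `compare_with_cv` is hypothetical, and the demo uses synthetic stand-in data from `make_classification` with the same shape as the dataset (768 rows, 8 features) so that it runs without the CSV file.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def compare_with_cv(X, y, n_splits=5, seed=42):
    """Return {model name: (mean accuracy, std)} from stratified k-fold CV."""
    models = {
        # The scaler is part of the pipeline, so it is refit on the training
        # portion of every fold and never sees the held-out portion.
        "Logistic Regression": make_pipeline(
            StandardScaler(), LogisticRegression(max_iter=1000)
        ),
        "Gaussian Naive Bayes": GaussianNB(),
    }
    # A fixed random_state means both models are scored on identical folds.
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    results = {}
    for name, model in models.items():
        scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
        results[name] = (scores.mean(), scores.std())
    return results


# Synthetic stand-in data; replace with the real X and y loaded from the CSV.
X_demo, y_demo = make_classification(
    n_samples=768, n_features=8, n_informative=5, random_state=0
)
for name, (mean_acc, std_acc) in compare_with_cv(X_demo, y_demo).items():
    print(f"{name}: {mean_acc:.3f} +/- {std_acc:.3f}")
```

With the real data, call `compare_with_cv(X, y)` using the `X` and `y` built from `diabetes`. If the mean accuracies differ by less than the fold-to-fold standard deviation, the data does not clearly favor one model over the other.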