Compare the performance of logistic regression and naive Bayes classifiers on the Pima Indians Diabetes dataset. Given `diabetes = pd.read_csv('./work/diabetes.csv')`, complete the code.
Posted: 2024-10-15 07:10:35
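In Python, we can load the dataset with `pandas`, then use `LogisticRegression` and `GaussianNB` (the Gaussian naive Bayes classifier) from `sklearn` to train the models and evaluate their performance. Here is a simple step-by-step example: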
```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score, classification_report
# Load the data
diabetes = pd.read_csv('./work/diabetes.csv')
# Assumes the dataset has a target column named 'Outcome'; all other columns are features
X = diabetes.drop('Outcome', axis=1)  # features
y = diabetes['Outcome']  # target variable
# Split the data (80% training, 20% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
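# Initialize the logistic regression and Gaussian naive Bayes classifiers
# max_iter is raised from the default of 100 because the solver may not converge on unscaled features
lr_classifier = LogisticRegression(max_iter=1000)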
nb_classifier = GaussianNB()
# Train the models
lr_classifier.fit(X_train, y_train)
nb_classifier.fit(X_train, y_train)
# Predict on the test set
lr_predictions = lr_classifier.predict(X_test)
nb_predictions = nb_classifier.predict(X_test)
# Evaluate performance
lr_accuracy = accuracy_score(y_test, lr_predictions)
nb_accuracy = accuracy_score(y_test, nb_predictions)
print(f"Logistic Regression Accuracy: {lr_accuracy}")
print(f"Naive Bayes Accuracy: {nb_accuracy}")
# Generate classification reports showing per-class precision, recall, and related metrics
lr_classification_report = classification_report(y_test, lr_predictions)
nb_classification_report = classification_report(y_test, nb_predictions)
print("\nLogistic Regression Classification Report:\n", lr_classification_report)
print("\nNaive Bayes Classification Report:\n", nb_classification_report)
```
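A single 80/20 split can favor one model by chance, so a cross-validated comparison may give a steadier picture. The sketch below is one possible way to do that, not part of the original answer: `compare_classifiers` is a hypothetical helper name, and the choices of 5 stratified folds, feature standardization for logistic regression, and `max_iter=1000` are assumptions made for illustration.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def compare_classifiers(X, y, n_splits=5, random_state=42):
    """Return {model name: (mean accuracy, std of accuracy)} over stratified folds.

    Hypothetical helper; the fold count and preprocessing are illustrative choices.
    """
    models = {
        # Scaling happens inside the pipeline, so each fold is scaled
        # using statistics from its own training portion only.
        "Logistic Regression": make_pipeline(
            StandardScaler(), LogisticRegression(max_iter=1000)
        ),
        "Gaussian Naive Bayes": GaussianNB(),
    }
    # Stratified folds keep the class ratio similar in every fold.
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=random_state)
    results = {}
    for name, model in models.items():
        scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
        results[name] = (scores.mean(), scores.std())
    return results


# Example call, with X and y being the feature table and 'Outcome' column:
# for name, (mean_acc, std_acc) in compare_classifiers(X, y).items():
#     print(f"{name}: {mean_acc:.3f} +/- {std_acc:.3f}")
```

Reporting the standard deviation alongside the mean helps judge whether a gap between the two classifiers is larger than the fold-to-fold variation.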