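Diabetes Prediction Analysis with Logistic Regression in Python
Date: 2023-11-22 15:50:20
The following steps show how to predict diabetes in Python with logistic regression:
1. Prepare the data: parse the CSV file with Python and fill in the missing values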
```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Read the CSV file
df = pd.read_csv('diabetes.csv')

# A 0 in these columns is treated as a missing measurement, so mark it as NaN.
# Assigning the result back replaces the chained inplace calls, which newer pandas versions warn about.
zero_as_missing = ['Glucose', 'BloodPressure', 'SkinThickness', 'Insulin', 'BMI']
df[zero_as_missing] = df[zero_as_missing].replace(0, np.nan)

# Fill the missing values with each column's mean
df = df.fillna(df.mean())

# Split the data into training and test sets
X = df.drop('Outcome', axis=1)
y = df['Outcome']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
```
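The preparation step above computes the column means over the full dataset, so test-set values influence the imputation. One alternative, sketched here with scikit-learn's `SimpleImputer`, is to split first and fit the imputer on the training rows only. The helper `split_then_impute` and the synthetic `demo` frame are hypothetical names used for illustration; the random values stand in for the real `diabetes.csv` and carry no medical meaning.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split


def split_then_impute(df, zero_as_missing, target="Outcome",
                      test_size=0.2, random_state=0):
    """Split first, then fill missing values using training-set means only."""
    df = df.copy()
    # Treat 0 as a missing measurement in the given columns
    df[zero_as_missing] = df[zero_as_missing].replace(0, np.nan)
    X = df.drop(columns=target)
    y = df[target]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=random_state
    )
    # The imputer learns the means from the training rows only
    imputer = SimpleImputer(strategy="mean")
    X_train = pd.DataFrame(imputer.fit_transform(X_train),
                           columns=X.columns, index=X_train.index)
    # The same training means are reused for the test rows
    X_test = pd.DataFrame(imputer.transform(X_test),
                          columns=X.columns, index=X_test.index)
    return X_train, X_test, y_train, y_test


# Synthetic stand-in data (NOT the real diabetes.csv), only to show the call
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "Glucose": rng.integers(0, 200, size=50),
    "BMI": rng.integers(0, 50, size=50),
    "Outcome": rng.integers(0, 2, size=50),
})
X_tr, X_te, y_tr, y_te = split_then_impute(demo, ["Glucose", "BMI"])
print("Remaining NaN values:", int(X_tr.isna().sum().sum() + X_te.isna().sum().sum()))
```

With the real data, the call would presumably take the five columns listed in the preparation step as `zero_as_missing`.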
2. Analyze the data: visualize and inspect the data
```python
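import seaborn as sns
import matplotlib.pyplot as plt

# Plot the BMI distribution for non-diabetic and diabetic patients.
# sns.distplot is deprecated; histplot with kde=True draws a comparable histogram plus density curve.
sns.histplot(df[df['Outcome'] == 0]['BMI'], kde=True, stat='density', label='Non-Diabetic')
sns.histplot(df[df['Outcome'] == 1]['BMI'], kde=True, stat='density', label='Diabetic')
plt.legend()
plt.show()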
```
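A numeric summary can complement the distribution plot when the two curves overlap heavily. The sketch below groups BMI by outcome and reports count, mean, and median. The six-row `bmi_demo` frame is made-up illustration data, not taken from `diabetes.csv`; with the real data the same `groupby` call would be applied to `df`.

```python
import pandas as pd

# Made-up illustration data (NOT from diabetes.csv)
bmi_demo = pd.DataFrame({
    "BMI": [22.0, 25.0, 28.0, 30.0, 33.0, 36.0],
    "Outcome": [0, 0, 0, 1, 1, 1],
})

# Per-group count, mean, and median of BMI
bmi_summary = bmi_demo.groupby("Outcome")["BMI"].agg(["count", "mean", "median"])
print(bmi_summary)
```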
3. Train the model: fit a logistic regression model
```python
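from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Train the logistic regression model.
# max_iter is raised from the default of 100 because the features are unscaled
# and the solver may otherwise stop before converging.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Predict the test set labels
y_pred = model.predict(X_test)

# Compute the accuracy
accuracy = accuracy_score(y_test, y_pred)
print('Accuracy:', accuracy)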
```
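Accuracy alone can be misleading when one class is more common than the other, as is typical for diagnostic data. The sketch below shows one way to look further: it standardizes the features in a pipeline and reports the confusion matrix, precision, recall, and ROC AUC. It runs on synthetic data from `make_classification`, which is an assumption made so the example is self-contained; the class weights and all variable names are illustrative and not part of the original analysis.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (confusion_matrix, precision_score,
                             recall_score, roc_auc_score)
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic, mildly imbalanced data standing in for the diabetes features
X_demo, y_demo = make_classification(
    n_samples=400, n_features=8, weights=[0.65, 0.35], random_state=0
)
Xtr, Xte, ytr, yte = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=0, stratify=y_demo
)

# Scaling inside the pipeline is fitted on the training data only
clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
clf.fit(Xtr, ytr)

pred = clf.predict(Xte)
proba = clf.predict_proba(Xte)[:, 1]  # probability of the positive class

cm = confusion_matrix(yte, pred)  # rows: true class, columns: predicted class
auc = roc_auc_score(yte, proba)
print(cm)
print("Precision:", precision_score(yte, pred))
print("Recall:", recall_score(yte, pred))
print("ROC AUC:", auc)
```

For the diabetes data, the same pipeline could be fitted on `X_train` and `y_train` from step 1 in place of the synthetic arrays.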