Write a data-mining heart disease prediction program in Python
Date: 2024-05-19 16:10:43
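Sure, here is a simple Python example that predicts heart disease with logistic regression.
First, import the required libraries and load the data. We use the Heart Disease dataset from the UCI Machine Learning Repository, specifically the processed Cleveland subset, in which missing values are marked with `?`.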
```python
import pandas as pd
import numpy as np
# Load the dataset (the file has no header row)
df = pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/heart-disease/processed.cleveland.data', header=None)
# Assign column names
df.columns = ['age', 'sex', 'cp', 'trestbps', 'chol', 'fbs', 'restecg', 'thalach', 'exang', 'oldpeak', 'slope', 'ca', 'thal', 'target']
# Replace the '?' missing-value markers with NaN
df = df.replace('?', np.nan)
```
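As a minimal sketch of an equivalent approach, `pd.read_csv` can treat `?` as missing while parsing via its `na_values` parameter, so `ca` and `thal` come out as floats instead of strings. To run offline, the snippet reads two illustrative rows written in the file's format (hypothetical sample values, not taken from the dataset) through `io.StringIO`; the names `COLUMNS`, `SAMPLE` and `load` are introduced here for illustration only.

```python
import io

import pandas as pd

COLUMNS = ['age', 'sex', 'cp', 'trestbps', 'chol', 'fbs', 'restecg',
           'thalach', 'exang', 'oldpeak', 'slope', 'ca', 'thal', 'target']

# Hypothetical rows in the same comma-separated, header-less format as
# processed.cleveland.data; the second row has '?' in the ca column.
SAMPLE = (
    "63.0,1.0,1.0,145.0,233.0,1.0,2.0,150.0,0.0,2.3,3.0,0.0,6.0,0\n"
    "53.0,0.0,3.0,128.0,216.0,0.0,2.0,115.0,0.0,0.0,1.0,?,3.0,0\n"
)


def load(source) -> pd.DataFrame:
    """Read the data, mapping '?' to NaN during parsing."""
    return pd.read_csv(source, header=None, names=COLUMNS, na_values='?')


df = load(io.StringIO(SAMPLE))
print(df.dtypes[['ca', 'thal']])  # both float64, not object
print(df['ca'].isna().sum())      # 1
```

With the real data, pass the UCI URL instead of `io.StringIO(SAMPLE)`; the separate `replace('?', np.nan)` step is then unnecessary.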
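Next, preprocess the data: drop the rows that contain NaN, convert the string-typed features to numeric values, and binarize the label. In `processed.cleveland.data` the last column takes values 0 to 4 (0 = no heart disease, 1 to 4 = disease present, with increasing severity). The metrics used later (`precision_score` and friends, with their default `average='binary'`) raise an error on such a multiclass target, so values 1 to 4 are merged into one positive class. Finally, split the data into training and test sets.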
```python
from sklearn.model_selection import train_test_split
# Drop rows containing NaN (the former '?' entries in ca and thal)
df = df.dropna()
# Encode the categorical features as integer codes
# (thal was read as strings because of the '?' markers)
for col in ['sex', 'cp', 'fbs', 'restecg', 'exang', 'slope', 'thal']:
    df[col] = pd.Categorical(df[col]).codes
# ca is a count, so convert its strings directly to numbers
df['ca'] = pd.to_numeric(df['ca'])
# target ranges over 0-4; collapse it to 0 = absent, 1 = present
df['target'] = (df['target'] > 0).astype(int)
# Split the data into training and test sets
X = df.drop('target', axis=1)
y = df['target']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
```
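`pd.Categorical(...).codes` assigns arbitrary integer codes, which logistic regression then treats as ordered numeric quantities. For nominal features such as `cp` or `thal`, one-hot encoding is a common alternative. The sketch below applies `pd.get_dummies` to a small made-up frame (hypothetical values); treating exactly `cp` and `thal` as nominal is an assumption for illustration.

```python
import pandas as pd

# Hypothetical mini-frame with two nominal features and a 0-4 severity target
df = pd.DataFrame({
    'age':    [63, 67, 41, 56],
    'cp':     [1, 4, 2, 4],
    'thal':   [6.0, 3.0, 3.0, 7.0],
    'target': [0, 2, 0, 1],
})

# Collapse severity 1-4 into a single "disease present" class
df['target'] = (df['target'] > 0).astype(int)

# One 0/1 indicator column per category value instead of integer codes
nominal = ['cp', 'thal']
X = pd.get_dummies(df.drop(columns='target'), columns=nominal, dtype=int)
y = df['target']

print(list(X.columns))
# ['age', 'cp_1', 'cp_2', 'cp_4', 'thal_3.0', 'thal_6.0', 'thal_7.0']
print(y.tolist())  # [0, 1, 0, 1]
```

Non-encoded columns such as `age` are kept as-is and placed first; each nominal column is replaced by its indicator columns.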
Now we can train the logistic regression model and make predictions.
```python
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
# Train the logistic regression model; max_iter is raised because the
# unscaled features may not converge within the default 100 iterations
lr = LogisticRegression(max_iter=1000)
lr.fit(X_train, y_train)
# Predict on the test set
y_pred = lr.predict(X_test)
# Compute the evaluation metrics (positive class = 1, disease present)
accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)
print(f'Accuracy: {accuracy:.3f}')
print(f'Precision: {precision:.3f}')
print(f'Recall: {recall:.3f}')
print(f'F1-Score: {f1:.3f}')
```
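Sample output (exact values vary with the preprocessing, the train/test split, and the library versions):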
```
Accuracy: 0.836
Precision: 0.818
Recall: 0.909
F1-Score: 0.861
```
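These figures come from a single 80/20 split of roughly 300 rows, so they can shift noticeably with `random_state`. One common refinement, sketched here as an option rather than the method above, is to standardize the features inside a `Pipeline` and score with stratified k-fold cross-validation. To stay self-contained, the sketch uses synthetic data from `make_classification` as a stand-in for `X` and `y`; the sample size, feature counts and fold count are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the heart-disease features (297 rows, 13 features)
X, y = make_classification(n_samples=297, n_features=13, n_informative=6,
                           random_state=42)

# The scaler is refit on each training fold only, so test folds never leak
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
metrics = ['accuracy', 'precision', 'recall', 'f1']
scores = cross_validate(model, X, y, cv=cv, scoring=metrics)

# Report the mean and spread of each metric across the 5 folds
for name in metrics:
    fold_scores = scores[f'test_{name}']
    print(f'{name}: {fold_scores.mean():.3f} +/- {fold_scores.std():.3f}')
```

With the real data, replace the `make_classification` line with the `X` and `y` built in the preprocessing step. The spread across folds gives a rough sense of how much a single split's numbers can be trusted.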
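Here we use four common evaluation metrics: accuracy (the fraction of all predictions that are correct), precision (the fraction of predicted positives that truly have heart disease), recall (the fraction of actual patients that the model detects), and the F1-score (the harmonic mean of precision and recall). Together they let us assess the model's predictive performance and guide further tuning.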
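To make the four definitions concrete, the sketch below derives them by hand from the confusion-matrix counts (TP, FP, FN, TN) and checks them against scikit-learn. The labels are made-up toy values, not output of the model above.

```python
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)

# Toy labels: 1 = heart disease present, 0 = absent
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 1, 1]

# For binary 0/1 labels, ravel() yields the counts in the order TN, FP, FN, TP
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

accuracy = (tp + tn) / (tp + tn + fp + fn)          # correct / all
precision = tp / (tp + fp)                          # correct among predicted positives
recall = tp / (tp + fn)                             # found among actual positives
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean

print(f'TP={tp} FP={fp} FN={fn} TN={tn}')  # TP=4 FP=2 FN=1 TN=3
print(f'{accuracy:.3f} {precision:.3f} {recall:.3f} {f1:.3f}')
# 0.700 0.667 0.800 0.727

# The hand-computed values agree with scikit-learn's implementations
assert abs(accuracy - accuracy_score(y_true, y_pred)) < 1e-12
assert abs(precision - precision_score(y_true, y_pred)) < 1e-12
assert abs(recall - recall_score(y_true, y_pred)) < 1e-12
assert abs(f1 - f1_score(y_true, y_pred)) < 1e-12
```

Here precision (0.667) is lower than recall (0.800) because the model raises two false alarms but misses only one positive case.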