Step-by-step code for heart disease prediction with data mining
Date: 2023-07-30 14:10:06
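Heart disease prediction is a common data mining application. Below are Python code examples for the main steps of a heart disease prediction workflow:
1. Data acquisition and preprocessing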
```python
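import pandas as pd
import numpy as np

# Load the data file (assumes a heart.csv with a 'target' label column)
data = pd.read_csv('heart.csv')

# Preprocessing: convert categorical variables to numeric ones
# Encode 'sex' only if it is stored as text; many copies of heart.csv already use 0/1,
# in which case comparing against 'male' would silently set every row to 0
if not pd.api.types.is_numeric_dtype(data['sex']):
    data['sex'] = np.where(data['sex'].str.lower() == 'male', 1, 0)
# Recode chest pain type 4 as 0 (only has an effect when cp is coded 1-4)
data['cp'] = np.where(data['cp'] == 4, 0, data['cp'])
# One-hot encode the remaining multi-level categorical columns
data = pd.get_dummies(data, columns=['restecg', 'slope', 'thal'])

# Split into features and label
X = data.drop(['target'], axis=1)
y = data['target']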
```
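To see what the encoding step produces without needing the CSV file, here is a minimal sketch on a few toy rows. The rows are made up purely for illustration (they are not real patient records), and `encode_features` is a hypothetical helper name, not part of the original code.

```python
import pandas as pd

def encode_features(df: pd.DataFrame) -> pd.DataFrame:
    """Encode 'sex' as 0/1 if it is text, then one-hot encode multi-level columns."""
    out = df.copy()
    if not pd.api.types.is_numeric_dtype(out['sex']):
        out['sex'] = (out['sex'].str.lower() == 'male').astype(int)
    # dtype=int gives 0/1 indicator columns instead of booleans
    return pd.get_dummies(out, columns=['restecg', 'slope', 'thal'], dtype=int)

# Toy rows, invented only to show the shape of the result
toy = pd.DataFrame({
    'age': [63, 41, 57],
    'sex': ['male', 'female', 'male'],
    'restecg': [0, 1, 1],
    'slope': [0, 2, 1],
    'thal': [1, 2, 3],
    'target': [1, 1, 0],
})

encoded = encode_features(toy)
print(encoded.columns.tolist())
print(encoded.head())
```

Each categorical column is replaced by one indicator column per observed level, so the column count grows with the number of distinct values in the data.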
2. Feature selection and dimensionality reduction
```python
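from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.decomposition import PCA

# Feature selection: keep the 10 features with the highest ANOVA F-scores
# Note: fitting the selector and PCA on the full dataset before splitting leaks
# information from the test rows; for an unbiased estimate, fit them on the training split only
selector = SelectKBest(f_classif, k=10)
X_new = selector.fit_transform(X, y)

# Dimensionality reduction: project the selected features onto 5 principal components
# (PCA is scale-sensitive, so unscaled features with large ranges will dominate)
pca = PCA(n_components=5)
X_pca = pca.fit_transform(X_new)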
```
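It can help to check which features the selector actually kept and how much variance the five components retain. The sketch below does this on synthetic data from `make_classification`, used here only as a stand-in for the heart disease features; the column names `f0` to `f14` are placeholders.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif

# Synthetic stand-in for the real feature table (not patient data)
X_arr, y_demo = make_classification(
    n_samples=300, n_features=15, n_informative=6, random_state=42
)
X_demo = pd.DataFrame(X_arr, columns=[f'f{i}' for i in range(15)])

# get_support() returns a boolean mask over the input columns
selector = SelectKBest(f_classif, k=10).fit(X_demo, y_demo)
selected = X_demo.columns[selector.get_support()].tolist()
print('Selected features:', selected)

# explained_variance_ratio_ holds the share of variance per component
pca = PCA(n_components=5).fit(selector.transform(X_demo))
retained = pca.explained_variance_ratio_.sum()
print(f'Variance retained by 5 components: {retained:.2%}')
```

If the retained variance is low, raising `n_components` or dropping the PCA step altogether may be worth trying, since tree-based models such as random forests do not require decorrelated inputs.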
3. Model training and evaluation
```python
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.ensemble import RandomForestClassifier

# Split into training and test sets (80% / 20%)
X_train, X_test, y_train, y_test = train_test_split(X_pca, y, test_size=0.2, random_state=42)

# Train a random forest classifier
clf = RandomForestClassifier(n_estimators=100, max_depth=5, random_state=42)
clf.fit(X_train, y_train)

# Predict on the test set
y_pred = clf.predict(X_test)

# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy * 100:.2f}%")
```
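Accuracy alone can hide the errors that matter most in a medical setting, such as missed positive cases. As a rough sketch, the example below adds a confusion matrix, a per-class report, and recall. It runs on synthetic data from `make_classification` as a stand-in for the real dataset, and the variable names are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix, recall_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real features and labels (not patient data)
X_demo, y_demo = make_classification(n_samples=300, n_features=10, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

clf_demo = RandomForestClassifier(n_estimators=100, max_depth=5, random_state=42)
clf_demo.fit(X_tr, y_tr)
y_hat = clf_demo.predict(X_te)

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_te, y_hat)
print(cm)
print(classification_report(y_te, y_hat))
print('Recall (positive class):', recall_score(y_te, y_hat))
```

The off-diagonal entries of the matrix count the false positives and false negatives, which the single accuracy figure averages away.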
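These are example code snippets for the main steps of heart disease prediction in Python; adapt and extend them to fit your actual requirements.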
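One possible refinement is to wrap selection, scaling, PCA, and the classifier in a single scikit-learn `Pipeline` and score it with cross-validation, so that every preprocessing step is fitted only on the training folds. This is a sketch under assumptions: it uses synthetic data in place of `heart.csv`, and the `StandardScaler` step is an addition that the steps above do not include.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real features and labels (not patient data)
X_demo, y_demo = make_classification(
    n_samples=300, n_features=15, n_informative=6, random_state=42
)

pipe = Pipeline([
    ('select', SelectKBest(f_classif, k=10)),   # same selector as in step 2
    ('scale', StandardScaler()),                # assumption: added because PCA is scale-sensitive
    ('pca', PCA(n_components=5)),               # same reduction as in step 2
    ('clf', RandomForestClassifier(n_estimators=100, max_depth=5, random_state=42)),
])

# Each fold refits the whole pipeline on its training part only
scores = cross_val_score(pipe, X_demo, y_demo, cv=5, scoring='accuracy')
print(f'CV accuracy: {scores.mean():.2%} (+/- {scores.std():.2%})')
```

Because the pipeline is refitted inside each fold, the reported score is not inflated by the test rows influencing feature selection or the PCA projection.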