Heart disease prediction with a decision tree algorithm, written without the sklearn library
Date: 2023-07-28 21:25:55 Views: 93
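Deriving a decision tree without sklearn
Sure. Below is example code for heart disease prediction with a decision tree algorithm that does not use the sklearn library: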
```python
import pandas as pd

# Load the data
heart_disease = pd.read_csv('heart_disease.csv')

# Separate the features from the target variable
X = heart_disease.drop('target', axis=1)
y = heart_disease['target']


# A single tree node
class Node:
    def __init__(self, feature=None, threshold=None, left=None, right=None, value=None):
        self.feature = feature      # feature used for the split
        self.threshold = threshold  # split threshold
        self.left = left            # left subtree (feature < threshold)
        self.right = right          # right subtree (feature >= threshold)
        self.value = value          # predicted class (leaf nodes only)


# Decision tree classifier
class DecisionTree:
    def __init__(self, min_samples_split=2, max_depth=999):
        self.min_samples_split = min_samples_split  # minimum number of samples needed to split a node
        self.max_depth = max_depth                  # maximum depth of the tree

    # Gini impurity of a label Series
    def gini(self, y):
        n_samples = len(y)
        if n_samples == 0:
            return 0
        # Count only the labels actually present, so a pure node scores 0
        # whatever its label value is
        class_probs = y.value_counts() / n_samples
        return 1 - (class_probs ** 2).sum()

    # Gain of a split: reduction in size-weighted Gini impurity
    def info_gain(self, X, y, feature, threshold):
        left_index = X[feature] < threshold
        left_y = y[left_index]
        right_y = y[~left_index]
        n_samples = len(y)
        left_gini = self.gini(left_y)
        right_gini = self.gini(right_y)
        gini_gain = self.gini(y) - (len(left_y) / n_samples) * left_gini - (len(right_y) / n_samples) * right_gini
        return gini_gain

    # Find the best split feature and threshold
    def find_best_split(self, X, y):
        best_feature, best_threshold, best_gain = None, None, 0
        for feature in X.columns:
            # Each distinct value only needs to be tried once as a threshold
            for threshold in X[feature].unique():
                gain = self.info_gain(X, y, feature, threshold)
                if gain > best_gain:
                    best_feature, best_threshold, best_gain = feature, threshold, gain
        return best_feature, best_threshold, best_gain

    # Build the tree recursively
    def build_tree(self, X, y, depth=0):
        n_samples, n_features = X.shape
        # depth < max_depth keeps the leaves at depth max_depth or less
        if n_samples >= self.min_samples_split and depth < self.max_depth:
            best_feature, best_threshold, best_gain = self.find_best_split(X, y)
            if best_gain > 0:
                left_index = X[best_feature] < best_threshold
                X_left, y_left = X[left_index], y[left_index]
                X_right, y_right = X[~left_index], y[~left_index]
                left = self.build_tree(X_left, y_left, depth + 1)
                right = self.build_tree(X_right, y_right, depth + 1)
                return Node(best_feature, best_threshold, left, right)
        # Leaf: predict the majority class, so predictions are comparable to the labels
        value = y.mode().iloc[0]
        return Node(value=value)

    # Predict a single sample
    def predict_one(self, x, node):
        if node.value is not None:
            return node.value
        if x[node.feature] < node.threshold:
            return self.predict_one(x, node.left)
        else:
            return self.predict_one(x, node.right)

    # Predict multiple samples
    def predict(self, X, tree):
        y_pred = []
        for i in range(len(X)):
            y_pred.append(self.predict_one(X.iloc[i], tree))
        return y_pred

    # Compute the accuracy
    def accuracy(self, y_pred, y_true):
        # Convert to a list for positional access: a sliced Series keeps
        # its original index labels, so y_true[i] would look up the wrong rows
        y_true = list(y_true)
        correct = 0
        for i in range(len(y_pred)):
            if y_pred[i] == y_true[i]:
                correct += 1
        acc = correct / len(y_pred)
        return acc


# Split into training and test sets (first 70% of the rows / remaining 30%, in file order)
split = int(len(X) * 0.7)
X_train, X_test = X.iloc[:split], X.iloc[split:]
y_train, y_test = y.iloc[:split], y.iloc[split:]

# Build the decision tree model
model = DecisionTree(min_samples_split=3, max_depth=3)
tree = model.build_tree(X_train, y_train)

# Predict the test set
y_pred = model.predict(X_test, tree)

# Compute the accuracy
acc = model.accuracy(y_pred, y_test)
print('Accuracy:', acc)
```
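This example defines two classes. The `Node` class represents a node of the decision tree and holds the split feature, the split threshold, the left and right subtrees, and a node value (set only on leaf nodes). The `DecisionTree` class implements the algorithm: `gini()` computes the Gini impurity, `info_gain()` computes the gain of a candidate split (measured here as the reduction in Gini impurity, not in entropy), `find_best_split()` searches for the best split feature and threshold, `build_tree()` builds the tree recursively, `predict_one()` predicts a single sample, `predict()` predicts multiple samples, and `accuracy()` computes the accuracy.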
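To make the split criterion concrete, here is a minimal standalone sketch of the Gini impurity and the gain of a split. The function names `gini_impurity` and `gini_gain` and the label lists are hypothetical and chosen only for illustration; they are not part of the example above.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity 1 - sum(p_k ** 2) over the classes present in labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_gain(left, right):
    """Parent impurity minus the size-weighted impurity of the two children."""
    parent = left + right
    n = len(parent)
    weighted = (len(left) / n) * gini_impurity(left) + (len(right) / n) * gini_impurity(right)
    return gini_impurity(parent) - weighted


# Hypothetical labels: 1 = heart disease, 0 = no heart disease
print(gini_impurity([1, 1, 1, 1]))  # pure node -> 0.0
print(gini_impurity([0, 0, 1, 1]))  # evenly mixed node -> 0.5
print(gini_gain([0, 0], [1, 1]))    # perfect split -> 0.5
print(gini_gain([0, 1], [0, 1]))    # uninformative split -> 0.0
```

A split is only worth making when its gain is positive, which is why `build_tree()` falls back to a leaf when the best gain is 0.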
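The main program first reads the data and splits it into a training set (the first 70% of the rows) and a test set (the remaining 30%). It then builds the decision tree model from the training set, predicts the test set, and finally prints the accuracy.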
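Because the split takes rows in file order, a file that happens to be sorted (for example by `target`) could give training and test sets with very different class balances. One possible remedy is to shuffle the row positions first. The sketch below uses only the standard library; the helper name `shuffled_split_indices` and the `seed` parameter are assumptions made for illustration.

```python
import random


def shuffled_split_indices(n_rows, train_fraction=0.7, seed=0):
    """Return (train_idx, test_idx): disjoint, shuffled row positions covering range(n_rows)."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)  # seeded so the split is reproducible
    cut = int(n_rows * train_fraction)
    return indices[:cut], indices[cut:]


train_idx, test_idx = shuffled_split_indices(10)
print(train_idx, test_idx)

# With the DataFrame X and Series y from the example, positional indexing would be:
# X_train, X_test = X.iloc[train_idx], X.iloc[test_idx]
# y_train, y_test = y.iloc[train_idx], y.iloc[test_idx]
```

Using `iloc` here keeps the selection positional, so it works regardless of the index labels the DataFrame carries.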