Writing your own decision tree code for heart disease prediction (without the sklearn library)
Time: 2024-02-23 20:58:10
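Sure, I can help you with this question.
A decision tree is a common classification algorithm, and it is also used for heart disease prediction. Below is a simple decision tree implementation, using only NumPy, that you can use for heart disease prediction: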
```python
import numpy as np


class Node:
    def __init__(self, feature=None, threshold=None, left=None, right=None, value=None):
        self.feature = feature      # index of the feature this node splits on
        self.threshold = threshold  # split threshold for that feature
        self.left = left            # left child: samples with feature value < threshold
        self.right = right          # right child: samples with feature value >= threshold
        self.value = value          # class label stored in a leaf (None for internal nodes)


class DecisionTree:
    def __init__(self, max_depth=None):
        self.max_depth = max_depth  # maximum tree depth (None means no limit)
        self.root = None

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y, dtype=int)  # np.bincount requires non-negative integer labels
        self.root = self._build_tree(X, y, depth=0)
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        return np.array([self._predict_one(x) for x in X])

    def _build_tree(self, X, y, depth):
        n_samples, n_features = X.shape
        n_labels = len(np.unique(y))
        # Stop if the depth limit is reached, the node is pure, or too few samples remain
        if depth == self.max_depth or n_labels == 1 or n_samples < 2:
            return Node(value=self._most_common_label(y))
        best_feature, best_threshold = self._choose_best_feature(X, y)
        # No split gives a positive information gain: make this node a leaf
        # (otherwise an empty child or endless recursion could occur)
        if best_feature is None:
            return Node(value=self._most_common_label(y))
        left_indices = X[:, best_feature] < best_threshold
        right_indices = ~left_indices
        left = self._build_tree(X[left_indices], y[left_indices], depth + 1)
        right = self._build_tree(X[right_indices], y[right_indices], depth + 1)
        return Node(feature=best_feature, threshold=best_threshold, left=left, right=right)

    def _choose_best_feature(self, X, y):
        n_samples, n_features = X.shape
        best_gain = 0.0  # only splits with strictly positive gain are accepted
        best_feature = None
        best_threshold = None
        for feature in range(n_features):
            # every distinct value of the feature is a candidate threshold
            thresholds = np.unique(X[:, feature])
            for threshold in thresholds:
                gain = self._information_gain(X, y, feature, threshold)
                if gain > best_gain:
                    best_gain = gain
                    best_feature = feature
                    best_threshold = threshold
        return best_feature, best_threshold

    def _information_gain(self, X, y, feature, threshold):
        parent_entropy = self._entropy(y)
        left_indices = X[:, feature] < threshold
        right_indices = ~left_indices
        n_left, n_right = np.sum(left_indices), np.sum(right_indices)
        if n_left == 0 or n_right == 0:
            return 0.0  # a split with an empty side carries no information
        left_entropy = self._entropy(y[left_indices])
        right_entropy = self._entropy(y[right_indices])
        # weighted average entropy of the two children
        child_entropy = (n_left / len(y)) * left_entropy + (n_right / len(y)) * right_entropy
        return parent_entropy - child_entropy

    def _entropy(self, y):
        p = np.bincount(y) / len(y)  # class proportions
        p = p[p > 0]                 # skip empty classes (0 * log 0 is treated as 0)
        return -np.sum(p * np.log2(p))

    def _most_common_label(self, y):
        return np.bincount(y).argmax()

    def _predict_one(self, x):
        node = self.root
        # walk down from the root until a leaf is reached
        while node.value is None:
            if x[node.feature] < node.threshold:
                node = node.left
            else:
                node = node.right
        return node.value
```
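This decision tree implementation provides the following methods:
- `fit(X, y)`: trains the decision tree on the training data `X` and labels `y`.
- `predict(X)`: predicts labels for the test data `X` and returns them as an array.
- `_build_tree(X, y, depth)`: recursively builds the tree and returns the root node of the (sub)tree.
- `_choose_best_feature(X, y)`: selects the feature and threshold whose split has the highest information gain.
- `_information_gain(X, y, feature, threshold)`: computes the information gain of splitting `feature` at `threshold`.
- `_entropy(y)`: computes the entropy of a set of labels.
- `_most_common_label(y)`: returns the label that occurs most often.
- `_predict_one(x)`: predicts the label of a single sample by walking from the root down to a leaf.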
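To make the split criterion concrete: the entropy of a label set is `H(y) = -sum_k p_k * log2(p_k)`, where `p_k` is the fraction of samples in class `k`, and splitting into a left and right child gives an information gain of `H(y) - (n_L/n) * H(y_L) - (n_R/n) * H(y_R)`. The small sketch below restates `_entropy` and `_information_gain` as standalone functions (the names `entropy` and `information_gain` are illustrative, not part of the class above) and scores every candidate threshold on a made-up four-sample feature column, the same way `_choose_best_feature` does for each feature.

```python
import numpy as np


def entropy(y):
    """Base-2 Shannon entropy of an array of non-negative integer labels."""
    p = np.bincount(y) / len(y)
    p = p[p > 0]  # ignore classes that do not occur
    return float(-np.sum(p * np.log2(p)))


def information_gain(x, y, threshold):
    """Gain from splitting feature column x at threshold (x < threshold goes left)."""
    left, right = y[x < threshold], y[x >= threshold]
    if len(left) == 0 or len(right) == 0:
        return 0.0
    n = len(y)
    child = len(left) / n * entropy(left) + len(right) / n * entropy(right)
    return entropy(y) - child


# Made-up toy data for illustration: one feature column and binary labels
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([0, 0, 1, 1])

print(entropy(y))  # 1.0 (two equally frequent classes)

# Score each distinct feature value as a threshold and keep the best one
gains = {float(t): information_gain(x, y, t) for t in np.unique(x)}
best_t = max(gains, key=gains.get)
print(best_t, gains[best_t])  # 3.0 1.0
```

The threshold 3.0 separates the two classes perfectly, so both children are pure and the gain equals the full parent entropy; the threshold 1.0 leaves the left side empty and therefore scores 0.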
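You can use this decision tree implementation for heart disease prediction. First, prepare your training and test data and convert them to NumPy arrays; the labels must be non-negative integers (for example 0 = no heart disease, 1 = heart disease), because `_entropy` and `_most_common_label` rely on `np.bincount`. Then create a `DecisionTree` instance, call `fit` to train it, and finally call `predict` to make predictions.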
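Since sklearn is not allowed here, its `train_test_split` and `accuracy_score` helpers are not available either. The sketch below shows one way to do the data preparation step with NumPy alone. The helper names `train_test_split_np` and `accuracy` are hypothetical, and the data is synthetic random numbers standing in for a real heart disease dataset, which you would load yourself.

```python
import numpy as np


def train_test_split_np(X, y, test_ratio=0.2, seed=0):
    """Shuffle the rows and split them into train/test sets using only NumPy."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=int)  # integer labels, as np.bincount requires
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    n_test = int(round(len(y) * test_ratio))
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))


# Synthetic stand-in data (NOT a real heart disease dataset): 100 rows, 3 features
rng = np.random.default_rng(42)
X = rng.normal(size=(100, 3))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)  # 1 = "disease", 0 = "no disease"

X_train, X_test, y_train, y_test = train_test_split_np(X, y, test_ratio=0.2)
print(X_train.shape, X_test.shape)  # (80, 3) (20, 3)
```

With the `DecisionTree` class from the listing above in scope, training and evaluation would then be `tree = DecisionTree(max_depth=5)` (the depth is an arbitrary example value), `tree.fit(X_train, y_train)`, and `accuracy(y_test, tree.predict(X_test))`.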