Implementing a Decision Tree from Scratch for Heart Disease Prediction
Posted: 2023-07-20 15:08:57 · Views: 44
First, collect a heart disease dataset and split it into a training set and a test set. The decision tree algorithm can then be implemented as follows:
1. Compute the information gain of each feature to select the best feature for splitting the node.
2. Split on the feature with the highest information gain at the root node, producing two child nodes.
3. For each child node, repeat steps 1 and 2 until a predefined stopping condition is met (such as reaching the maximum depth or having no more data to split).
4. At each terminal (leaf) node, predict the most common class among the samples that reach it.
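Steps 1 and 2 boil down to comparing entropy before and after a candidate split. A minimal numeric sketch, with made-up values rather than real patient data:

```python
import numpy as np

def entropy(y):
    # Shannon entropy of a label array
    _, counts = np.unique(y, return_counts=True)
    p = counts / len(y)
    return -np.sum(p * np.log2(p))

feature = np.array([1.0, 2.0, 3.0, 4.0])   # one feature column (illustrative)
y = np.array([0, 0, 1, 1])                 # class labels

med = np.median(feature)                   # 2.5
left, right = y[feature <= med], y[feature > med]
gain = entropy(y) - (len(left) / len(y)) * entropy(left) \
                  - (len(right) / len(y)) * entropy(right)
# parent entropy is 1.0 and both children are pure, so gain == 1.0
```

Here the median split separates the classes perfectly, so the information gain equals the full parent entropy.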
Below is a simple Python implementation:
```python
import numpy as np

class DecisionTree:
    def __init__(self, max_depth=5):
        self.max_depth = max_depth

    def fit(self, X, y):
        self.tree = self.build_tree(X, y, 0)

    def predict(self, X):
        return np.array([self.traverse(x, self.tree) for x in X])

    def build_tree(self, X, y, depth):
        n_samples, n_features = X.shape
        n_labels = len(np.unique(y))
        # stop if maximum depth reached or only one class present
        if depth == self.max_depth or n_labels == 1:
            return int(np.argmax(np.bincount(y)))
        # pick the feature with the highest information gain
        best_feature, best_gain = None, -1.0
        for i in range(n_features):
            gain = self.information_gain(X[:, i], y)
            if gain > best_gain:
                best_feature = i
                best_gain = gain
        # stop if no split improves on the parent entropy
        if best_gain <= 0:
            return int(np.argmax(np.bincount(y)))
        # split the data at the median of the best feature
        median = np.median(X[:, best_feature])
        left_mask = X[:, best_feature] <= median
        right_mask = ~left_mask
        left_tree = self.build_tree(X[left_mask], y[left_mask], depth + 1)
        right_tree = self.build_tree(X[right_mask], y[right_mask], depth + 1)
        # create node dictionary
        return {
            'feature': best_feature,
            'median': median,
            'left': left_tree,
            'right': right_tree,
        }

    def information_gain(self, feature, y):
        n_samples = len(y)
        entropy_parent = self.entropy(y)
        # split data at the median of the feature
        left_mask = feature <= np.median(feature)
        right_mask = ~left_mask
        n_left, n_right = left_mask.sum(), right_mask.sum()
        # a split that leaves one side empty carries no information
        if n_left == 0 or n_right == 0:
            return 0.0
        # weighted average of the children's entropies
        entropy_children = (n_left / n_samples) * self.entropy(y[left_mask]) \
                         + (n_right / n_samples) * self.entropy(y[right_mask])
        return entropy_parent - entropy_children

    def entropy(self, y):
        _, counts = np.unique(y, return_counts=True)
        probabilities = counts / len(y)
        return -np.sum(probabilities * np.log2(probabilities))

    def traverse(self, x, node):
        # leaf nodes are plain class labels; internal nodes are dicts
        if not isinstance(node, dict):
            return node
        if x[node['feature']] <= node['median']:
            return self.traverse(x, node['left'])
        return self.traverse(x, node['right'])
```
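The train/test split mentioned at the start can be done with plain numpy. A minimal sketch, using synthetic arrays as a stand-in for a real heart disease dataset (the feature count and 80/20 ratio are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))      # 100 synthetic "patients", 5 features
y = rng.integers(0, 2, size=100)   # 0 = no disease, 1 = disease

# shuffle indices, then take 80% for training and 20% for testing
idx = rng.permutation(len(X))
n_train = int(0.8 * len(X))
X_train, y_train = X[idx[:n_train]], y[idx[:n_train]]
X_test, y_test = X[idx[n_train:]], y[idx[n_train:]]
```

In practice you would load real features and labels (e.g. from a CSV file) into `X` and `y` before splitting.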
This decision tree implementation uses information gain as the splitting criterion and stops when the maximum depth is reached or only one class remains. It includes an entropy() function that computes the entropy of a label set, which the information-gain calculation relies on. build_tree() computes the information gain of every feature, splits on the feature with the highest gain, and builds the tree recursively. predict() classifies each sample by recursively traversing the tree.
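The recursive prediction step can be seen in isolation on a hand-built one-split tree; this standalone sketch mirrors the traverse() logic from the class above:

```python
import numpy as np

def traverse(x, node):
    # leaf nodes are plain class labels; internal nodes are dicts
    if not isinstance(node, dict):
        return node
    if x[node['feature']] <= node['median']:
        return traverse(x, node['left'])
    return traverse(x, node['right'])

# hand-built tree: split feature 0 at 2.5, left leaf = class 0, right = class 1
tree = {'feature': 0, 'median': 2.5, 'left': 0, 'right': 1}
preds = [traverse(x, tree) for x in np.array([[1.0], [4.0]])]
# preds == [0, 1]
```

A sample with feature value 1.0 falls at or below the threshold and lands in the left leaf; 4.0 exceeds it and lands in the right leaf.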