Writing a Simple Decision Tree with NumPy
Posted: 2024-04-08 15:31:14
A decision tree is a widely used classification algorithm that can be implemented with the numpy library alone. Below is a sample implementation of a simple CART-style decision tree using numpy:
```python
import numpy as np

class DecisionTree:
    def __init__(self, max_depth=None):
        self.max_depth = max_depth

    def fit(self, X, y):
        self.tree = self._build_tree(X, y, depth=0)

    def predict(self, X):
        return np.array([self._traverse_tree(x, self.tree) for x in X])

    def _build_tree(self, X, y, depth):
        num_samples, num_features = X.shape
        # Base case: only one class remains, or the maximum depth is reached
        if len(np.unique(y)) == 1 or depth == self.max_depth:
            return {'class': np.argmax(np.bincount(y)), 'depth': depth}
        # Choose the best feature and split threshold
        best_feature, best_threshold = self._find_best_split(X, y)
        # If no valid split exists, return a leaf node
        if best_feature is None or best_threshold is None:
            return {'class': np.argmax(np.bincount(y)), 'depth': depth}
        # Partition the dataset into two halves at the best split
        left_indices = X[:, best_feature] <= best_threshold
        right_indices = ~left_indices
        # Recursively build the left and right subtrees
        left_tree = self._build_tree(X[left_indices], y[left_indices], depth + 1)
        right_tree = self._build_tree(X[right_indices], y[right_indices], depth + 1)
        return {'feature': best_feature, 'threshold': best_threshold,
                'left': left_tree, 'right': right_tree, 'depth': depth}

    def _find_best_split(self, X, y):
        num_samples, num_features = X.shape
        best_gini = 1.0
        best_feature = None
        best_threshold = None
        # Try every feature and every candidate threshold
        for feature in range(num_features):
            thresholds = np.unique(X[:, feature])
            for threshold in thresholds:
                left_indices = X[:, feature] <= threshold
                right_indices = ~left_indices
                # left_indices is a boolean mask, so count the True entries;
                # len() would always equal num_samples and never be 0
                if np.sum(left_indices) == 0 or np.sum(right_indices) == 0:
                    continue
                gini = self._gini_index(y[left_indices], y[right_indices])
                if gini < best_gini:
                    best_gini = gini
                    best_feature = feature
                    best_threshold = threshold
        return best_feature, best_threshold

    def _gini_index(self, left_labels, right_labels):
        num_left = len(left_labels)
        num_right = len(right_labels)
        total_samples = num_left + num_right
        gini_left = 1.0 - np.sum([(np.sum(left_labels == c) / num_left) ** 2
                                  for c in np.unique(left_labels)])
        gini_right = 1.0 - np.sum([(np.sum(right_labels == c) / num_right) ** 2
                                   for c in np.unique(right_labels)])
        return (num_left / total_samples) * gini_left + (num_right / total_samples) * gini_right

    def _traverse_tree(self, x, node):
        if 'class' in node:
            return node['class']
        if x[node['feature']] <= node['threshold']:
            return self._traverse_tree(x, node['left'])
        else:
            return self._traverse_tree(x, node['right'])
```
In the code above, the DecisionTree class exposes a fit method for training the model and a predict method for inference. The _build_tree method recursively constructs the tree, _find_best_split searches for the best feature and threshold, _gini_index computes the weighted Gini impurity of a split, and _traverse_tree walks the tree to classify a single sample.
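To make the impurity computation concrete, the same weighted Gini formula used by `_gini_index` can be written as a standalone function and checked by hand on a tiny split (`weighted_gini` is an illustrative helper of mine, not part of the class):

```python
import numpy as np

def weighted_gini(left_labels, right_labels):
    """Weighted Gini impurity of a binary split (same formula as _gini_index)."""
    def gini(labels):
        # 1 minus the sum of squared class proportions
        _, counts = np.unique(labels, return_counts=True)
        proportions = counts / counts.sum()
        return 1.0 - np.sum(proportions ** 2)

    n_left, n_right = len(left_labels), len(right_labels)
    total = n_left + n_right
    return (n_left / total) * gini(left_labels) + (n_right / total) * gini(right_labels)

# The pure left side (all class 0) contributes 0; the right side is 1/3 vs 2/3:
# gini_right = 1 - (1/9 + 4/9) = 4/9, so the weighted total is 0.5 * 4/9 = 2/9
left = np.array([0, 0, 0])
right = np.array([0, 1, 1])
impurity = weighted_gini(left, right)
```

A perfectly class-separating split (e.g. left all 0s, right all 1s) scores 0, which is why `_find_best_split` minimizes this quantity.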
You can adapt the parameters and implementation details to your needs, for example adjusting the maximum depth or swapping in a different impurity measure such as entropy. Then train the model on a training set and evaluate its predictions on a held-out test set.
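To illustrate what the exhaustive threshold scan actually does, here is a compact standalone sketch of the search on a one-feature toy dataset, mirroring the logic of `_find_best_split` (the function names here are mine, for illustration only):

```python
import numpy as np

def gini_of_split(y, mask):
    """Weighted Gini impurity after splitting labels y by a boolean mask."""
    def gini(labels):
        _, counts = np.unique(labels, return_counts=True)
        p = counts / counts.sum()
        return 1.0 - np.sum(p ** 2)
    n_left, n_right = mask.sum(), (~mask).sum()
    total = n_left + n_right
    return (n_left / total) * gini(y[mask]) + (n_right / total) * gini(y[~mask])

def best_threshold(x, y):
    """Scan every unique value of feature x as a candidate threshold."""
    best_t, best_g = None, 1.0
    for t in np.unique(x):
        mask = x <= t
        if mask.all():  # empty right side: not a real split, skip it
            continue
        g = gini_of_split(y, mask)
        if g < best_g:
            best_g, best_t = g, t
    return best_t, best_g

# Two well-separated clusters: the scan should place the cut between 3 and 10,
# i.e. pick threshold 3, which yields a perfectly pure split (Gini = 0)
x = np.array([1, 2, 3, 10, 11, 12])
y = np.array([0, 0, 0, 1, 1, 1])
t, g = best_threshold(x, y)
```

The full tree simply repeats this scan over every feature at every node, which is why training cost grows with both the number of features and the number of distinct feature values.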