Detailed Code for Data Learning with the Decision Tree Algorithm

Time: 2023-10-26 19:20:10 Views: 78


Decision trees can be used for both classification and regression. An implementation of each is given below. Both split on numeric feature values with a threshold (samples with a value less than or equal to the threshold go left, the rest go right).

Classification decision tree implementation:

```python
import numpy as np


class Node:
    def __init__(self, feature_index=None, threshold=None, left=None,
                 right=None, info_gain=None, value=None):
        self.feature_index = feature_index  # index of the feature used for the split
        self.threshold = threshold          # split threshold
        self.left = left                    # left subtree
        self.right = right                  # right subtree
        self.info_gain = info_gain          # information gain of the split
        self.value = value                  # class label (leaf nodes only)


class DecisionTreeClassifier:
    def __init__(self, max_depth=None, min_samples_split=2):
        self.max_depth = max_depth                  # maximum tree depth
        self.min_samples_split = min_samples_split  # minimum samples needed to split
        self.root = None                            # root node of the tree

    def fit(self, X, y):
        # Labels must be non-negative integers because np.bincount is used below.
        self.root = self._grow_tree(np.asarray(X), np.asarray(y))

    def _grow_tree(self, X, y, depth=0):
        n_samples = X.shape[0]
        n_labels = len(np.unique(y))

        # Return a leaf if the node is pure, has too few samples,
        # or the tree has reached its maximum depth.
        if (n_labels == 1
                or n_samples < self.min_samples_split
                or (self.max_depth is not None and depth >= self.max_depth)):
            return Node(value=self._most_common_label(y))

        # Evaluate every feature/threshold pair and keep the best split.
        best_feature, best_threshold, best_info_gain = self._best_criteria(X, y)

        # Return a leaf if no split yields a positive information gain.
        if best_info_gain <= 0:
            return Node(value=self._most_common_label(y))

        # Recursively build the left and right subtrees.
        left_indices, right_indices = self._split(X[:, best_feature], best_threshold)
        left = self._grow_tree(X[left_indices, :], y[left_indices], depth + 1)
        right = self._grow_tree(X[right_indices, :], y[right_indices], depth + 1)
        return Node(best_feature, best_threshold, left, right, best_info_gain)

    def _best_criteria(self, X, y):
        best_info_gain = -1
        best_feature = None
        best_threshold = None

        # Search all features and all candidate thresholds for the best split.
        for feature_index in range(X.shape[1]):
            feature_values = X[:, feature_index]
            for threshold in np.unique(feature_values):
                info_gain = self._information_gain(y, feature_values, threshold)
                if info_gain > best_info_gain:
                    best_info_gain = info_gain
                    best_feature = feature_index
                    best_threshold = threshold
        return best_feature, best_threshold, best_info_gain

    def _information_gain(self, y, X_feature, threshold):
        parent_entropy = self._entropy(y)
        left_indices, right_indices = self._split(X_feature, threshold)
        # A split that leaves one side empty gains nothing.
        if len(left_indices) == 0 or len(right_indices) == 0:
            return 0

        n = len(y)
        n_l, n_r = len(left_indices), len(right_indices)
        e_l, e_r = self._entropy(y[left_indices]), self._entropy(y[right_indices])
        child_entropy = (n_l / n) * e_l + (n_r / n) * e_r
        return parent_entropy - child_entropy

    def _entropy(self, y):
        if len(np.unique(y)) <= 1:
            return 0
        counts = np.bincount(y)
        probabilities = counts[counts > 0] / len(y)
        return -np.sum(probabilities * np.log2(probabilities))

    def _split(self, X_feature, threshold):
        left_indices = np.argwhere(X_feature <= threshold).flatten()
        right_indices = np.argwhere(X_feature > threshold).flatten()
        return left_indices, right_indices

    def _most_common_label(self, y):
        return np.bincount(y).argmax()

    def predict(self, X):
        return np.array([self._traverse_tree(x, self.root) for x in np.asarray(X)])

    def _traverse_tree(self, x, node):
        # Leaf nodes carry a value; internal nodes do not.
        if node.value is not None:
            return node.value
        if x[node.feature_index] <= node.threshold:
            return self._traverse_tree(x, node.left)
        return self._traverse_tree(x, node.right)
```

Regression decision tree implementation:

```python
import numpy as np


class Node:
    def __init__(self, feature_index=None, threshold=None, left=None,
                 right=None, value=None):
        self.feature_index = feature_index  # index of the feature used for the split
        self.threshold = threshold          # split threshold
        self.left = left                    # left subtree
        self.right = right                  # right subtree
        self.value = value                  # predicted value (leaf nodes only)


class DecisionTreeRegressor:
    def __init__(self, max_depth=None, min_samples_split=2):
        self.max_depth = max_depth                  # maximum tree depth
        self.min_samples_split = min_samples_split  # minimum samples needed to split
        self.root = None                            # root node of the tree

    def fit(self, X, y):
        self.root = self._grow_tree(np.asarray(X), np.asarray(y))

    def _grow_tree(self, X, y, depth=0):
        n_samples = X.shape[0]

        # Return a leaf if there are too few samples
        # or the tree has reached its maximum depth.
        if (n_samples < self.min_samples_split
                or (self.max_depth is not None and depth >= self.max_depth)):
            return Node(value=np.mean(y))

        # Evaluate every feature/threshold pair and keep the best split.
        best_feature, best_threshold = self._best_criteria(X, y)

        # No valid split exists (for example, all feature values are identical),
        # so indexing with best_feature would fail; return a leaf instead.
        if best_feature is None:
            return Node(value=np.mean(y))

        # Recursively build the left and right subtrees.
        left_indices, right_indices = self._split(X[:, best_feature], best_threshold)
        left = self._grow_tree(X[left_indices, :], y[left_indices], depth + 1)
        right = self._grow_tree(X[right_indices, :], y[right_indices], depth + 1)
        return Node(best_feature, best_threshold, left, right)

    def _best_criteria(self, X, y):
        best_mse = np.inf
        best_feature = None
        best_threshold = None

        # Search all features and all candidate thresholds for the best split.
        for feature_index in range(X.shape[1]):
            feature_values = X[:, feature_index]
            for threshold in np.unique(feature_values):
                mse = self._mse(y, feature_values, threshold)
                if mse < best_mse:
                    best_mse = mse
                    best_feature = feature_index
                    best_threshold = threshold
        return best_feature, best_threshold

    def _mse(self, y, X_feature, threshold):
        left_indices, right_indices = self._split(X_feature, threshold)
        # A split that leaves one side empty is never chosen.
        if len(left_indices) == 0 or len(right_indices) == 0:
            return np.inf

        # Size-weighted variance of the two children, i.e. the mean squared
        # error when each child predicts its own mean.
        y_left, y_right = y[left_indices], y[right_indices]
        return (len(y_left) * np.var(y_left) + len(y_right) * np.var(y_right)) / len(y)

    def _split(self, X_feature, threshold):
        left_indices = np.argwhere(X_feature <= threshold).flatten()
        right_indices = np.argwhere(X_feature > threshold).flatten()
        return left_indices, right_indices

    def predict(self, X):
        return np.array([self._traverse_tree(x, self.root) for x in np.asarray(X)])

    def _traverse_tree(self, x, node):
        # Leaf nodes carry a value; internal nodes do not.
        if node.value is not None:
            return node.value
        if x[node.feature_index] <= node.threshold:
            return self._traverse_tree(x, node.left)
        return self._traverse_tree(x, node.right)
```
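The two split criteria used by the trees above (information gain for classification, size-weighted child variance for regression) can be checked by hand on a tiny example. The following is a minimal sketch using only the Python standard library. The helper names and the four-sample toy data are hypothetical and chosen only for illustration; they are not part of the classes above.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def split_by_threshold(values, targets, threshold):
    """Send targets left if the feature value is <= threshold, else right."""
    left = [t for v, t in zip(values, targets) if v <= threshold]
    right = [t for v, t in zip(values, targets) if v > threshold]
    return left, right


def information_gain(values, labels, threshold):
    """Parent entropy minus the size-weighted entropy of the two children."""
    left, right = split_by_threshold(values, labels, threshold)
    if not left or not right:
        return 0.0  # an empty side means the split separates nothing
    n = len(labels)
    child = len(left) / n * entropy(left) + len(right) / n * entropy(right)
    return entropy(labels) - child


def variance(ys):
    """Population variance (the same definition np.var uses by default)."""
    mean = sum(ys) / len(ys)
    return sum((y - mean) ** 2 for y in ys) / len(ys)


def weighted_variance(values, ys, threshold):
    """Size-weighted variance of the two children; lower is better."""
    left, right = split_by_threshold(values, ys, threshold)
    if not left or not right:
        return math.inf
    return (len(left) * variance(left) + len(right) * variance(right)) / len(ys)


# Hypothetical toy data: one feature, four samples.
feature = [1.0, 2.0, 3.0, 4.0]
labels = [0, 0, 1, 1]            # classification targets
targets = [1.0, 1.0, 5.0, 5.0]   # regression targets

gain_perfect = information_gain(feature, labels, 2.0)  # 1.0: both children are pure
gain_weak = information_gain(feature, labels, 1.0)     # about 0.311
var_perfect = weighted_variance(feature, targets, 2.0) # 0.0
var_weak = weighted_variance(feature, targets, 1.0)    # about 2.667

print(gain_perfect, gain_weak, var_perfect, var_weak)
```

The threshold 2.0 separates the two classes (and the two target levels) exactly, so it gets the maximum gain and zero child variance. The threshold 1.0 leaves a mixed right child and scores worse on both criteria, which is why an exhaustive threshold search would prefer 2.0 here.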

