RandomForestRegressor: A Source-Code Implementation
RandomForestRegressor is a regression algorithm based on random forests: it predicts continuous target values by building an ensemble of decision trees and averaging their outputs. Below is a simple from-scratch implementation:
```python
import numpy as np


class DecisionTreeRegressor:
    def __init__(self, max_depth=None):
        self.max_depth = max_depth

    def fit(self, X, y):
        self.n_features_ = X.shape[1]
        self.tree_ = self._build_tree(X, y)

    def _build_tree(self, X, y, depth=0):
        n_samples = X.shape[0]
        n_labels = len(np.unique(y))
        # Stop at the depth limit, on pure nodes, or when the node is too small.
        if depth == self.max_depth or n_labels == 1 or n_samples < 2:
            return np.mean(y)
        # Random feature subsampling: consider only sqrt(n_features) candidates.
        feature_idxs = np.random.choice(
            self.n_features_, size=int(np.sqrt(self.n_features_)), replace=False
        )
        best_feature, best_split = self._best_feature_split(X, y, feature_idxs)
        if best_feature is None:
            # No valid split among the sampled features -> make this a leaf.
            return np.mean(y)
        left_idxs = X[:, best_feature] <= best_split
        right_idxs = X[:, best_feature] > best_split
        left_tree = self._build_tree(X[left_idxs], y[left_idxs], depth + 1)
        right_tree = self._build_tree(X[right_idxs], y[right_idxs], depth + 1)
        return (best_feature, best_split, left_tree, right_tree)

    def _best_feature_split(self, X, y, feature_idxs):
        best_score = float('inf')
        best_feature, best_split = None, None
        for feature in feature_idxs:
            for split in np.unique(X[:, feature]):
                left_idxs = X[:, feature] <= split
                right_idxs = X[:, feature] > split
                if len(y[left_idxs]) < 1 or len(y[right_idxs]) < 1:
                    continue
                # Split criterion: sum of the two children's MSEs.
                score = self._mse(y[left_idxs]) + self._mse(y[right_idxs])
                if score < best_score:
                    best_score = score
                    best_feature = feature
                    best_split = split
        return best_feature, best_split

    def _mse(self, y):
        return np.mean((y - np.mean(y)) ** 2)

    def predict(self, X):
        return np.array([self._predict(inputs) for inputs in X])

    def _predict(self, inputs):
        # Internal nodes are (feature, split, left, right) tuples; leaves are floats.
        node = self.tree_
        while isinstance(node, tuple):
            feature, split, left, right = node
            if inputs[feature] <= split:
                node = left
            else:
                node = right
        return node


class RandomForestRegressor:
    def __init__(self, n_estimators=100, max_depth=None):
        self.n_estimators = n_estimators
        self.max_depth = max_depth

    def fit(self, X, y):
        self.trees_ = [
            DecisionTreeRegressor(max_depth=self.max_depth)
            for _ in range(self.n_estimators)
        ]
        for tree in self.trees_:
            # Bootstrap sample: draw n rows with replacement for each tree.
            random_idxs = np.random.choice(X.shape[0], size=X.shape[0], replace=True)
            tree.fit(X[random_idxs], y[random_idxs])

    def predict(self, X):
        # The forest's prediction is the average of the per-tree predictions.
        return np.mean([tree.predict(X) for tree in self.trees_], axis=0)
```
This implementation contains two classes: `DecisionTreeRegressor` and `RandomForestRegressor`. `DecisionTreeRegressor` builds and evaluates a single regression tree, while `RandomForestRegressor` implements random-forest regression by training an ensemble of such trees.
In `DecisionTreeRegressor`, `fit` trains the tree, `_build_tree` grows it recursively, `_best_feature_split` selects the best split feature and threshold at the current node, and `_mse` computes the mean squared error used as the split criterion. `predict` scores a batch of inputs; `_predict` walks the tree for a single sample.
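To make the split criterion concrete, here is a small numpy-only sketch (toy data of my own, not from the original) of the same search that `_best_feature_split` performs on a single feature: try every unique value as a threshold and keep the one minimizing the sum of the two children's MSEs.

```python
import numpy as np

# Toy 1-D feature and target: a step function with a break at x = 3.
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.0, 1.1, 0.9, 5.0, 5.2])

def mse(v):
    # Same criterion as _mse above: variance of the node's targets.
    return np.mean((v - np.mean(v)) ** 2)

best_split, best_score = None, float("inf")
for split in np.unique(X):
    left, right = y[X <= split], y[X > split]
    if len(left) < 1 or len(right) < 1:
        continue  # skip degenerate splits with an empty child
    score = mse(left) + mse(right)
    if score < best_score:
        best_score, best_split = score, split

print(best_split)  # 3.0, the break between the two plateaus
```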
In `RandomForestRegressor`, `fit` trains each tree of the forest on a bootstrap sample drawn with replacement from the training data; this injected randomness decorrelates the trees and reduces the model's variance. `predict` returns the average of the individual trees' predictions.
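The variance-reduction claim can be illustrated without trees at all. In this hypothetical numpy-only sketch, each "tree" is replaced by the simplest possible predictor, the mean of one bootstrap sample; averaging many such predictors gives a noticeably more stable estimate than a single one:

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(loc=10.0, scale=2.0, size=200)  # synthetic training targets

def bootstrap_mean():
    # One bootstrap estimate (analogous to a single tree's prediction).
    idx = rng.choice(len(y), size=len(y), replace=True)
    return y[idx].mean()

# Variance of a single bootstrap predictor vs. an average of 50 of them.
single_estimates = [bootstrap_mean() for _ in range(500)]
ensemble_estimates = [
    np.mean([bootstrap_mean() for _ in range(50)]) for _ in range(500)
]

print(np.var(single_estimates), np.var(ensemble_estimates))
```

The ensemble's variance is roughly 50 times smaller, which is exactly the effect bagging exploits (real trees are correlated through the shared data, so the reduction in a forest is smaller but still substantial).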