Python Classification and Regression Decision Trees (CART)
Date: 2023-10-08 08:10:36 Views: 174
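CART (Classification and Regression Trees) is a widely used decision tree algorithm that handles both classification and regression problems. It builds a binary tree by recursively splitting the dataset in two. For classification, CART uses the Gini index as its splitting criterion; for regression, it uses squared error. Its strengths are that it handles both continuous and discrete features and produces trees that are easy to interpret. The original CART formulation can also handle missing values (through surrogate splits), although whether a given implementation does so depends on the library and its version. Its main weakness is a tendency to overfit, which is usually countered with pruning. In Python, the `DecisionTreeClassifier` and `DecisionTreeRegressor` classes in scikit-learn (sklearn) implement an optimized version of CART.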
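To make the two split criteria concrete, the sketch below scores one candidate split by hand: Gini impurity for class labels and summed squared error for numeric targets. The helper names (`gini`, `weighted_gini`, `squared_error`, `split_squared_error`) and the toy data are illustrative assumptions, not part of any library.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def weighted_gini(left, right):
    """Size-weighted average of the Gini impurity of the two children."""
    n = len(left) + len(right)
    return (len(left) * gini(left) + len(right) * gini(right)) / n


def squared_error(targets):
    """Sum of squared deviations from the mean of the targets."""
    mean = sum(targets) / len(targets)
    return sum((t - mean) ** 2 for t in targets)


def split_squared_error(left, right):
    """Total squared error when each child predicts its own mean."""
    return squared_error(left) + squared_error(right)


# Classification: compare the parent node with one candidate split.
left_labels, right_labels = ["a", "a", "a"], ["a", "b", "b"]
print(gini(left_labels + right_labels))
print(weighted_gini(left_labels, right_labels))

# Regression: the same comparison using squared error.
left_targets, right_targets = [1.0, 2.0, 3.0], [10.0, 12.0]
print(squared_error(left_targets + right_targets))
print(split_squared_error(left_targets, right_targets))
```

In this toy case the split lowers the Gini impurity from about 0.444 to about 0.222, and the squared error from about 101.2 to 4.0. CART evaluates many such candidates and keeps the one with the lowest score.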
Related questions
Python classification and regression decision tree (CART) code
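The following example implements classification and regression decision trees (CART) in Python.
First, import the required libraries:
```python
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor
from sklearn.datasets import load_iris, load_diabetes
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, mean_squared_error
```
Next, load the iris dataset with `load_iris` and a regression dataset with `load_diabetes`. The `load_boston` function for the Boston housing data was removed in scikit-learn 1.2, so the bundled diabetes dataset serves as the regression example:
```python
# Load the iris dataset (classification)
iris = load_iris()
X, y = iris.data, iris.target
# Load the diabetes dataset (regression)
diabetes = load_diabetes()
X_reg, y_reg = diabetes.data, diabetes.target
```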
Then split each dataset into a training set and a test set:
```python
# Split the iris dataset into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Split the regression dataset into training and test sets
X_train_reg, X_test_reg, y_train_reg, y_test_reg = train_test_split(X_reg, y_reg, test_size=0.2, random_state=42)
```
Next, build the two trees with the `DecisionTreeClassifier` and `DecisionTreeRegressor` classes:
```python
# Build the classification tree and fit it to the training set
clf = DecisionTreeClassifier(random_state=42)
clf.fit(X_train, y_train)
# Build the regression tree and fit it to the training set
reg = DecisionTreeRegressor(random_state=42)
reg.fit(X_train_reg, y_train_reg)
```
Then evaluate both models on their test sets:
```python
# Accuracy of the classification tree on the test set
y_pred = clf.predict(X_test)
acc = accuracy_score(y_test, y_pred)
print(f"Accuracy: {acc:.2f}")
# Mean squared error of the regression tree on the test set
y_pred_reg = reg.predict(X_test_reg)
mse = mean_squared_error(y_test_reg, y_pred_reg)
print(f"MSE: {mse:.2f}")
```
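The overview names pruning as the usual remedy for overfitting. A minimal sketch of how that might look with scikit-learn's cost-complexity pruning follows; it assumes the iris data and the same split settings as above, and the `results` list is only an illustrative way to collect the numbers.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Candidate pruning strengths, computed from the unpruned tree.
path = DecisionTreeClassifier(random_state=42).cost_complexity_pruning_path(
    X_train, y_train
)

results = []
for alpha in path.ccp_alphas:
    # Guard against tiny negative values caused by floating-point error.
    alpha = max(float(alpha), 0.0)
    tree = DecisionTreeClassifier(random_state=42, ccp_alpha=alpha)
    tree.fit(X_train, y_train)
    results.append((alpha, tree.get_n_leaves(), tree.score(X_test, y_test)))

for alpha, n_leaves, test_acc in results:
    print(f"ccp_alpha={alpha:.4f}  leaves={n_leaves}  test accuracy={test_acc:.2f}")
```

Larger `ccp_alpha` values prune more aggressively and leave fewer leaves. The test set is used here only to show the trend; in practice the value would normally be chosen by cross-validation on the training data.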
Finally, plot the trees to visualize how each model makes its decisions:
```python
from sklearn.tree import plot_tree
import matplotlib.pyplot as plt
# Plot the classification tree
plt.figure(figsize=(12, 6))
plot_tree(clf, filled=True)
plt.show()
# Plot the regression tree
plt.figure(figsize=(12, 6))
plot_tree(reg, filled=True)
plt.show()
```
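When a plot is hard to read or no display is available, the same decision rules can be printed as text. The sketch below uses scikit-learn's `export_text`; limiting the tree to `max_depth=2` is an assumption made only to keep the output short.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
# A shallow tree keeps the printed rules short.
clf = DecisionTreeClassifier(max_depth=2, random_state=42)
clf.fit(iris.data, iris.target)

# Render the fitted tree as indented if/else rules, one line per node.
rules = export_text(clf, feature_names=list(iris.feature_names))
print(rules)
```

Each indented line is either a threshold test on one feature or a leaf with its predicted class, which mirrors the recursive binary splits that CART performs.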
That completes the example of classification and regression decision trees (CART) in Python.
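Python classification and regression decision tree (CART) complete code
Below is a from-scratch Python implementation of a CART decision tree. It covers the classification case, using Gini impurity as the split criterion: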
```python
import numpy as np


class CARTDecisionTree:
    """Minimal CART classifier that splits on Gini impurity."""

    def __init__(self):
        self.tree = {}

    def fit(self, X, y, feature_names=None, max_depth=5, min_samples_split=2):
        self.feature_names = feature_names
        self.max_depth = max_depth
        self.min_samples_split = min_samples_split
        self.tree = self.build_tree(X, y)

    def predict(self, X):
        return [self.predict_one(x, self.tree) for x in X]

    def predict_one(self, x, tree):
        # A leaf is stored as a bare label; an internal node is a dict.
        if not isinstance(tree, dict):
            return tree
        if x[tree['feature']] <= tree['threshold']:
            return self.predict_one(x, tree['left'])
        return self.predict_one(x, tree['right'])

    def build_tree(self, X, y, depth=0):
        num_samples, num_features = X.shape
        num_labels = len(np.unique(y))
        # Stop at the depth limit, on a pure node, or when too few samples remain.
        if depth == self.max_depth or num_labels == 1 or num_samples < self.min_samples_split:
            return self.get_leaf_node(y)
        best_feature, best_threshold = self.get_best_split(X, y, num_samples, num_features)
        # No valid split exists when every feature is constant in this node.
        if best_feature is None:
            return self.get_leaf_node(y)
        left_indices = X[:, best_feature] <= best_threshold
        right_indices = X[:, best_feature] > best_threshold
        left_tree = self.build_tree(X[left_indices], y[left_indices], depth + 1)
        right_tree = self.build_tree(X[right_indices], y[right_indices], depth + 1)
        return {'feature': best_feature, 'threshold': best_threshold,
                'left': left_tree, 'right': right_tree}

    def get_best_split(self, X, y, num_samples, num_features):
        best_feature = None
        best_threshold = None
        best_gini = 1.0
        for feature in range(num_features):
            # Sort the samples by this feature so candidate splits can be scanned in order.
            order = np.argsort(X[:, feature], kind='stable')
            values, classes = X[order, feature], y[order]
            num_left_samples = 0
            num_left_labels = {}
            # All samples start on the right side of the split.
            num_right_samples = num_samples
            num_right_labels = {}
            for label in classes:
                num_right_labels[label] = num_right_labels.get(label, 0) + 1
            for i in range(1, num_samples):
                # Move one sample from the right side to the left side.
                label = classes[i - 1]
                num_left_samples += 1
                num_left_labels[label] = num_left_labels.get(label, 0) + 1
                num_right_samples -= 1
                num_right_labels[label] -= 1
                # Equal neighbouring values cannot be separated by a threshold.
                if values[i] == values[i - 1]:
                    continue
                left_gini = self.get_gini(num_left_labels, num_left_samples)
                right_gini = self.get_gini(num_right_labels, num_right_samples)
                gini = (num_left_samples * left_gini + num_right_samples * right_gini) / num_samples
                if gini < best_gini:
                    best_gini = gini
                    best_feature = feature
                    best_threshold = (values[i] + values[i - 1]) / 2
        return best_feature, best_threshold

    def get_gini(self, label_counts, num_samples):
        impurity = 1
        for label in label_counts:
            prob = label_counts[label] / num_samples
            impurity -= prob ** 2
        return impurity

    def get_leaf_node(self, y):
        # A leaf predicts the most frequent label among its samples.
        label_counts = {}
        for label in y:
            label_counts[label] = label_counts.get(label, 0) + 1
        return max(label_counts, key=label_counts.get)
```
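Here, `fit` fits the model to the training data and `predict` predicts labels for new samples. The remaining methods do the underlying work: `build_tree` builds the tree recursively, `predict_one` predicts a single sample, `get_best_split` finds the best split point, `get_gini` computes the Gini impurity, and `get_leaf_node` creates a leaf node.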
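To use it, create a decision tree object, call its `fit` method to train it, and then call `predict`:
```python
X_train = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y_train = np.array([0, 1, 1, 0])
X_test = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
dt = CARTDecisionTree()
dt.fit(X_train, y_train)
y_pred = dt.predict(X_test)
# Convert NumPy integers to plain ints so the printed list is version-independent
print([int(label) for label in y_pred])
```
The output is `[0, 1, 1, 0]`, which matches the expected labels.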
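The class above handles only classification. For regression, CART applies the same scan over sorted feature values but scores each candidate by squared error, and a leaf predicts the mean of its targets. A minimal sketch of that split search for a single feature follows; the function name `best_regression_split` and the toy data are illustrative assumptions.

```python
import numpy as np


def best_regression_split(x, y):
    """Return (threshold, sse) for the split of one feature with the lowest squared error."""
    order = np.argsort(x, kind="stable")
    x_sorted, y_sorted = x[order], y[order]
    best_threshold, best_sse = None, np.inf
    for i in range(1, len(x_sorted)):
        # Equal neighbouring values cannot be separated by a threshold.
        if x_sorted[i] == x_sorted[i - 1]:
            continue
        left, right = y_sorted[:i], y_sorted[i:]
        # Each side predicts its own mean; sum the squared residuals of both sides.
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if sse < best_sse:
            best_threshold = (x_sorted[i] + x_sorted[i - 1]) / 2
            best_sse = sse
    return best_threshold, best_sse


x = np.array([1.0, 2.0, 3.0, 10.0, 11.0])
y = np.array([1.0, 1.2, 0.8, 5.0, 5.2])
threshold, sse = best_regression_split(x, y)
print(threshold, sse)
```

On this toy data the chosen threshold is 6.5, which separates the three small targets from the two large ones. Recomputing the means at every candidate keeps the sketch short but costs quadratic time; running sums would make the scan linear after sorting.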