CART decision tree Python implementation: "Machine Learning Python Implementation 09_02: Decision Trees (CART)"
Date: 2023-07-30 17:12:46 Views: 108
Below is an example Python implementation of the CART algorithm for decision trees.
First, we import the required libraries:
```python
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score
```
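Next, we load the Iris dataset, putting the features in one DataFrame and the labels in a single-column DataFrame named `target`: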
```python
iris = load_iris()
X = pd.DataFrame(iris.data, columns=iris.feature_names)
y = pd.DataFrame(iris.target, columns=['target'])
```
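Then we split the data into a training set (70%) and a test set (30%), fixing the random seed so the split is reproducible: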
```python
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
```
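As a quick sanity check on the split, the sketch below prints the resulting shapes and the class counts in the test set. It passes `stratify=y`, which the call above does not use; that argument is added here only to show how to keep the three Iris classes balanced across both parts. The short variable names are illustrative.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# stratify=y keeps the class proportions the same in both parts (an addition, not used above).
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

print(X_tr.shape, X_te.shape)  # (105, 4) (45, 4)
print(np.bincount(y_te))       # [15 15 15]
```

Without `stratify`, the sizes are the same but the per-class counts in the test set depend on the shuffle.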
Next, we define a function that computes the Gini impurity of a set of labels:
```python
def gini_impurity(y):
    _, counts = np.unique(y, return_counts=True)
    probabilities = counts / len(y)
    return 1 - np.sum(probabilities ** 2)
```
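To make the formula concrete, here is a minimal sketch that evaluates the Gini impurity, 1 minus the sum of squared class proportions, on a few hand-made label arrays. The helper name `gini_of_labels` and the toy arrays are illustrative assumptions, not part of the walkthrough itself.

```python
import numpy as np

def gini_of_labels(labels):
    """Gini impurity 1 - sum(p_k^2) of a 1-D collection of class labels (illustrative helper)."""
    _, counts = np.unique(labels, return_counts=True)
    probabilities = counts / counts.sum()
    return 1.0 - np.sum(probabilities ** 2)

pure = np.array([0, 0, 0, 0])      # one class only
balanced = np.array([0, 0, 1, 1])  # two classes, 50/50
three_way = np.array([0, 1, 2])    # three classes, equal shares

print(gini_of_labels(pure))       # 0.0
print(gini_of_labels(balanced))   # 0.5
print(gini_of_labels(three_way))  # about 0.667
```

A pure node scores 0, and the score grows as the classes become more evenly mixed, which is why the tree looks for splits that lower it.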
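Then we define a function that computes the size-weighted average of the Gini impurity over the groups produced by a split. Each group is expected to be a DataFrame with a `target` column: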
```python
def weighted_gini_impurity(groups):
    total_size = sum(len(group) for group in groups)
    gini = 0
    for group in groups:
        size = len(group)
        if size == 0:
            continue
        score = gini_impurity(group['target'])
        gini += score * (size / total_size)
    return gini
```
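The arithmetic can be checked by hand on a small case. The sketch below uses plain NumPy arrays instead of DataFrames, and the helper names `gini_of_labels` and `weighted_gini_of_label_groups` are hypothetical stand-ins chosen so the block runs on its own.

```python
import numpy as np

def gini_of_labels(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def weighted_gini_of_label_groups(label_groups):
    total = sum(len(g) for g in label_groups)
    # Empty groups contribute nothing to the average.
    return sum(gini_of_labels(g) * len(g) / total for g in label_groups if len(g) > 0)

parent = np.array([0, 0, 0, 1, 1, 1])
left, right = np.array([0, 0, 0, 1]), np.array([1, 1])

parent_score = gini_of_labels(parent)                       # 0.5
split_score = weighted_gini_of_label_groups([left, right])  # (4/6) * 0.375 + (2/6) * 0.0 = 0.25
print(parent_score, split_score)
```

The split lowers the impurity from 0.5 to 0.25, so it would be preferred over leaving the node unsplit.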
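Next, we define a function that splits the dataset: rows whose feature at column `index` is less than `value` go to the left group, and all other rows go to the right group: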
```python
def test_split(index, value, X, y):
    left_mask = X.iloc[:, index] < value
    right_mask = X.iloc[:, index] >= value
    left = {'X': X[left_mask], 'y': y[left_mask]}
    right = {'X': X[right_mask], 'y': y[right_mask]}
    return left, right
```
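The following sketch applies the same splitting rule to a four-row toy table to show what the two groups contain. The function name `split_rows`, the column names, and the values are all made up for illustration.

```python
import pandas as pd

def split_rows(index, value, X, y):
    """Illustrative copy of the splitting rule: rows with feature < value go left."""
    left_mask = X.iloc[:, index] < value
    # ~left_mask equals the >= test as long as the column contains no NaN.
    return ({'X': X[left_mask], 'y': y[left_mask]},
            {'X': X[~left_mask], 'y': y[~left_mask]})

X_toy = pd.DataFrame({'length': [1.0, 2.0, 3.0, 4.0], 'width': [0.5, 0.1, 0.9, 0.4]})
y_toy = pd.DataFrame({'target': [0, 0, 1, 1]})

left, right = split_rows(0, 3.0, X_toy, y_toy)
print(left['y']['target'].tolist())   # [0, 0]
print(right['y']['target'].tolist())  # [1, 1]
```

Because the boolean mask is built from `X` and then applied to `y`, the two frames must share the same row index, which is the case after `train_test_split`.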
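Then we define a function that selects the best split by trying every observed value of every feature as a threshold and keeping the one with the lowest weighted Gini impurity: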
```python
def get_best_split(X, y):
    best_index, best_value, best_score, best_groups = None, None, float('inf'), None
    # Try every observed value of every feature as a candidate threshold.
    for index in range(X.shape[1]):
        for value in X.iloc[:, index]:
            # test_split returns a (left, right) tuple of {'X': ..., 'y': ...} dicts.
            groups = test_split(index, value, X, y)
            # Score the split on the label frames only.
            score = weighted_gini_impurity([group['y'] for group in groups])
            if score < best_score:
                best_index, best_value, best_score, best_groups = index, value, score, groups
    return {'feature_index': best_index, 'feature_value': best_value, 'groups': best_groups}
```
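The search above uses each observed value as a threshold. A common variant, sketched here for a single feature, instead scans the midpoints between consecutive distinct values, so neither side of a candidate split is ever empty. This is a different technique from the one in the walkthrough, and the names `gini_of_labels` and `best_threshold_for_feature` are hypothetical.

```python
import numpy as np

def gini_of_labels(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def best_threshold_for_feature(values, labels):
    """Scan midpoints between consecutive distinct values; return (threshold, weighted Gini)."""
    values, labels = np.asarray(values), np.asarray(labels)
    distinct = np.unique(values)                     # sorted distinct values
    candidates = (distinct[:-1] + distinct[1:]) / 2  # midpoints between neighbours
    best_threshold, best_score = None, float('inf')
    for threshold in candidates:
        left, right = labels[values < threshold], labels[values >= threshold]
        score = (len(left) * gini_of_labels(left) + len(right) * gini_of_labels(right)) / len(labels)
        if score < best_score:
            best_threshold, best_score = threshold, score
    return best_threshold, best_score

values = [1.0, 2.0, 3.0, 4.0, 5.0]
labels = [0, 0, 1, 1, 1]
threshold, score = best_threshold_for_feature(values, labels)
print(threshold, score)  # threshold 2.5 separates the two classes, so the score is 0.0
```

If the feature has only one distinct value there are no candidates, and the function returns `(None, inf)` to signal that no split is possible.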
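Next, we define a function that creates a leaf node, which predicts the most frequent class among the samples that reach it: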
```python
def create_leaf_node(y):
    return y['target'].mode()[0]
```
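One detail worth knowing is how the majority vote behaves on a tie. The sketch below uses a made-up label column in which two classes are equally frequent; the variable name `votes` is illustrative.

```python
import pandas as pd

votes = pd.DataFrame({'target': [2, 1, 1, 2, 0]})

# value_counts shows how many samples of each class reached the node.
print(votes['target'].value_counts().to_dict())

# mode() returns every most-frequent value in sorted order,
# so on a tie (here 1 and 2 both occur twice) [0] picks the smallest label.
print(votes['target'].mode().tolist())  # [1, 2]
print(votes['target'].mode()[0])        # 1
```

The tie-break is therefore deterministic, but it favours the smaller class label rather than reflecting anything in the data.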
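Then we define a function that builds the decision tree recursively. Growth stops when a split leaves one side empty, when `max_depth` is reached, or when a side holds fewer than `min_size` samples: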
```python
def create_decision_tree(X, y, max_depth, min_size, depth):
    best_split = get_best_split(X, y)
    # 'groups' is a (left, right) tuple of {'X': ..., 'y': ...} dicts.
    left, right = best_split.pop('groups')
    # If one side is empty the split separates nothing, so stop here.
    if len(left['y']) == 0 or len(right['y']) == 0:
        return create_leaf_node(y)
    if depth >= max_depth:
        return create_leaf_node(y)
    # A side with fewer than min_size samples becomes a leaf; otherwise keep splitting.
    if len(left['y']) < min_size:
        left = create_leaf_node(left['y'])
    else:
        left = create_decision_tree(left['X'], left['y'], max_depth, min_size, depth + 1)
    if len(right['y']) < min_size:
        right = create_leaf_node(right['y'])
    else:
        right = create_decision_tree(right['X'], right['y'], max_depth, min_size, depth + 1)
    return {'left': left, 'right': right, **best_split}
```
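The tree is a nested dictionary, which is hard to read when printed directly. The helper below is a hypothetical addition that renders such a dictionary as indented if/else text. It is demonstrated on a hand-built tree that uses the same keys (`feature_index`, `feature_value`, `left`, `right`); the thresholds and feature names are made up.

```python
def format_tree(node, feature_names=None, indent=0):
    """Return a list of text lines describing a nested-dict tree (illustrative helper)."""
    pad = '  ' * indent
    if not isinstance(node, dict):  # leaves are stored as bare class labels
        return [f'{pad}predict {node}']
    index = node['feature_index']
    name = feature_names[index] if feature_names is not None else f'x[{index}]'
    lines = [f"{pad}if {name} < {node['feature_value']}:"]
    lines += format_tree(node['left'], feature_names, indent + 1)
    lines.append(f'{pad}else:')
    lines += format_tree(node['right'], feature_names, indent + 1)
    return lines

# Hand-built tree with the same keys the builder returns; the numbers are made up.
toy_tree = {
    'feature_index': 2, 'feature_value': 2.5,
    'left': 0,
    'right': {'feature_index': 3, 'feature_value': 1.7, 'left': 1, 'right': 2},
}
tree_lines = format_tree(toy_tree, ['sl', 'sw', 'pl', 'pw'])
print('\n'.join(tree_lines))
```

Passing the column names of the training DataFrame as `feature_names` would label each test with the real feature instead of a positional index.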
Finally, we define a function that predicts the class of a single row by walking the tree from the root to a leaf:
```python
def predict(node, row):
    # A node that is not a dict is a leaf holding the predicted class.
    if not isinstance(node, dict):
        return node
    # Select the feature by position; row is a Series labelled by feature name.
    if row.iloc[node['feature_index']] < node['feature_value']:
        return predict(node['left'], row)
    return predict(node['right'], row)
```
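The traversal can also be written as a loop, which avoids recursion on deep trees. The sketch below is an alternative formulation rather than the function above, shown on a hand-built tree with made-up thresholds; `predict_iteratively`, `toy_tree`, and the sample rows are illustrative names and values.

```python
import pandas as pd

def predict_iteratively(node, row):
    """Walk a nested-dict tree with a loop instead of recursion (illustrative helper)."""
    while isinstance(node, dict):
        goes_left = row.iloc[node['feature_index']] < node['feature_value']
        node = node['left'] if goes_left else node['right']
    return node  # a leaf: the predicted class label

toy_tree = {
    'feature_index': 2, 'feature_value': 2.5,
    'left': 0,
    'right': {'feature_index': 3, 'feature_value': 1.7, 'left': 1, 'right': 2},
}
columns = ['sl', 'sw', 'pl', 'pw']
short_petal = pd.Series([5.0, 3.4, 1.4, 0.2], index=columns)
medium_petal = pd.Series([5.9, 3.0, 4.2, 1.5], index=columns)

print(predict_iteratively(toy_tree, short_petal))   # 0
print(predict_iteratively(toy_tree, medium_petal))  # 1
```

Using `.iloc` selects the feature by position, which matters because each row is a Series indexed by column name rather than by integer.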
With all the necessary functions defined, we can build and test the decision tree model with the following code:
```python
tree = create_decision_tree(X_train, y_train, max_depth=5, min_size=10, depth=1)
# Predict one test row at a time, then compare against the true labels.
y_pred = np.array([predict(tree, row) for _, row in X_test.iterrows()])
print('Accuracy:', accuracy_score(y_test['target'], y_pred))
```
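As a reference point, the same data can be fed to scikit-learn's `DecisionTreeClassifier`, which is imported at the top of this walkthrough. The sketch below is a rough comparison only: scikit-learn uses its own optimized CART variant with different threshold handling, and `min_samples_split` is not an exact counterpart of `min_size`, so the two accuracies need not match.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# criterion='gini' selects the Gini impurity; max_depth and min_samples_split
# are only approximate counterparts of max_depth and min_size above.
clf = DecisionTreeClassifier(criterion='gini', max_depth=5, min_samples_split=10, random_state=42)
clf.fit(X_train, y_train)

reference_accuracy = accuracy_score(y_test, clf.predict(X_test))
print('scikit-learn accuracy:', reference_accuracy)
```

If the hand-written tree scores far below this reference on the same split, that usually points to a bug in the splitting or prediction code rather than to a limitation of the algorithm.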
This completes the example Python implementation of a decision tree based on the CART algorithm.