Decision Tree: Gini Index Calculation Code
Date: 2024-05-30 22:05:59
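Decision trees are a widely used machine learning algorithm, and the Gini index (also called Gini impurity) is a common criterion for choosing splits in classification trees. For a set of labels in which class k occurs with proportion p_k, it equals 1 minus the sum of p_k squared, so a pure set scores 0. Below is a simple implementation of the calculation: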
```
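def gini(labels):
    """Return the Gini impurity of a collection of class labels."""
    if len(labels) == 0:
        return 0.0  # an empty node is treated as pure
    # Count how often each label occurs.
    counts = {}
    for label in labels:
        if label not in counts:
            counts[label] = 0
        counts[label] += 1
    # Impurity = 1 - sum of squared class probabilities.
    impurity = 1.0
    for label in counts:
        prob_of_label = counts[label] / float(len(labels))
        impurity -= prob_of_label ** 2
    return impurity


def split_gini(X, y, feature_index, threshold):
    """Return the size-weighted Gini impurity of splitting on one feature."""
    # Only the labels are needed for the impurity, so the rows are not copied.
    left_y, right_y = [], []
    for i in range(len(X)):
        if X[i][feature_index] < threshold:
            left_y.append(y[i])
        else:
            right_y.append(y[i])
    left_gini = gini(left_y)
    right_gini = gini(right_y)
    # Weight each side's impurity by its share of the samples.
    return (left_gini * len(left_y) + right_gini * len(right_y)) / float(len(y))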
```
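Here, the `gini` function computes the Gini index of a given set of labels, and the `split_gini` function computes the weighted Gini index obtained by splitting the dataset on a given feature at a given threshold. When building a decision tree, `split_gini` can be evaluated for each candidate feature and threshold, and the split with the lowest value is taken as the best split point.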
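
As a rough illustration of that search, here is a minimal sketch. It assumes numeric features and tries the midpoints between consecutive distinct values as candidate thresholds. The helper `best_split` and the toy data are hypothetical and not part of the original code. `gini` and `split_gini` are restated in compact form so that the block runs on its own.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a label list: 1 - sum(p_k ** 2)."""
    n = len(labels)
    if n == 0:
        return 0.0  # assumption: an empty node counts as pure
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def split_gini(X, y, feature_index, threshold):
    """Weighted impurity after splitting on X[i][feature_index] < threshold."""
    left_y = [label for row, label in zip(X, y) if row[feature_index] < threshold]
    right_y = [label for row, label in zip(X, y) if row[feature_index] >= threshold]
    return (gini(left_y) * len(left_y) + gini(right_y) * len(right_y)) / len(y)


def best_split(X, y):
    """Hypothetical helper: exhaustive search over features and midpoints."""
    # (feature_index, threshold, weighted impurity); start from "no split".
    best = (None, None, gini(y))
    for feature_index in range(len(X[0])):
        values = sorted({row[feature_index] for row in X})
        # Candidate thresholds: midpoints between neighbouring distinct values.
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2.0
            score = split_gini(X, y, feature_index, threshold)
            if score < best[2]:
                best = (feature_index, threshold, score)
    return best


# Toy data, made up for illustration only.
X = [[2.0, 1.0], [3.0, 1.5], [6.0, 1.2], [7.0, 0.8]]
y = ["a", "a", "b", "b"]
print(best_split(X, y))  # (0, 4.5, 0.0): feature 0 separates the classes
```

If no candidate improves on the unsplit impurity, this sketch returns `None` for the feature and threshold, which a tree builder could treat as a signal to stop and create a leaf.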