Implementing ID3 in Python
Date: 2024-05-07 10:17:51
ID3 (Iterative Dichotomiser 3) is an algorithm for building decision-tree classifiers. Below is an example implementation in Python:
```python
from math import log2
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a collection of class labels."""
    n_labels = len(labels)
    if n_labels <= 1:
        return 0
    counts = Counter(labels)
    probs = [c / n_labels for c in counts.values()]
    return -sum(p * log2(p) for p in probs)


def information_gain(data, feature, target):
    """Entropy reduction from splitting `data` (a pandas DataFrame) on `feature`."""
    total_entropy = entropy(data[target])
    values = data[feature].unique()
    # Weight each subset's entropy by the fraction of rows it contains.
    weighted_entropy = sum(
        (data[feature] == value).mean() * entropy(data[data[feature] == value][target])
        for value in values
    )
    return total_entropy - weighted_entropy


def id3(data, target, features):
    """Recursively build a tree as nested dicts: {feature: {value: subtree_or_label}}."""
    # No features left to split on: return the majority label.
    if len(features) == 0:
        return Counter(data[target]).most_common(1)[0][0]
    # All rows share one label: return it as a leaf.
    if len(set(data[target])) == 1:
        return data[target].iloc[0]
    # Split on the feature with the highest information gain.
    gains = [(information_gain(data, feature, target), feature) for feature in features]
    best_feature = max(gains)[1]
    tree = {best_feature: {}}
    remaining_features = [f for f in features if f != best_feature]
    for value in data[best_feature].unique():
        sub_data = data[data[best_feature] == value].reset_index(drop=True)
        subtree = id3(sub_data, target, remaining_features)
        tree[best_feature][value] = subtree
    return tree
```
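Here, `entropy` computes the entropy of the labels in a dataset, `information_gain` computes the information gain obtained by splitting on a given feature, and `id3` builds the decision tree recursively. The code expects `data` to be a pandas DataFrame, `target` to be the name of the label column, and `features` to be a list of feature column names.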
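To make the two quantities concrete, the following sketch recomputes entropy and information gain on a tiny hand-made dataset, using plain Python lists of dicts instead of a DataFrame so that it runs without pandas. The rows, the column names `outlook`, `windy` and `play`, and the helper `gain_from_rows` are illustrative assumptions and not part of the code above.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    if n <= 1:
        return 0.0
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def gain_from_rows(rows, feature, target):
    """Information gain of splitting `rows` (a list of dicts) on `feature`."""
    labels = [r[target] for r in rows]
    weighted = 0.0
    for value in {r[feature] for r in rows}:
        subset = [r[target] for r in rows if r[feature] == value]
        # Each subset's entropy counts in proportion to its share of the rows.
        weighted += len(subset) / len(rows) * entropy(subset)
    return entropy(labels) - weighted


# Hypothetical toy data: `outlook` separates the classes perfectly, `windy` does not.
rows = [
    {"outlook": "sunny", "windy": "no", "play": "no"},
    {"outlook": "sunny", "windy": "yes", "play": "no"},
    {"outlook": "rainy", "windy": "no", "play": "yes"},
    {"outlook": "rainy", "windy": "yes", "play": "yes"},
]

print(entropy([r["play"] for r in rows]))       # 1.0 (two classes, evenly split)
print(gain_from_rows(rows, "outlook", "play"))  # 1.0 (each branch is pure)
print(gain_from_rows(rows, "windy", "play"))    # 0.0 (each branch is still 50/50)
```

On this data ID3 would split on `outlook` first, since it has the larger gain.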
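Note that this is only a basic implementation of ID3: it handles categorical features only and does no pruning, so it may need to be adapted and optimized for a specific problem.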
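One adjustment that is usually needed is a way to classify a new sample with the finished tree, which the code above does not include. The following is a minimal sketch, assuming the nested-dict shape `{feature: {value: subtree_or_label}}` that `id3` returns. The function name `predict`, its `default` parameter, and the hand-written example tree are hypothetical.

```python
def predict(tree, sample, default=None):
    """Classify `sample` (a dict of feature -> value) with a nested-dict tree."""
    # Anything that is not a dict is a leaf, i.e. a class label.
    while isinstance(tree, dict):
        feature = next(iter(tree))  # each internal node has exactly one key
        branches = tree[feature]
        value = sample.get(feature)
        if value not in branches:
            return default  # feature value never seen while building the tree
        tree = branches[value]
    return tree


# Hand-written tree in the same shape that `id3` produces (hypothetical data).
tree = {
    "outlook": {
        "sunny": "no",
        "overcast": "yes",
        "rainy": {"windy": {"yes": "no", "no": "yes"}},
    }
}

print(predict(tree, {"outlook": "rainy", "windy": "no"}))      # yes
print(predict(tree, {"outlook": "foggy"}, default="unknown"))  # unknown
```

Returning a caller-supplied `default` for unseen values is only one possible choice; falling back to the majority label of the training data is another common option.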