Implementing the ID3 Algorithm in Python
Date: 2023-09-30 16:02:07
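ID3 is a decision tree learning algorithm for classification problems. At each node it splits on the feature with the highest information gain. The basic steps for implementing it in Python are:
1. Import the required libraries, numpy and pandas.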
```python
import numpy as np
import pandas as pd
```
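2. Define a function that computes the entropy of a dataset. The last column of `data` is assumed to hold the class labels.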
```python
def calc_entropy(data):
    """Shannon entropy (in bits) of the class labels in the last column of data."""
    labels = data[:, -1]
    _, counts = np.unique(labels, return_counts=True)
    probabilities = counts / counts.sum()
    return -np.sum(probabilities * np.log2(probabilities))
```
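As a quick check of the entropy formula H = -Σ p_i log2(p_i), the sketch below evaluates it for a label column with two 'yes' and three 'no' entries, which is the class mix of the sample dataset in step 5. The helper name `label_entropy` is hypothetical and exists only for this illustration; it takes a plain list of labels rather than the full data array.

```python
import numpy as np

def label_entropy(labels):
    """Shannon entropy (in bits) of a 1-D sequence of class labels."""
    _, counts = np.unique(np.asarray(labels), return_counts=True)
    probabilities = counts / counts.sum()
    return float(-np.sum(probabilities * np.log2(probabilities)))

mixed = label_entropy(['yes', 'yes', 'no', 'no', 'no'])
pure = label_entropy(['no', 'no'])
print(round(mixed, 3))  # 0.971
print(pure == 0)        # True: a subset with a single class has zero entropy
```

A pure subset scoring zero is what lets the tree stop splitting once all samples in a branch share one class.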
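3. Define a function that selects the best feature, that is, the one whose split yields the largest information gain.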
```python
def choose_best_feature(data):
    """Return the index of the feature with the largest information gain, or -1 if no split helps."""
    num_features = data.shape[1] - 1  # the last column is the class label
    base_entropy = calc_entropy(data)
    best_info_gain = 0.0
    best_feature = -1
    for i in range(num_features):
        # Weighted entropy of the subsets produced by splitting on feature i.
        new_entropy = 0.0
        for value in np.unique(data[:, i]):
            sub_data = data[data[:, i] == value]
            prob = sub_data.shape[0] / data.shape[0]
            new_entropy += prob * calc_entropy(sub_data)
        info_gain = base_entropy - new_entropy
        if info_gain > best_info_gain:
            best_info_gain = info_gain
            best_feature = i
    return best_feature
```
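To make the selection rule concrete, the following sketch computes the information gain of each feature on the rows of the sample dataset from step 5. It is a standalone illustration, not a replacement for the function above: the names `entropy` and `info_gain` are hypothetical, and the features and labels are kept in separate arrays for simplicity.

```python
import numpy as np

def entropy(labels):
    """Shannon entropy (in bits) of a 1-D label array."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def info_gain(feature, labels):
    """Entropy of `labels` minus the weighted entropy after splitting on `feature`."""
    remainder = 0.0
    for value in np.unique(feature):
        mask = feature == value
        # mask.mean() is the fraction of samples that take this value.
        remainder += mask.mean() * entropy(labels[mask])
    return float(entropy(labels) - remainder)

# Same rows as the sample dataset, with features and labels in separate arrays.
X = np.array([[1, 1], [1, 1], [1, 0], [0, 1], [0, 1]])
y = np.array(['yes', 'yes', 'no', 'no', 'no'])
gains = [info_gain(X[:, i], y) for i in range(X.shape[1])]
print([round(g, 3) for g in gains])  # [0.42, 0.171]
```

The first feature has the larger gain, so it becomes the root of the tree.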
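4. Define a function that builds the decision tree recursively. `labels` is the list of feature names, in column order.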
```python
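def create_tree(data, labels):
    """Recursively build an ID3 tree as nested dicts; labels holds the feature names."""
    class_list = data[:, -1]
    classes, counts = np.unique(class_list, return_counts=True)
    # Stop when every sample in this subset has the same class.
    if len(classes) == 1:
        return classes[0]
    majority = classes[np.argmax(counts)]
    # Stop when no features are left: fall back to the majority class.
    if data.shape[1] == 1:
        return majority
    best_feature = choose_best_feature(data)
    # No feature yields a positive information gain: also return the majority class.
    if best_feature == -1:
        return majority
    best_feature_label = labels[best_feature]
    my_tree = {best_feature_label: {}}
    # Copy the feature names without the chosen one, so the caller's list is not mutated.
    sub_labels = labels[:best_feature] + labels[best_feature + 1:]
    for value in np.unique(data[:, best_feature]):
        # Keep the rows with this value, then drop the used feature column (not the class column).
        sub_data = np.delete(data[data[:, best_feature] == value], best_feature, axis=1)
        my_tree[best_feature_label][value] = create_tree(sub_data, sub_labels)
    return my_tree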
```
5. Test the algorithm on a sample dataset.
```python
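# dtype=object keeps the feature values as ints; without it NumPy would cast every entry to a string.
data = np.array([[1, 1, 'yes'], [1, 1, 'yes'], [1, 0, 'no'], [0, 1, 'no'], [0, 1, 'no']], dtype=object)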
labels = ['no surfacing', 'flippers']
tree = create_tree(data, labels)
print(tree)
```
Output:
```
{'no surfacing': {0: 'no', 1: {'flippers': {0: 'no', 1: 'yes'}}}}
```
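This decision tree shows how a sample is classified from the two features "no surfacing" and "flippers": the result is 'yes' only when both features are 1, and 'no' otherwise.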
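One way to use such a nested-dict tree for prediction is to walk it from the root until a leaf is reached. The sketch below assumes the tree layout shown in the output above. The function name `classify` and the `feature_names` argument are hypothetical additions for illustration.

```python
def classify(tree, feature_names, sample):
    """Follow the branches of a nested-dict tree until a leaf (class label) is reached."""
    while isinstance(tree, dict):
        feature_name = next(iter(tree))    # each internal node has exactly one feature key
        value = sample[feature_names.index(feature_name)]
        tree = tree[feature_name][value]   # raises KeyError for a value not seen in training
    return tree

tree = {'no surfacing': {0: 'no', 1: {'flippers': {0: 'no', 1: 'yes'}}}}
feature_names = ['no surfacing', 'flippers']
print(classify(tree, feature_names, [1, 1]))  # yes
print(classify(tree, feature_names, [1, 0]))  # no
print(classify(tree, feature_names, [0, 1]))  # no
```

Feature values that never appeared in the training data raise a `KeyError` here. A fuller implementation might fall back to a default class instead.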