ID3 Decision Tree Implementation in Python
ID3 is a classic decision-tree algorithm for classification. The basic steps of a Python implementation are:
1. Data preparation: assemble a dataset of features and labels. Classic ID3 works on discrete (categorical) features; continuous features are usually discretized first (extensions such as C4.5 handle them directly). The label is the class to predict.
2. Compute information gain: choose the feature with the highest information gain as the split at the current node. Information gain measures how much knowing a feature's value reduces the uncertainty (entropy) about the class labels; the formulas are spelled out after this list.
3. Build the tree: after selecting the best feature, create one child node per value of that feature, then recurse on each child until a termination condition is met (all samples in a node share one class, or no features remain).
4. Predict: classify a new sample by walking the finished tree from the root down to a leaf.
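For reference, the two quantities used in step 2 can be written out explicitly. For a label set D with class proportions p_1, ..., p_k, and a feature A whose value v selects the subset D_v:

```latex
H(D) = -\sum_{k} p_k \log_2 p_k

\mathrm{Gain}(D, A) = H(D) - \sum_{v \in \mathrm{Values}(A)} \frac{|D_v|}{|D|} \, H(D_v)
```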
Below is a simple Python implementation of an ID3 decision tree:
```python
import numpy as np
def calc_entropy(labels):
    # Shannon entropy of the label distribution, in bits
    unique_labels, counts = np.unique(labels, return_counts=True)
    probabilities = counts / len(labels)
    entropy = -np.sum(probabilities * np.log2(probabilities))
    return entropy
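
# Sanity check: a balanced binary label set carries one bit of entropy,
# e.g. calc_entropy(np.array(['Y', 'N'])) returns 1.0, while a pure set
# (all labels identical) has entropy 0.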
def calc_information_gain(data, labels, feature_index):
    # Information gain of the feature stored in column `feature_index`
    feature_values = np.unique(data[:, feature_index])
    entropy = calc_entropy(labels)
    new_entropy = 0.0
    for value in feature_values:
        subset_labels = labels[data[:, feature_index] == value]
        subset_entropy = calc_entropy(subset_labels)
        subset_probability = len(subset_labels) / len(labels)
        # Weight each subset's entropy by its share of the samples
        new_entropy += subset_probability * subset_entropy
    information_gain = entropy - new_entropy
    return information_gain
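
# Note: plain information gain is biased toward features with many distinct
# values; C4.5 replaces it with the gain ratio to correct for this.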
def create_decision_tree(data, labels, features):
    # Recursively build the tree; `features` names the columns of `data`
    if len(np.unique(labels)) == 1:
        # All samples share one class: return that class as a leaf
        return labels[0]
    if len(features) == 0:
        # No features left: return the majority class as a leaf
        unique_labels, counts = np.unique(labels, return_counts=True)
        return unique_labels[np.argmax(counts)]
    gains = [calc_information_gain(data, labels, i) for i in range(len(features))]
    best_feature_index = int(np.argmax(gains))
    best_feature = features[best_feature_index]
    decision_tree = {best_feature: {}}
    for value in np.unique(data[:, best_feature_index]):
        mask = data[:, best_feature_index] == value
        # Drop the used column so column indices stay aligned with `features`
        subset_data = np.delete(data[mask], best_feature_index, axis=1)
        subset_labels = labels[mask]
        subset_features = features[:best_feature_index] + features[best_feature_index + 1:]
        decision_tree[best_feature][value] = create_decision_tree(subset_data, subset_labels, subset_features)
    return decision_tree
def predict(decision_tree, sample):
    # A leaf is anything that is not a dict: return it as the predicted class
    if not isinstance(decision_tree, dict):
        return decision_tree
    feature = next(iter(decision_tree))  # the feature this node splits on
    value = sample[feature]
    subtree = decision_tree[feature][value]
    return predict(subtree, sample)
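
# Caveat: predict() raises a KeyError when the sample carries a feature value
# never seen during training; a more robust version would fall back to the
# majority class of the current node.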
# Example usage: NumPy stores this mixed array as strings, so every value
# (including the ages) is written as a string here
data = np.array([['1', 'S', 'M'], ['1', 'M', 'M'], ['1', 'M', 'L'], ['1', 'S', 'L'], ['1', 'S', 'M'],
                 ['2', 'S', 'M'], ['2', 'M', 'M'], ['2', 'M', 'L'], ['2', 'L', 'L'], ['2', 'L', 'L'],
                 ['3', 'L', 'L'], ['3', 'M', 'L'], ['3', 'M', 'M'], ['3', 'L', 'M'], ['3', 'L', 'L']])
labels = np.array(['N', 'N', 'Y', 'Y', 'N', 'N', 'N', 'Y', 'Y', 'Y', 'Y', 'Y', 'Y', 'Y', 'N'])
features = ['age', 'income', 'student']
decision_tree = create_decision_tree(data, labels, features)
# Sample values must match the (string) values stored in `data`
sample = {'age': '2', 'income': 'M', 'student': 'M'}
prediction = predict(decision_tree, sample)
print("Prediction:", prediction)
```
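As a quick cross-check, the same data can be fed to scikit-learn. This is only a rough comparison: `DecisionTreeClassifier` implements CART (binary splits) rather than ID3 even with `criterion='entropy'`, and it requires the string categories to be encoded as numbers first. A minimal sketch, assuming scikit-learn is installed and reusing `data` and `labels` from the example above:

```python
from sklearn.preprocessing import OrdinalEncoder
from sklearn.tree import DecisionTreeClassifier

# Encode the string-valued categories as integers for scikit-learn
encoder = OrdinalEncoder()
X = encoder.fit_transform(data)

# criterion='entropy' uses the same impurity measure as ID3,
# although the splits themselves are binary (CART)
clf = DecisionTreeClassifier(criterion='entropy')
clf.fit(X, labels)

x_new = encoder.transform([['2', 'M', 'M']])
print(clf.predict(x_new))
```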