Information gain code in PyCharm
Time: 2023-10-19 16:09:40
Here is Python code that computes information gain and uses it to pick the best feature to split on:
```python
import math


def entropy(data):
    """
    Compute the entropy of a dataset.
    :param data: the dataset, a list of samples; the last element of each sample is its label
    :return: the entropy in bits
    """
    # Count how many samples carry each label
    label_count = {}
    for sample in data:
        label = sample[-1]
        if label not in label_count:
            label_count[label] = 0
        label_count[label] += 1
    # H(D) = -sum(p * log2(p)) over all labels
    ent = 0.0
    for label in label_count:
        prob = float(label_count[label]) / len(data)
        ent -= prob * math.log2(prob)
    return ent


def split_data(data, axis, value):
    """
    Split the dataset on a feature.
    :param data: the dataset, a list of samples; the last element of each sample is its label
    :param axis: index of the feature column
    :param value: the feature value to keep
    :return: the samples whose feature `axis` equals `value`, with that column removed
    """
    sub_data = []
    for sample in data:
        if sample[axis] == value:
            # Copy the sample without the column we split on
            sub_sample = sample[:axis]
            sub_sample.extend(sample[axis + 1:])
            sub_data.append(sub_sample)
    return sub_data


def choose_best_feature(data):
    """
    Choose the feature with the highest information gain.
    :param data: the dataset, a list of samples; the last element of each sample is its label
    :return: index of the best feature column (-1 if no feature has positive gain)
    """
    num_features = len(data[0]) - 1
    base_entropy = entropy(data)
    best_info_gain = 0.0
    best_feature = -1
    for i in range(num_features):
        feature_list = [sample[i] for sample in data]
        unique_values = set(feature_list)
        # Conditional entropy: weighted entropy of each subset after splitting on feature i
        new_entropy = 0.0
        for value in unique_values:
            sub_data = split_data(data, i, value)
            prob = len(sub_data) / float(len(data))
            new_entropy += prob * entropy(sub_data)
        info_gain = base_entropy - new_entropy
        if info_gain > best_info_gain:
            best_info_gain = info_gain
            best_feature = i
    return best_feature


# Sample data: two binary features and a yes/no label
data = [[1, 1, 'yes'],
        [1, 1, 'yes'],
        [1, 0, 'no'],
        [0, 1, 'no'],
        [0, 1, 'no']]

# Select the feature with the highest information gain
print(choose_best_feature(data))
```
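For reference, `entropy` and the inner loop of `choose_best_feature` compute the standard quantities below. Here $D$ is the dataset, $p_k$ is the fraction of samples with label $k$, $V(a)$ is the set of values taken by feature $a$, and $D^v$ is the subset that `split_data(data, a, v)` returns:

$$
H(D) = -\sum_{k} p_k \log_2 p_k,
\qquad
\operatorname{Gain}(D, a) = H(D) - \sum_{v \in V(a)} \frac{|D^v|}{|D|}\, H(D^v)
$$

`base_entropy` corresponds to $H(D)$, `new_entropy` to the weighted sum, and `info_gain` to $\operatorname{Gain}(D, a)$.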
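Running the script prints `0`. This means the first feature (column 0) has the highest information gain and is the best feature to split on.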
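To see *why* column 0 wins, it helps to print the gain of every feature instead of only the index of the best one. The following is a minimal standalone sketch, not part of the original code. The helper names `label_entropy` and `feature_gains` are hypothetical, and the sketch assumes the same data layout (the label is the last element of each sample). It uses `collections.Counter` and `math.log2` in place of the manual counting above:

```python
import math
from collections import Counter


def label_entropy(labels):
    """Shannon entropy (base 2) of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def feature_gains(data):
    """Return the information gain of every feature column.

    Assumes each sample is a list whose last element is the class label.
    """
    labels = [row[-1] for row in data]
    base = label_entropy(labels)
    gains = []
    for i in range(len(data[0]) - 1):
        # Group the labels by the value that feature i takes
        groups = {}
        for row in data:
            groups.setdefault(row[i], []).append(row[-1])
        # Weighted entropy of the groups (conditional entropy)
        conditional = sum(len(g) / len(data) * label_entropy(g) for g in groups.values())
        gains.append(base - conditional)
    return gains


data = [[1, 1, 'yes'],
        [1, 1, 'yes'],
        [1, 0, 'no'],
        [0, 1, 'no'],
        [0, 1, 'no']]

gains = feature_gains(data)
print([round(g, 3) for g in gains])  # [0.42, 0.171]
```

On this data $H(D) \approx 0.971$. Splitting on column 0 leaves one pure subset (`[0, 1, 'no']` twice), so its gain (about 0.420) is larger than that of column 1 (about 0.171). This agrees with `choose_best_feature` returning `0`.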