Given a dataset, write a program to compute its entropy and information gain
Posted: 2024-05-06 19:22:13
The following Python code computes the entropy of a dataset and the information gain of a given feature:
```python
import math


# Compute the Shannon entropy (in bits) of the labels in the dataset
def entropy(data):
    total = len(data)
    label_counts = {}
    for row in data:
        label = row[-1]  # the last column holds the class label
        if label not in label_counts:
            label_counts[label] = 0
        label_counts[label] += 1
    result = 0.0
    for label in label_counts:
        prob = label_counts[label] / total
        result -= prob * math.log2(prob)
    return result


# Compute the information gain from splitting the dataset on one feature
# (`feature` is the column index of that feature)
def information_gain(data, feature):
    total = len(data)
    feature_counts = {}
    for row in data:
        value = row[feature]
        if value not in feature_counts:
            feature_counts[value] = 0
        feature_counts[value] += 1
    # Weighted average entropy of the subsets produced by the split
    feature_entropy = 0.0
    for value in feature_counts:
        prob = feature_counts[value] / total
        subset = [row for row in data if row[feature] == value]
        feature_entropy += prob * entropy(subset)
    return entropy(data) - feature_entropy
```
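Here, `data` is a two-dimensional list in which each row is one sample and the last column is the sample's label. The `entropy` function computes the entropy of the dataset, and `information_gain` computes the information gain obtained by splitting the dataset on the feature at column index `feature`. Calling `entropy` once and `information_gain` once per feature column gives the dataset's entropy and the information gain of every feature.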
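As a quick illustration of how the two functions might be called, here is a minimal sketch on a made-up dataset. The names `toy_data`, `gains`, and `best_feature` and the sample values are hypothetical and not part of the original answer. The two helpers are restated in compact form (using `collections.Counter`) only so that the snippet runs on its own; they are intended to behave the same as the functions above.

```python
import math
from collections import Counter


def entropy(data):
    # Shannon entropy (bits) of the labels in the last column
    total = len(data)
    counts = Counter(row[-1] for row in data)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(data, feature):
    # Entropy before the split minus the weighted entropy after it
    total = len(data)
    remainder = 0.0
    for value in set(row[feature] for row in data):
        subset = [row for row in data if row[feature] == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(data) - remainder


# Hypothetical toy dataset: columns are [outlook, windy, label]
toy_data = [
    ["sunny", "no", "yes"],
    ["sunny", "yes", "no"],
    ["rainy", "no", "yes"],
    ["rainy", "yes", "no"],
]

base_entropy = entropy(toy_data)

# One information-gain value per feature column (every column except the label)
num_features = len(toy_data[0]) - 1
gains = [information_gain(toy_data, i) for i in range(num_features)]

# Index of the feature with the largest gain
best_feature = max(range(num_features), key=lambda i: gains[i])

print("entropy:", base_entropy)
print("gains:", gains)
print("best feature index:", best_feature)
```

In this toy data the labels are split evenly, so the entropy is 1 bit. The `windy` column (index 1) separates the labels perfectly and has a gain of 1, while `outlook` (index 0) tells us nothing about the label and has a gain of 0.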