According to the formulas, write functions for the empirical entropy, the empirical conditional entropy, and the information gain; then output the empirical entropy of the training samples, together with the empirical conditional entropy and information gain of the shape attribute.
Sure. Here are implementations of the empirical entropy, empirical conditional entropy, and information gain functions:
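For reference, the standard definitions these functions implement (with classes $C_k$ and the partition $D_1, \dots, D_n$ of $D$ induced by attribute $A$) are:

```
H(D)    = - sum_k (|C_k| / |D|) * log2(|C_k| / |D|)
H(D|A)  =   sum_i (|D_i| / |D|) * H(D_i)
g(D, A) =   H(D) - H(D|A)
```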
```python
import math

# Empirical entropy H(D) of a list of class labels
def empirical_entropy(labels):
    n = len(labels)
    label_count = {}
    for label in labels:
        if label in label_count:
            label_count[label] += 1
        else:
            label_count[label] = 1
    entropy = 0.0
    for label in label_count:
        prob = float(label_count[label]) / n
        entropy -= prob * math.log(prob, 2)
    return entropy

# Empirical conditional entropy H(D|A) for a given feature
def empirical_conditional_entropy(data, labels, feature):
    n = len(labels)
    feature_count = {}
    for i in range(n):
        if data[i][feature] in feature_count:
            feature_count[data[i][feature]][labels[i]] += 1
        else:
            feature_count[data[i][feature]] = {label: 0 for label in set(labels)}
            feature_count[data[i][feature]][labels[i]] = 1
    conditional_entropy = 0.0
    for value in feature_count:
        value_count = sum(feature_count[value].values())
        value_prob = float(value_count) / n
        value_entropy = 0.0
        for label in feature_count[value]:
            if feature_count[value][label] != 0:
                label_prob = float(feature_count[value][label]) / value_count
                value_entropy -= label_prob * math.log(label_prob, 2)
        conditional_entropy += value_prob * value_entropy
    return conditional_entropy

# Information gain g(D, A) = H(D) - H(D|A)
def information_gain(data, labels, feature):
    return empirical_entropy(labels) - empirical_conditional_entropy(data, labels, feature)
```
Suppose we have the following training data:
```python
data = [{'shape':'round', 'color':'green', 'size':'small'},
{'shape':'round', 'color':'yellow', 'size':'small'},
{'shape':'irregular', 'color':'green', 'size':'small'},
{'shape':'irregular', 'color':'yellow', 'size':'small'},
{'shape':'round', 'color':'green', 'size':'large'},
{'shape':'round', 'color':'yellow', 'size':'large'},
{'shape':'irregular', 'color':'green', 'size':'large'},
{'shape':'irregular', 'color':'yellow', 'size':'large'}]
labels = ['pea', 'pea', 'bean', 'bean', 'pea', 'pea', 'bean', 'bean']
```
We can then compute the empirical entropy of the training samples, and the empirical conditional entropy and information gain of the shape attribute:
```python
# Empirical entropy of the training samples
entropy = empirical_entropy(labels)
print("Empirical entropy of the training samples:", entropy)

# Empirical conditional entropy and information gain of the shape attribute.
# The rows are dicts, so the feature must be passed by key ('shape'),
# not by a positional index.
shape_entropy = empirical_conditional_entropy(data, labels, 'shape')
shape_gain = information_gain(data, labels, 'shape')
print("Empirical conditional entropy of the shape attribute:", shape_entropy)
print("Information gain of the shape attribute:", shape_gain)
```
Output:
```
Empirical entropy of the training samples: 1.0
Empirical conditional entropy of the shape attribute: 0.0
Information gain of the shape attribute: 1.0
```
So the empirical entropy of the training samples is 1.0. The shape attribute splits the samples perfectly (every round sample is a pea, every irregular sample is a bean), so its empirical conditional entropy is 0.0 and its information gain is the full 1.0.
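To see why shape is the best attribute to split on, here is a compact, self-contained sketch that compares the information gain of all three attributes. It is a rewrite using `collections.Counter` rather than the functions above, and the names `entropy`, `info_gain`, and `gains` are illustrative:

```python
import math
from collections import Counter

def entropy(labels):
    # H(D): empirical entropy of a label list
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def info_gain(data, labels, feature):
    # g(D, A) = H(D) - H(D|A), grouping rows by the feature's value
    n = len(labels)
    groups = {}
    for row, label in zip(data, labels):
        groups.setdefault(row[feature], []).append(label)
    cond = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - cond

data = [{'shape': 'round',     'color': 'green',  'size': 'small'},
        {'shape': 'round',     'color': 'yellow', 'size': 'small'},
        {'shape': 'irregular', 'color': 'green',  'size': 'small'},
        {'shape': 'irregular', 'color': 'yellow', 'size': 'small'},
        {'shape': 'round',     'color': 'green',  'size': 'large'},
        {'shape': 'round',     'color': 'yellow', 'size': 'large'},
        {'shape': 'irregular', 'color': 'green',  'size': 'large'},
        {'shape': 'irregular', 'color': 'yellow', 'size': 'large'}]
labels = ['pea', 'pea', 'bean', 'bean', 'pea', 'pea', 'bean', 'bean']

gains = {f: info_gain(data, labels, f) for f in ['shape', 'color', 'size']}
for f, g in gains.items():
    print(f, round(g, 4))
```

Color and size each split the data into groups that are half pea and half bean, so they carry no information about the label (gain 0.0), while shape determines the label completely (gain 1.0). An ID3-style decision tree would therefore split on shape first.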