Implementing information gain in Python for a decision tree on a dataset with 10 labels (features) and 2 classes
Posted: 2024-05-03 12:21:44
The steps below implement information gain in Python:
1. Import the required libraries
```python
import numpy as np
import pandas as pd
```
2. Define a function that computes the entropy
```python
def entropy(target_col):
    # Shannon entropy (in bits) of the class distribution in target_col
    _, counts = np.unique(target_col, return_counts=True)
    probs = counts / counts.sum()
    return -np.sum(probs * np.log2(probs))
```
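As a quick sanity check of the entropy formula H = -Σ p_i · log2(p_i), here is a minimal sketch on hand-made label lists. The three lists are made up for illustration, and `entropy` is repeated so the snippet runs on its own.

```python
import numpy as np

def entropy(target_col):
    # Shannon entropy (in bits) of the label distribution
    _, counts = np.unique(target_col, return_counts=True)
    probs = counts / counts.sum()
    return -np.sum(probs * np.log2(probs))

# Hypothetical label lists, for illustration only
balanced = ["yes", "no", "yes", "no"]   # 50/50 split between two classes
pure = ["yes", "yes", "yes", "yes"]     # a single class
skewed = ["yes", "yes", "yes", "no"]    # 75/25 split

h_balanced = entropy(balanced)
h_pure = entropy(pure)
h_skewed = entropy(skewed)

print(f"balanced: {h_balanced:.3f}")
print(f"skewed:   {h_skewed:.3f}")
print(f"pure is zero: {np.isclose(h_pure, 0.0)}")
```

With two classes the entropy ranges from 0 (all rows in one class) to 1 bit (an even split), so the skewed list should land between the two.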
3. Define a function that computes the information gain
```python
def information_gain(data, split_attribute_name, target_name="class"):
    # Entropy of the target before splitting
    total_entropy = entropy(data[target_name])
    # Distinct values of the split attribute and how often each occurs
    vals, counts = np.unique(data[split_attribute_name], return_counts=True)
    # Entropy of each subset, weighted by the fraction of rows it contains
    weighted_entropy = sum(
        (count / counts.sum()) * entropy(data.loc[data[split_attribute_name] == val, target_name])
        for val, count in zip(vals, counts)
    )
    return total_entropy - weighted_entropy
```
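To see what the function measures, the sketch below works through a tiny table by hand. It uses a `groupby`-based variant, named `information_gain_groupby` here to mark it as an alternative formulation and not the code from step 3. The column names and rows in `toy` are invented for illustration.

```python
import numpy as np
import pandas as pd

def entropy(target_col):
    # Shannon entropy (in bits) of the label distribution
    _, counts = np.unique(target_col, return_counts=True)
    probs = counts / counts.sum()
    return -np.sum(probs * np.log2(probs))

def information_gain_groupby(data, split_attribute_name, target_name="class"):
    total_entropy = entropy(data[target_name])
    # Fraction of rows that fall into each value of the split attribute
    weights = data[split_attribute_name].value_counts(normalize=True)
    # Entropy of the target inside each subset
    subset_entropy = data.groupby(split_attribute_name)[target_name].apply(entropy)
    # Both Series are indexed by attribute value, so the product aligns by label
    return total_entropy - (weights * subset_entropy).sum()

# Hypothetical toy data: "outlook" determines the class, "windy" is unrelated to it
toy = pd.DataFrame({
    "outlook": ["sunny", "sunny", "rain", "rain"],
    "windy":   ["yes", "no", "yes", "no"],
    "class":   ["no", "no", "yes", "yes"],
})

ig_outlook = information_gain_groupby(toy, "outlook")
ig_windy = information_gain_groupby(toy, "windy")
print(f"outlook: {ig_outlook:.3f}, windy: {ig_windy:.3f}")
```

Splitting on `outlook` yields two pure subsets, so the gain equals the full 1 bit of the original entropy. Splitting on `windy` leaves each subset at a 50/50 mix, so the gain is 0.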
4. Load the dataset
```python
df = pd.read_csv("dataset.csv")
```
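If no `dataset.csv` is at hand, a stand-in with the same shape (10 features, 2 classes) can be built in memory. This is only a sketch: the feature names `f1` to `f10`, the value range, the row count, and the rule tying `class` to `f1` are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible
n_rows = 200

# Ten hypothetical categorical features f1..f10, each taking the values 0, 1 or 2
features = {f"f{i}": rng.integers(0, 3, size=n_rows) for i in range(1, 11)}
df = pd.DataFrame(features)

# Binary target placed in the last column; it follows f1 with about 10% of labels flipped
noise = rng.random(n_rows) < 0.1
df["class"] = ((df["f1"] == 0) ^ noise).astype(int)

print(df.shape)
print(df["class"].value_counts())
```

Because `class` is derived from `f1`, that feature should show a clearly larger information gain than the other nine, which makes it easy to check that the later steps behave sensibly.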
5. Compute the information gain of each feature
```python
ig = {}
# Assumes the last column is the target and that it is named "class"
for col in df.columns[:-1]:
    ig[col] = information_gain(df, col, "class")
```
6. Print the information gain of each feature
```python
for key, value in ig.items():
    print(f"{key}: {value}")
```
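For a decision tree, the gains are usually used to pick the split feature: ID3-style trees split each node on the feature with the largest information gain. A minimal sketch of that ranking step follows. The numbers in `ig` are hypothetical placeholders standing in for the dictionary computed in step 5.

```python
# Hypothetical information-gain values, standing in for the `ig` dict from step 5
ig = {"f1": 0.42, "f2": 0.03, "f3": 0.11}

# Sort features from most to least informative
ranked = sorted(ig.items(), key=lambda item: item[1], reverse=True)
for name, gain in ranked:
    print(f"{name}: {gain:.4f}")

# ID3-style choice: split the root node on the feature with the largest gain
best_feature = ranked[0][0]
print("best split:", best_feature)
```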
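Note: the code above is only a simple example of computing information gain. In practice it needs to be adapted and optimized for the specific use case.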