Write a Python program that computes the entropy and information gain of the given people dataset and prints the results

The following Python program computes the entropy and the per-feature information gain of the given people dataset:

```python
import math

# The people dataset: 14 samples, 4 features, class label 'buys_computer'
people = [
    {'age': '<30', 'income': 'high', 'student': 'no', 'credit_rating': 'fair', 'buys_computer': 'no'},
    {'age': '<30', 'income': 'high', 'student': 'no', 'credit_rating': 'excellent', 'buys_computer': 'no'},
    {'age': '30-40', 'income': 'high', 'student': 'no', 'credit_rating': 'fair', 'buys_computer': 'yes'},
    {'age': '>40', 'income': 'medium', 'student': 'no', 'credit_rating': 'fair', 'buys_computer': 'yes'},
    {'age': '>40', 'income': 'low', 'student': 'yes', 'credit_rating': 'fair', 'buys_computer': 'yes'},
    {'age': '>40', 'income': 'low', 'student': 'yes', 'credit_rating': 'excellent', 'buys_computer': 'no'},
    {'age': '30-40', 'income': 'low', 'student': 'yes', 'credit_rating': 'excellent', 'buys_computer': 'yes'},
    {'age': '<30', 'income': 'medium', 'student': 'no', 'credit_rating': 'fair', 'buys_computer': 'no'},
    {'age': '<30', 'income': 'low', 'student': 'yes', 'credit_rating': 'fair', 'buys_computer': 'yes'},
    {'age': '>40', 'income': 'medium', 'student': 'yes', 'credit_rating': 'fair', 'buys_computer': 'yes'},
    {'age': '<30', 'income': 'medium', 'student': 'yes', 'credit_rating': 'excellent', 'buys_computer': 'yes'},
    {'age': '30-40', 'income': 'medium', 'student': 'no', 'credit_rating': 'excellent', 'buys_computer': 'yes'},
    {'age': '30-40', 'income': 'high', 'student': 'yes', 'credit_rating': 'fair', 'buys_computer': 'yes'},
    {'age': '>40', 'income': 'medium', 'student': 'no', 'credit_rating': 'excellent', 'buys_computer': 'no'},
]


# Compute the entropy of the class label over a dataset
def calc_entropy(data_set):
    # Count how many samples fall into each class
    label_counts = {}
    for data in data_set:
        label = data['buys_computer']
        label_counts[label] = label_counts.get(label, 0) + 1
    # H = -sum(p * log2(p)) over the classes
    entropy = 0.0
    for count in label_counts.values():
        prob = count / len(data_set)
        entropy -= prob * math.log(prob, 2)
    return entropy


# Compute the information gain obtained by splitting the dataset on one feature
def calc_info_gain(data_set, feature):
    feature_values = set(data[feature] for data in data_set)
    new_entropy = 0.0  # weighted entropy of the subsets after the split
    for value in feature_values:
        sub_data_set = [data for data in data_set if data[feature] == value]
        prob = len(sub_data_set) / len(data_set)
        new_entropy += prob * calc_entropy(sub_data_set)
    return calc_entropy(data_set) - new_entropy


# Print the entropy of the dataset
print('Entropy of the dataset:', calc_entropy(people))

# Print the information gain of every feature
for feature in people[0]:
    if feature != 'buys_computer':
        info_gain = calc_info_gain(people, feature)
        print(f'Information gain of {feature}: {info_gain}')
```

The output is (the last few digits may vary with floating-point summation order):

```
Entropy of the dataset: 0.9402859586706311
Information gain of age: 0.2467498197744391
Information gain of income: 0.029222565658954647
Information gain of student: 0.15183550136234136
Information gain of credit_rating: 0.04812703040826927
```

Explanation: the program defines two functions, `calc_entropy` and `calc_info_gain`, which compute the entropy of a dataset and the information gain of a single feature. To compute the entropy, `calc_entropy` counts the samples in each class of `buys_computer` and applies H = -Σ p·log2(p) to the class proportions. To compute the information gain, `calc_info_gain` splits the dataset into one subset per value of the given feature, computes the entropy of each subset, weights it by the subset's share of the samples, and subtracts the weighted sum from the entropy of the whole dataset. The program prints the dataset entropy and the information gain of every feature. The age feature has the largest information gain (about 0.247), so a decision tree that selects splits by information gain, as ID3 does, should use age as the root node.
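As a sanity check, the information gain of `age` can be recomputed from class counts alone. The sketch below is not part of the original program: the helper `entropy_from_counts` and the variables `overall` and `age_groups` are names introduced here for illustration, and the (yes, no) counts were read off the people dataset by hand (9 yes and 5 no overall; `<30` has 2 yes and 3 no, `30-40` has 4 yes and 0 no, `>40` has 3 yes and 2 no).

```python
import math


def entropy_from_counts(counts):
    """Shannon entropy (base 2) of a class distribution given as raw counts."""
    total = sum(counts)
    # Zero counts are skipped because they contribute nothing to the entropy
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)


# (yes, no) counts of buys_computer, tallied by hand from the people dataset
overall = (9, 5)
age_groups = {'<30': (2, 3), '30-40': (4, 0), '>40': (3, 2)}

total = sum(overall)
# Weighted entropy of the three age subsets
conditional = sum(sum(c) / total * entropy_from_counts(c) for c in age_groups.values())
gain_age = entropy_from_counts(overall) - conditional

print(round(entropy_from_counts(overall), 4))  # 0.9403
print(round(gain_age, 4))                      # 0.2467
```

The `30-40` group is pure (all yes), so it contributes zero entropy, which is why splitting on `age` removes so much uncertainty.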
