Python code that uses the ID3 algorithm to classify the wine dataset

Date: 2023-08-15 17:04:22

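Sure. Below is Python code that uses the ID3 algorithm to classify the wine dataset:

```python
import pandas as pd
import numpy as np
from math import log2

# Compute the information entropy of the class labels (last element of each row)
def calc_entropy(data):
    n = len(data)
    label_counts = {}
    for feat_vec in data:
        current_label = feat_vec[-1]
        if current_label not in label_counts:
            label_counts[current_label] = 0
        label_counts[current_label] += 1
    entropy = 0.0
    for key in label_counts:
        prob = float(label_counts[key]) / n
        entropy += -prob * log2(prob)
    return entropy

# Split the dataset: keep rows whose feature `axis` equals `value`, with that feature removed
def split_data(data, axis, value):
    ret_data = []
    for feat_vec in data:
        if feat_vec[axis] == value:
            reduced_feat_vec = feat_vec[:axis]
            reduced_feat_vec.extend(feat_vec[axis+1:])
            ret_data.append(reduced_feat_vec)
    return ret_data

# Choose the feature with the largest information gain (-1 if no feature gives a positive gain)
def choose_best_feature_to_split(data):
    num_features = len(data[0]) - 1
    base_entropy = calc_entropy(data)
    best_info_gain = 0.0
    best_feature = -1
    for i in range(num_features):
        feat_list = [example[i] for example in data]
        unique_vals = set(feat_list)
        new_entropy = 0.0
        for value in unique_vals:
            sub_data = split_data(data, i, value)
            prob = len(sub_data) / float(len(data))
            new_entropy += prob * calc_entropy(sub_data)
        info_gain = base_entropy - new_entropy
        if info_gain > best_info_gain:
            best_info_gain = info_gain
            best_feature = i
    return best_feature

# Leaf node: return the most frequent class label (majority vote)
def majority_cnt(class_list):
    class_count = {}
    for vote in class_list:
        if vote not in class_count:
            class_count[vote] = 0
        class_count[vote] += 1
    sorted_class_count = sorted(class_count.items(), key=lambda x: x[1], reverse=True)
    return sorted_class_count[0][0]

# Recursively build the decision tree (note: entries are deleted from `labels` in place)
def create_tree(data, labels):
    class_list = [example[-1] for example in data]
    # All samples belong to one class: return it as a leaf
    if class_list.count(class_list[0]) == len(class_list):
        return class_list[0]
    # No features left: fall back to the majority class
    if len(data[0]) == 1:
        return majority_cnt(class_list)
    best_feat = choose_best_feature_to_split(data)
    # No feature improves the entropy: stop with the majority class
    if best_feat == -1:
        return majority_cnt(class_list)
    best_feat_label = labels[best_feat]
    my_tree = {best_feat_label: {}}
    del labels[best_feat]
    feat_values = [example[best_feat] for example in data]
    unique_vals = set(feat_values)
    for value in unique_vals:
        sub_labels = labels[:]
        my_tree[best_feat_label][value] = create_tree(split_data(data, best_feat, value), sub_labels)
    return my_tree

# Classify one sample; returns `default` if a feature value never appeared in training
def classify(input_tree, feat_labels, test_vec, default=None):
    first_str = list(input_tree.keys())[0]
    second_dict = input_tree[first_str]
    feat_index = feat_labels.index(first_str)
    class_label = default
    for key in second_dict.keys():
        if test_vec[feat_index] == key:
            if isinstance(second_dict[key], dict):
                class_label = classify(second_dict[key], feat_labels, test_vec, default)
            else:
                class_label = second_dict[key]
    return class_label

# Load the dataset; assumes wine.csv has a header row and the class label in its last column
def load_dataset():
    df = pd.read_csv('wine.csv')
    data = np.array(df)
    labels = df.columns[:-1].tolist()  # feature names only, without the class column
    return data, labels

# Main
if __name__ == '__main__':
    data, labels = load_dataset()
    # Pass a copy, because create_tree deletes entries from the list it receives
    my_tree = create_tree(data.tolist(), labels[:])
    print(my_tree)
    # Classify the 13 feature values of the first sample in the dataset
    test_vec = data[0][:-1].tolist()
    class_label = classify(my_tree, labels, test_vec)
    print(class_label)
```

In this example we use the wine dataset, which has 13 features and 1 class label; the code assumes wine.csv has a header row and stores the class label in its last column. We build a decision tree with the ID3 algorithm and then classify one test sample (here, the features of the first row). Keep in mind that the wine features are continuous: because this ID3 implementation branches on exact feature values, the tree tends to memorize the training rows, and a sample with a value never seen during training cannot reach a leaf, in which case classify returns its default (None).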
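For reference, calc_entropy computes the Shannon entropy of the class labels, and choose_best_feature_to_split picks the feature with the largest information gain. In the notation below (ours, not the original author's), D is the current set of samples, p_k the fraction of samples in class k, V(a) the set of distinct values of feature a, and D_v the subset of D with a = v:

$$H(D) = -\sum_{k=1}^{K} p_k \log_2 p_k$$

$$\operatorname{Gain}(D, a) = H(D) - \sum_{v \in V(a)} \frac{|D_v|}{|D|}\, H(D_v)$$

base_entropy in the code corresponds to H(D), new_entropy to the weighted sum, and info_gain to their difference.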
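To make these quantities concrete, the standalone sketch below re-implements calc_entropy in compact form and adds an info_gain helper (a hypothetical name, not part of the original code) that mirrors the inner loop of choose_best_feature_to_split. The four-row toy dataset is made up for illustration: feature 0 separates the two classes perfectly, feature 1 carries no information.

```python
from math import log2

# Entropy of the class labels, stored in the last position of each row
def calc_entropy(data):
    counts = {}
    for row in data:
        counts[row[-1]] = counts.get(row[-1], 0) + 1
    n = len(data)
    return sum(-(c / n) * log2(c / n) for c in counts.values())

# Information gain from splitting `data` on feature index `axis`
def info_gain(data, axis):
    base = calc_entropy(data)
    new_entropy = 0.0
    for value in {row[axis] for row in data}:
        subset = [row for row in data if row[axis] == value]
        new_entropy += len(subset) / len(data) * calc_entropy(subset)
    return base - new_entropy

# Hypothetical toy data: [feature_0, feature_1, class]
toy = [
    ["high", "a", 1],
    ["high", "b", 1],
    ["low", "a", 2],
    ["low", "b", 2],
]

print(calc_entropy(toy))  # 1.0
print(info_gain(toy, 0))  # 1.0
print(info_gain(toy, 1))  # 0.0
```

With two equally frequent classes the entropy is 1 bit; splitting on feature 0 leaves two pure subsets (entropy 0), so the gain is the full 1 bit, while splitting on feature 1 leaves the class mix unchanged and gains nothing. ID3 would therefore choose feature 0.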
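Because ID3 as implemented above branches on exact feature values, one common workaround for continuous data such as the wine measurements is to discretize each feature before building the tree. The sketch below is not part of the original answer: it uses equal-width binning in plain Python, and fit_bins, to_bin, discretize and the bin count are hypothetical helpers and parameters chosen for illustration. Bin boundaries are learned from the training rows only and then reused for test samples.

```python
# Learn per-feature (minimum, bin width) from training rows whose last column is the class
def fit_bins(rows, n_bins):
    params = []
    for j in range(len(rows[0]) - 1):
        col = [row[j] for row in rows]
        lo, hi = min(col), max(col)
        params.append((lo, (hi - lo) / n_bins))
    return params

# Map one continuous value to a bin index in 0..n_bins-1
def to_bin(value, lo, width, n_bins):
    if width == 0:  # constant feature: everything falls into bin 0
        return 0
    idx = int((value - lo) / width)
    return min(max(idx, 0), n_bins - 1)  # clamp values outside the training range

# Discretize a list of feature values (no class label) with the learned parameters
def discretize(features, params, n_bins):
    return [to_bin(v, lo, w, n_bins) for v, (lo, w) in zip(features, params)]

# Hypothetical continuous training rows: [feature_0, feature_1, class]
train = [
    [1.0, 10.0, "x"],
    [2.0, 20.0, "y"],
    [3.0, 30.0, "x"],
]
params = fit_bins(train, n_bins=2)
binned_train = [discretize(row[:-1], params, 2) + [row[-1]] for row in train]
print(binned_train)                        # [[0, 0, 'x'], [1, 1, 'y'], [1, 1, 'x']]
print(discretize([2.5, 25.0], params, 2))  # [1, 1]
```

The binned rows keep the class label last, so they can be passed to create_tree unchanged; each test vector should go through discretize with the same params before calling classify. The number of bins is a tuning choice, and quantile-based bins (for example via pandas.qcut) are an alternative when feature values are unevenly distributed.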
