Exercise #1: Decision Tree and Support Vector Machine. Goal: implement the decision tree algorithm and the support vector machine, respectively, and compare their performance. Dataset (in folder 'data1'): we will use the Adult dataset, whose description is available here. The training set (adult.data) and evaluation set (adult.test) can be downloaded here. The prediction task is to determine whether a person makes over 50K a year. Preprocessing: remove all records containing '?' (i.e., missing values), and remove the attribute "native-country". You may not use pandas, DecisionTreeClassifier, or SVC.
Okay, let's get started with implementing the decision tree and support vector machine algorithms for the given dataset.
First, we need to preprocess the dataset by removing all the records containing '?' and the attribute "native-country". Here's the code for that:
```python
def preprocess(filename):
    data = []
    with open(filename, 'r') as f:
        for line in f:
            line = line.strip()
            # skip blank lines and the header line in adult.test
            if not line or line.startswith('|'):
                continue
            if '?' in line:
                continue  # drop records with missing values
            fields = [field.strip() for field in line.split(',')]
            del fields[13]  # remove "native-country" (the 14th attribute; index 14 is the label)
            data.append(fields)
    return data

train_data = preprocess('data1/adult.data')
test_data = preprocess('data1/adult.test')
```
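As a quick sanity check, the same filtering and column-removal logic can be exercised on a couple of in-memory lines (the two records below are made-up illustrations of the adult.data layout, not real rows):

```python
# Hypothetical sample lines in the adult.data layout: 15 comma-separated
# fields, with "native-country" at index 13 and the income label last.
sample = [
    "39, State-gov, 77516, Bachelors, 13, Never-married, Adm-clerical, "
    "Not-in-family, White, Male, 2174, 0, 40, United-States, <=50K",
    "50, ?, 83311, Bachelors, 13, Married-civ-spouse, ?, Husband, White, "
    "Male, 0, 0, 13, United-States, <=50K",
]
data = []
for line in sample:
    if '?' in line:
        continue  # drop records with missing values
    fields = [f.strip() for f in line.split(',')]
    del fields[13]  # remove "native-country"
    data.append(fields)

print(len(data))      # 1: only the record without '?' survives
print(len(data[0]))   # 14: 13 attributes plus the label
```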
Now that we have preprocessed the dataset, we can move on to implementing the decision tree algorithm. Here's the code for that:
```python
import math
def entropy(data):
    """Shannon entropy of the class labels (last column)."""
    counts = {}
    for row in data:
        counts[row[-1]] = counts.get(row[-1], 0) + 1
    result = 0.0
    for label in counts:
        p = counts[label] / len(data)
        result -= p * math.log2(p)
    return result

def split_data(data, attribute):
    """Partition rows by their value of the given attribute."""
    splits = {}
    for row in data:
        splits.setdefault(row[attribute], []).append(row)
    return splits

def choose_attribute(data):
    """Attribute with the highest information gain, or None if no gain."""
    best_gain = 0.0
    best_attribute = None
    base_entropy = entropy(data)
    for attribute in range(len(data[0]) - 1):
        splits = split_data(data, attribute)
        entropy_sum = sum(len(subset) / len(data) * entropy(subset)
                          for subset in splits.values())
        gain = base_entropy - entropy_sum
        if gain > best_gain:
            best_gain = gain
            best_attribute = attribute
    return best_attribute

def majority_label(data):
    """Most frequent class label among the rows."""
    counts = {}
    for row in data:
        counts[row[-1]] = counts.get(row[-1], 0) + 1
    return max(counts, key=counts.get)

def decision_tree(data):
    """ID3-style tree: a leaf is a label, an internal node is
    {attribute_index: {value: subtree}}."""
    if len(data) == 0:
        return None
    if len(set(row[-1] for row in data)) == 1:
        return data[0][-1]  # pure node
    attribute = choose_attribute(data)
    if attribute is None:
        return majority_label(data)  # no attribute improves purity
    tree = {attribute: {}}
    for value, subset in split_data(data, attribute).items():
        tree[attribute][value] = decision_tree(subset)
    return tree
```
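The comparison step further down calls an `evaluate` function that is never defined in this answer. Here is a minimal sketch, assuming the nested-dict tree shape produced by `decision_tree` above (leaves are plain labels, internal nodes map an attribute index to per-value subtrees):

```python
def evaluate(tree, row):
    """Walk the tree until a leaf (a plain label) is reached."""
    while isinstance(tree, dict):
        attribute = next(iter(tree))      # the single attribute at this node
        branches = tree[attribute]
        if row[attribute] not in branches:
            return None                   # unseen value: count it as a miss
        tree = branches[row[attribute]]
    return tree

# Hand-built toy tree: split on attribute 0, then on attribute 1 under value 'b'.
toy_tree = {0: {'a': 'yes', 'b': {1: {'x': 'no', 'y': 'yes'}}}}
print(evaluate(toy_tree, ['a', 'x']))   # yes
print(evaluate(toy_tree, ['b', 'x']))   # no
```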
Now that we have implemented the decision tree algorithm, we can move on to implementing the support vector machine algorithm. Here's the code for that:
```python
import random

REG = 0.1  # L2 regularization strength

def dot_product(x, y):
    return sum(xi * yi for xi, yi in zip(x, y))

def svm_train(data, epochs, learning_rate):
    """Linear SVM trained by SGD on the regularized hinge loss.
    Each row is [feature_1, ..., feature_n, label] with label in {-1, +1}."""
    w = [0.0] * (len(data[0]) - 1)
    b = 0.0
    for epoch in range(epochs):
        random.shuffle(data)
        for row in data:
            x, y = row[:-1], row[-1]
            if y * (dot_product(w, x) + b) <= 1:
                # inside the margin: hinge-loss gradient step plus shrinkage
                w = [wi + learning_rate * (y * xi - 2 * REG * wi)
                     for wi, xi in zip(w, x)]
                b += learning_rate * y
            else:
                # correct with margin: regularization shrinkage only
                w = [(1 - 2 * REG * learning_rate) * wi for wi in w]
    return w, b

def svm_predict(data, w, b):
    """Sign of the decision function for each row."""
    return [1 if dot_product(w, row[:-1]) + b > 0 else -1 for row in data]
```
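To check that this update rule actually learns, the same SGD scheme can be run on a tiny linearly separable toy set (the loop is restated inline so the snippet is self-contained; the 0.1 regularization constant matches the one hard-coded above):

```python
import random

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

random.seed(0)  # deterministic shuffling for reproducibility
data = [([2.0, 2.0], 1), ([1.5, 2.5], 1), ([-2.0, -1.0], -1), ([-1.0, -2.0], -1)]
w, b, lr, reg = [0.0, 0.0], 0.0, 0.01, 0.1
for _ in range(200):
    random.shuffle(data)
    for x, y in data:
        if y * (dot(w, x) + b) <= 1:
            # margin violated: hinge-loss gradient step plus shrinkage
            w = [wi + lr * (y * xi - 2 * reg * wi) for wi, xi in zip(w, x)]
            b += lr * y
        else:
            w = [(1 - 2 * reg * lr) * wi for wi in w]

predictions = [1 if dot(w, x) + b > 0 else -1 for x, _ in data]
print(predictions == [y for _, y in data])  # True: the toy set is fit perfectly
```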
Finally, we can compare the performances of the decision tree and support vector machine algorithms on the given dataset. Here's the code for that:
```python
# Labels differ between files: '>50K' in adult.data vs '>50K.' in adult.test.
train_rows = [row[:-1] + [1 if row[-1].strip() == '>50K' else -1] for row in train_data]
test_rows = [row[:-1] + [1 if row[-1].strip() == '>50K.' else -1] for row in test_data]

# The decision tree handles categorical attributes directly.
tree = decision_tree(train_rows)
tree_predictions = [evaluate(tree, row[:-1]) for row in test_rows]
tree_accuracy = sum(p == r[-1] for p, r in zip(tree_predictions, test_rows)) / len(test_rows)

# The SVM needs numeric features: map each categorical value to an integer code.
codes = [{} for _ in range(len(train_rows[0]) - 1)]
def to_numeric(row):
    x = []
    for i, v in enumerate(row[:-1]):
        try:
            x.append(float(v))
        except ValueError:
            x.append(float(codes[i].setdefault(v, len(codes[i]))))
    return x + [row[-1]]

svm_w, svm_b = svm_train([to_numeric(r) for r in train_rows], epochs=100, learning_rate=0.01)
svm_predictions = svm_predict([to_numeric(r) for r in test_rows], svm_w, svm_b)
svm_accuracy = sum(p == r[-1] for p, r in zip(svm_predictions, test_rows)) / len(test_rows)

print('Decision Tree Accuracy:', tree_accuracy)
print('Support Vector Machine Accuracy:', svm_accuracy)
```
This should give us the accuracy of the decision tree and support vector machine algorithms on the given dataset.