Hand-written decision tree algorithm that outputs accuracy, precision, recall, and F-measure
Posted: 2023-10-20 16:18:34
Sure. Here is a from-scratch implementation of a decision tree:
```python
import numpy as np
import pandas as pd


def calc_entropy(x):
    """Shannon entropy (in bits) of the values in x."""
    _, counts = np.unique(x, return_counts=True)
    p = counts / len(x)
    return -np.sum(p * np.log2(p))


def calc_info_gain(x, y):
    """Information gain of labels y when splitting on feature column x."""
    base_entropy = calc_entropy(y)
    unique, counts = np.unique(x, return_counts=True)
    p = counts / len(x)
    # Weighted entropy of y within each group of equal feature values.
    condition_entropy = np.sum(p * np.array([calc_entropy(y[x == v]) for v in unique]))
    return base_entropy - condition_entropy


def build_tree(x, y, feature_names):
    """Build the tree as nested dicts: {feature: {value: subtree_or_label}}."""
    # All samples share one label: return it as a leaf.
    if len(set(y)) == 1:
        return int(y[0])
    # No features left: fall back to the majority label.
    if len(feature_names) == 0:
        return int(np.bincount(y).argmax())
    info_gains = np.array([calc_info_gain(x[:, i], y) for i in range(x.shape[1])])
    best_feature_index = int(np.argmax(info_gains))
    best_feature_name = feature_names[best_feature_index]
    tree = {best_feature_name: {}}
    sub_feature_names = feature_names[:best_feature_index] + feature_names[best_feature_index + 1:]
    # Every distinct value gets its own branch, so numeric columns are treated as categorical.
    for v in np.unique(x[:, best_feature_index]):
        mask = x[:, best_feature_index] == v
        # Drop the used column so column indices stay aligned with sub_feature_names.
        sub_x = np.delete(x[mask, :], best_feature_index, axis=1)
        sub_y = y[mask]
        tree[best_feature_name][v] = build_tree(sub_x, sub_y, sub_feature_names)
    return tree


def predict(x, tree, feature_names):
    """Predict the label of a single sample x (a full feature row)."""
    # Anything that is not a dict is a leaf label.
    if not isinstance(tree, dict):
        return tree
    feature_name = list(tree.keys())[0]
    feature_index = feature_names.index(feature_name)
    value = x[feature_index]
    # Feature value not seen during training: fall back to the first branch.
    if value not in tree[feature_name]:
        value = list(tree[feature_name].keys())[0]
    return predict(x, tree[feature_name][value], feature_names)


def accuracy(y_true, y_pred):
    """Fraction of samples predicted correctly."""
    return np.mean(y_true == y_pred)


def precision(y_true, y_pred):
    """TP / (TP + FP) for binary labels, with 1 as the positive class."""
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    return tp / (tp + fp) if (tp + fp) > 0 else 0.0


def recall(y_true, y_pred):
    """TP / (TP + FN) for binary labels, with 1 as the positive class."""
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    return tp / (tp + fn) if (tp + fn) > 0 else 0.0


def f_measure(y_true, y_pred):
    """Harmonic mean of precision and recall (F1)."""
    p = precision(y_true, y_pred)
    r = recall(y_true, y_pred)
    return 2 * p * r / (p + r) if (p + r) > 0 else 0.0


# Sample data
data = pd.DataFrame({
    'age': [18, 23, 25, 30, 35, 40, 50, 55, 60],
    'income': [15000, 18000, 20000, 28000, 32000, 35000, 40000, 42000, 45000],
    'student': [0, 1, 1, 0, 1, 0, 0, 1, 0],
    'credit_rating': [0, 0, 1, 1, 0, 0, 1, 1, 0],
    'buy': [0, 0, 1, 1, 1, 0, 1, 1, 1]
})
x = data.iloc[:, :-1].values
y = data.iloc[:, -1].values
feature_names = list(data.columns[:-1])

tree = build_tree(x, y, feature_names)
y_pred = np.array([predict(x[i], tree, feature_names) for i in range(x.shape[0])])
y_true = y

# Print accuracy, precision, recall, and F-measure
print('Accuracy:', accuracy(y_true, y_pred))
print('Precision:', precision(y_true, y_pred))
print('Recall:', recall(y_true, y_pred))
print('F-measure:', f_measure(y_true, y_pred))
```
The output is:
```
Accuracy: 1.0
Precision: 1.0
Recall: 1.0
F-measure: 1.0
```
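All four metrics equal 1, but note that they are computed on the same nine samples the tree was trained on. Because every `age` value is unique, the tree splits on `age` alone and simply memorizes the training set, so these scores do not show how well the tree generalizes to unseen data.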
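To get a rough idea of performance on unseen samples, one option is leave-one-out evaluation: train on eight rows, predict the ninth, and repeat for every row. The following is only a sketch under assumptions. The tree helpers are repeated in compact form so the block runs on its own, and `leave_one_out_accuracy` is a name introduced here for illustration, not part of the code above.

```python
import numpy as np

def entropy(y):
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def info_gain(col, y):
    values, counts = np.unique(col, return_counts=True)
    weights = counts / counts.sum()
    return entropy(y) - sum(w * entropy(y[col == v]) for w, v in zip(weights, values))

def build_tree(x, y, names):
    if len(set(y)) == 1:                       # pure node: leaf
        return int(y[0])
    if not names:                              # no features left: majority label
        return int(np.bincount(y).argmax())
    best = int(np.argmax([info_gain(x[:, i], y) for i in range(x.shape[1])]))
    rest = names[:best] + names[best + 1:]
    branches = {}
    for v in np.unique(x[:, best]):
        mask = x[:, best] == v
        branches[v] = build_tree(np.delete(x[mask], best, axis=1), y[mask], rest)
    return {names[best]: branches}

def predict(row, tree, names):
    while isinstance(tree, dict):
        name = next(iter(tree))
        branches = tree[name]
        value = row[names.index(name)]
        if value not in branches:              # unseen value: fall back to first branch
            value = next(iter(branches))
        tree = branches[value]
    return tree

def leave_one_out_accuracy(x, y, names):
    """Hypothetical helper: hold out each row once and test on it."""
    hits = 0
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        tree = build_tree(x[keep], y[keep], names)
        hits += int(predict(x[i], tree, names) == y[i])
    return hits / len(y)

# Same nine samples as in the post: age, income, student, credit_rating
x = np.array([
    [18, 15000, 0, 0], [23, 18000, 1, 0], [25, 20000, 1, 1],
    [30, 28000, 0, 1], [35, 32000, 1, 0], [40, 35000, 0, 0],
    [50, 40000, 0, 1], [55, 42000, 1, 1], [60, 45000, 0, 0],
])
y = np.array([0, 0, 1, 1, 1, 0, 1, 1, 1])
names = ['age', 'income', 'student', 'credit_rating']

full_tree = build_tree(x, y, names)
train_acc = np.mean([predict(row, full_tree, names) == label for row, label in zip(x, y)])
loo_acc = leave_one_out_accuracy(x, y, names)
print('Training accuracy:', train_acc)
print('Leave-one-out accuracy:', loo_acc)
```

Every held-out `age` value is new to the tree, so each prediction goes through the unseen-value fallback and the held-out score drops below the training score. Binning `age` and `income` into ranges before training would be one way to avoid this, at the cost of choosing the bin edges.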
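The split criterion can also be checked by hand on the sample data. The sketch below recomputes the entropy of `buy` and the information gain of two columns using the same definitions as `calc_entropy` and `calc_info_gain`. The function names `entropy` and `info_gain` are shortened stand-ins chosen for this example.

```python
import numpy as np

def entropy(labels):
    """Shannon entropy in bits."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def info_gain(col, labels):
    """Entropy of labels minus the weighted entropy within each value of col."""
    values, counts = np.unique(col, return_counts=True)
    weights = counts / counts.sum()
    conditional = sum(w * entropy(labels[col == v]) for w, v in zip(weights, values))
    return entropy(labels) - conditional

age = np.array([18, 23, 25, 30, 35, 40, 50, 55, 60])
student = np.array([0, 1, 1, 0, 1, 0, 0, 1, 0])
buy = np.array([0, 0, 1, 1, 1, 0, 1, 1, 1])

h_buy = entropy(buy)                    # 3 zeros and 6 ones
gain_student = info_gain(student, buy)  # groups of 5 and 4 rows, both mixed
gain_age = info_gain(age, buy)          # 9 groups of one row each, all pure

print('H(buy)        =', round(h_buy, 4))
print('gain(student) =', round(gain_student, 4))
print('gain(age)     =', round(gain_age, 4))
```

Since every `age` group contains a single row, its conditional entropy is zero and its gain equals the full entropy of `buy`. This is the known bias of plain information gain toward columns with many distinct values.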