Decision tree classification prediction in Python?
Posted: 2023-07-05 15:11:21
The following example uses the scikit-learn library to train a decision tree classifier and make predictions:
```python
# Import the required libraries
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Load the dataset
iris = load_iris()
X = iris.data
y = iris.target

# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create the decision tree classifier
clf = DecisionTreeClassifier()

# Train the model
clf.fit(X_train, y_train)

# Predict on the test set
y_pred = clf.predict(X_test)

# Compute the accuracy
score = clf.score(X_test, y_test)
print("Accuracy:", score)
```
The code first imports the required libraries, then loads the iris dataset and splits it into training and test sets. It then creates a decision tree classifier and fits it on the training data. Finally, it predicts on the test set and computes the model's accuracy. Note that in real applications you would also tune the hyperparameters to improve model performance.
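As a sketch of such tuning, the following uses scikit-learn's `GridSearchCV` to search over a few common `DecisionTreeClassifier` hyperparameters; the specific parameter grid here is illustrative, not a recommendation for any particular dataset:

```python
# Hyperparameter tuning sketch: the grid below is illustrative only.
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42)

param_grid = {
    "max_depth": [2, 3, 4, None],       # limit tree depth to curb overfitting
    "min_samples_leaf": [1, 2, 5],      # require more samples per leaf
    "criterion": ["gini", "entropy"],   # impurity measure
}
search = GridSearchCV(DecisionTreeClassifier(random_state=42),
                      param_grid, cv=5)
search.fit(X_train, y_train)

print("Best parameters:", search.best_params_)
print("Test accuracy:", search.best_estimator_.score(X_test, y_test))
```

`GridSearchCV` refits the best parameter combination on the full training set, so `best_estimator_` can be used directly for prediction.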
Related questions
Simple data-mining ID3 decision tree classification prediction in Python
The following is a simple Python implementation of decision tree classification based on the ID3 algorithm:
```python
import pandas as pd
import numpy as np

# Entropy of the class label column (assumed to be the last column)
def calc_entropy(data):
    label_col = data.iloc[:, -1]
    _, counts = np.unique(label_col, return_counts=True)
    probs = counts / len(label_col)
    return np.sum(probs * -np.log2(probs))

# Information gain from splitting on the given feature
def calc_info_gain(data, feature):
    entropy_before_split = calc_entropy(data)
    vals, counts = np.unique(data[feature], return_counts=True)
    probs = counts / counts.sum()
    entropy_after_split = 0.0
    for val, prob in zip(vals, probs):
        sub_data = data[data[feature] == val]
        entropy_after_split += prob * calc_entropy(sub_data)
    return entropy_before_split - entropy_after_split

# Feature with the highest information gain
def get_best_split_feature(data):
    features = data.columns[:-1]
    best_feature = None
    best_info_gain = -1
    for feature in features:
        info_gain = calc_info_gain(data, feature)
        if info_gain > best_info_gain:
            best_info_gain = info_gain
            best_feature = feature
    return best_feature

# Recursively build the ID3 tree as nested dicts: {feature: {value: subtree_or_label}}
def train_decision_tree(data):
    # Stop 1: all samples belong to one class -- return that class
    if len(np.unique(data.iloc[:, -1])) == 1:
        return data.iloc[0, -1]
    # Stop 2: no features left to split on -- return the majority class
    if len(data.columns) == 1:
        return data.iloc[:, -1].mode()[0]
    # Choose the best feature, drop it, and recurse on each subset
    best_feature = get_best_split_feature(data)
    decision_tree = {best_feature: {}}
    for val in np.unique(data[best_feature]):
        sub_data = data[data[best_feature] == val].drop(best_feature, axis=1)
        decision_tree[best_feature][val] = train_decision_tree(sub_data)
    return decision_tree

# Classify a single sample (a pandas Series of feature values)
def predict(sample, tree):
    for feature, subtree in tree.items():
        val = sample[feature]
        if val not in subtree:
            return None  # feature value never seen during training
        subtree = subtree[val]
        if isinstance(subtree, dict):
            return predict(sample.drop(feature), subtree)
        return subtree

# Load the dataset (the last column must hold the class label)
data = pd.read_csv('data.csv')

# Train the decision tree
decision_tree = train_decision_tree(data)

# Predict new samples
new_data = pd.DataFrame({'feature1': [1, 1, 0, 0], 'feature2': [1, 0, 1, 0]})
for i in range(len(new_data)):
    prediction = predict(new_data.iloc[i], decision_tree)
    print('Sample', i + 1, 'predicted class:', prediction)
```
Note: this code is intended only as a learning reference; real applications need to handle more details and optimizations (e.g. continuous features and pruning).
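To make the entropy and information-gain formulas above concrete, here is a self-contained worked example on a tiny hand-made dataset; the column names and values are invented for illustration, and the two helpers are repeated so the snippet runs on its own:

```python
import numpy as np
import pandas as pd

# Toy dataset: 'windy' perfectly predicts 'play'; 'sunny' carries no information.
data = pd.DataFrame({
    'sunny': [1, 0, 1, 0],
    'windy': [0, 0, 1, 1],
    'play':  ['yes', 'yes', 'no', 'no'],  # class label in the last column
})

def calc_entropy(data):
    _, counts = np.unique(data.iloc[:, -1], return_counts=True)
    probs = counts / counts.sum()
    return float(np.sum(probs * -np.log2(probs)))

def calc_info_gain(data, feature):
    before = calc_entropy(data)
    vals, counts = np.unique(data[feature], return_counts=True)
    probs = counts / counts.sum()
    after = sum(p * calc_entropy(data[data[feature] == v])
                for v, p in zip(vals, probs))
    return before - after

print(calc_entropy(data))             # 1.0 for a 2/2 class split
print(calc_info_gain(data, 'windy'))  # 1.0: the split removes all uncertainty
print(calc_info_gain(data, 'sunny'))  # 0.0: each subset is still a 50/50 mix
```

ID3 would therefore pick `windy` as the root split, since it has the larger information gain.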
Python code for iris decision tree classification
In Python, we can use the Scikit-learn library to build a decision tree classification model on the iris dataset. Here is a simple example showing how to load the `Iris` dataset and create a decision tree classifier:
```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_graphviz
from sklearn.metrics import accuracy_score

# Load the iris data
iris = load_iris()
X = iris.data
y = iris.target

# Preprocessing: split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Create the decision tree classifier
clf = DecisionTreeClassifier()

# Train the model
clf.fit(X_train, y_train)

# Predict
y_pred = clf.predict(X_test)

# Compute the accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy * 100:.2f}%")

# Visualize the tree structure (requires the graphviz package)
import graphviz
dot_data = export_graphviz(clf, out_file=None, feature_names=iris.feature_names,
                           class_names=iris.target_names, filled=True, rounded=True)
graphviz.Source(dot_data).view()
```
In this example, we first import the required libraries and load the iris data. We then split the data into training and test sets, create a `DecisionTreeClassifier` instance, train it, and predict on the test set. Finally, we compute the prediction accuracy and visualize the resulting tree.
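If graphviz is not installed, scikit-learn's built-in `export_text` offers a dependency-free alternative: it renders the fitted tree's split rules as plain text, which can be enough for a quick inspection:

```python
# Text-based alternative to graphviz visualization.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=42)

clf = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)

# Each line shows a split threshold; indented lines are deeper nodes.
rules = export_text(clf, feature_names=list(iris.feature_names))
print(rules)
```

There is also `sklearn.tree.plot_tree`, which draws the same structure with matplotlib and likewise needs no graphviz installation.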