Python code for decision tree classification and prediction?
Posted: 2023-07-05 17:11:21
The following example uses Python's scikit-learn library to train a decision tree classifier and make predictions:
```python
# Import the required libraries
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Load the dataset
iris = load_iris()
X = iris.data
y = iris.target

# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create the decision tree classifier
clf = DecisionTreeClassifier()

# Train the model
clf.fit(X_train, y_train)

# Predict on the test set
y_pred = clf.predict(X_test)

# Compute the accuracy
score = clf.score(X_test, y_test)
print("Accuracy:", score)
```
The code first imports the required libraries, loads the iris dataset, and splits it into training and test sets. It then creates a decision tree classifier and fits it on the training data. Finally, it predicts on the test set and computes the model's accuracy. Note that in real applications you should also tune the model's hyperparameters (such as the maximum tree depth) to improve performance.
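The tuning step mentioned above can be sketched with scikit-learn's `GridSearchCV`; the parameter grid below is just an illustrative starting point, not a recommended setting:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Candidate hyperparameters; widen or narrow this grid for your own data
param_grid = {
    'max_depth': [2, 3, 4, 5, None],
    'min_samples_leaf': [1, 2, 5],
    'criterion': ['gini', 'entropy'],
}

# 5-fold cross-validated grid search over the tree's hyperparameters
search = GridSearchCV(DecisionTreeClassifier(random_state=42), param_grid, cv=5)
search.fit(X_train, y_train)

print('Best parameters:', search.best_params_)
print('Test accuracy:', search.best_estimator_.score(X_test, y_test))
```

`GridSearchCV` refits the best configuration on the full training set, so `best_estimator_` can be used directly for prediction.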
Related questions
Python code for simple data-mining ID3 decision tree classification and prediction
The following is a simple Python implementation of decision tree classification and prediction based on the ID3 algorithm:
```python
import pandas as pd
import numpy as np

# Compute the entropy of the label column (the last column) of a dataset
def calc_entropy(data):
    label_col = data.iloc[:, -1]
    _, counts = np.unique(label_col, return_counts=True)
    probs = counts / len(label_col)
    return float(np.sum(probs * -np.log2(probs)))

# Compute the information gain from splitting on a given feature
def calc_info_gain(data, feature):
    entropy_before_split = calc_entropy(data)
    vals, counts = np.unique(data[feature], return_counts=True)
    probs = counts / counts.sum()
    entropy_after_split = 0
    for i in range(len(vals)):
        sub_data = data[data[feature] == vals[i]]
        entropy_after_split += probs[i] * calc_entropy(sub_data)
    return entropy_before_split - entropy_after_split

# Find the feature with the highest information gain
def get_best_split_feature(data):
    features = data.columns[:-1]
    best_feature = None
    best_info_gain = -1
    for feature in features:
        info_gain = calc_info_gain(data, feature)
        if info_gain > best_info_gain:
            best_info_gain = info_gain
            best_feature = feature
    return best_feature

# Recursively train an ID3 decision tree, represented as nested dicts
def train_decision_tree(data):
    # Stopping condition 1: all samples share one class -> return that class
    if len(np.unique(data.iloc[:, -1])) == 1:
        return np.unique(data.iloc[:, -1])[0]
    # Stopping condition 2: no features left to split on -> return the majority class
    if len(data.columns) == 1:
        return data.iloc[:, -1].mode()[0]
    # Choose the best split feature
    best_feature = get_best_split_feature(data)
    # Build the tree: one branch per observed value of the best feature
    decision_tree = {best_feature: {}}
    for val in np.unique(data[best_feature]):
        # Drop the chosen feature and recursively build the subtree
        sub_data = data[data[best_feature] == val].drop(best_feature, axis=1)
        decision_tree[best_feature][val] = train_decision_tree(sub_data)
    return decision_tree

# Predict the class of a single sample (a pandas Series) with a trained tree
def predict(sample, tree):
    for feature, subtree in tree.items():
        val = sample[feature]
        if val not in subtree:
            # Feature value never seen during training; there is no branch to follow
            return None
        subtree = subtree[val]
        if isinstance(subtree, dict):
            return predict(sample.drop(feature), subtree)
        else:
            return subtree

# Load the training data (the last column must be the class label)
data = pd.read_csv('data.csv')

# Train the decision tree
decision_tree = train_decision_tree(data)

# Predict new samples
new_data = pd.DataFrame({'feature1': [1, 1, 0, 0], 'feature2': [1, 0, 1, 0]})
for i in range(len(new_data)):
    prediction = predict(new_data.iloc[i], decision_tree)
    print('Sample', i + 1, 'predicted class:', prediction)
```
Note: this code is intended only as a learning reference; a real application needs to handle many more details (continuous-valued features, missing values, pruning) and optimizations.
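As a quick sanity check, the entropy and information-gain formulas used above can be verified on a tiny hand-built dataset; the column names here are purely illustrative:

```python
import numpy as np
import pandas as pd

# Toy dataset: 'windy' perfectly separates the labels, 'hot' does not
toy = pd.DataFrame({
    'windy': [1, 1, 0, 0],
    'hot':   [1, 0, 1, 0],
    'play':  [0, 0, 1, 1],  # class label in the last column
})

def entropy(labels):
    # H = -sum(p * log2(p)) over the class proportions
    _, counts = np.unique(labels, return_counts=True)
    probs = counts / counts.sum()
    return float(-(probs * np.log2(probs)).sum())

def info_gain(df, feature):
    # Gain = H(labels) - weighted average of H(labels | feature value)
    before = entropy(df.iloc[:, -1])
    after = 0.0
    for _, sub in df.groupby(feature):
        after += len(sub) / len(df) * entropy(sub.iloc[:, -1])
    return before - after

print(info_gain(toy, 'windy'))  # 1.0: a perfect split of two balanced classes
print(info_gain(toy, 'hot'))    # 0.0: this feature tells us nothing
```

Two balanced classes have entropy 1 bit; splitting on `windy` yields pure subsets (entropy 0), so the gain is exactly 1, while splitting on `hot` leaves both subsets at entropy 1, so the gain is 0. ID3 would therefore split on `windy` first.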
Python code for simple data-mining ID3 decision tree classification and prediction on an employee database
The following example implements decision tree classification and prediction in Python with scikit-learn. Note that `DecisionTreeClassifier` is an optimized CART implementation rather than textbook ID3, but passing `criterion='entropy'` makes it choose splits by information gain, as ID3 does:
```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load the data
data = pd.read_csv("employees.csv")

# Preprocessing: separate features from the target, and one-hot encode
# any categorical columns (scikit-learn trees require numeric input)
X = pd.get_dummies(data.drop(['left'], axis=1))
y = data['left']

# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Build the decision tree model; 'entropy' selects splits by information gain
dtc = DecisionTreeClassifier(criterion='entropy')
dtc.fit(X_train, y_train)

# Predict
y_pred = dtc.predict(X_test)

# Compute the accuracy
acc = accuracy_score(y_test, y_pred)
print("Accuracy:", acc)
```
Here `employees.csv` is a CSV file containing the employee records, and `left` is the column marking whether an employee has quit. We first load the data with `pandas` and preprocess it, then split it into training and test sets with `train_test_split`. Next, we build a decision tree model with the `DecisionTreeClassifier` class and train it with the `fit` method. Finally, we predict on the test set with the `predict` method and compute the model's accuracy with `accuracy_score`.
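Real employee data usually contains string-valued columns such as a department or salary band, which scikit-learn trees cannot consume directly. The preprocessing step can be sketched as follows; the column names and values here are hypothetical stand-ins for whatever `employees.csv` actually contains:

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Hypothetical employee records; real data would come from pd.read_csv("employees.csv")
data = pd.DataFrame({
    'department': ['sales', 'tech', 'tech', 'hr', 'sales', 'hr'],
    'salary':     ['low', 'high', 'medium', 'low', 'medium', 'low'],
    'years':      [2, 5, 3, 1, 4, 2],
    'left':       [1, 0, 0, 1, 0, 1],
})

# One-hot encode the string columns; numeric columns pass through unchanged
X = pd.get_dummies(data.drop('left', axis=1))
y = data['left']

# criterion='entropy' makes the splits use information gain, as in ID3
dtc = DecisionTreeClassifier(criterion='entropy', random_state=42)
dtc.fit(X, y)
print(dtc.predict(X[:2]))
```

`pd.get_dummies` expands `department` into columns such as `department_sales` and `department_tech`; an alternative is scikit-learn's `OneHotEncoder` inside a `Pipeline`, which also handles unseen categories at prediction time.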
Keep in mind that the code above is only a simple example; in a real application it must be adjusted and tuned for the specific business scenario.