Python code for post-pruning a decision tree on the Iris dataset

Time: 2024-03-12 22:36:57 Views: 197

The following Python code post-prunes a decision tree classifier on the Iris dataset using cost-complexity pruning from the scikit-learn library:

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load the dataset
iris = load_iris()
X = iris.data
y = iris.target

# Split into training and test sets (80% / 20%)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train an unpruned decision tree classifier
clf = DecisionTreeClassifier(random_state=42)
clf.fit(X_train, y_train)

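# Post-pruning: compute the candidate alphas along the cost-complexity pruning path
path = clf.cost_complexity_pruning_path(X_train, y_train)
ccp_alphas, impurities = path.ccp_alphas, path.impurities

# Fit one tree per candidate alpha; larger alphas give smaller trees
clfs = []
for ccp_alpha in ccp_alphas:
    pruned_clf = DecisionTreeClassifier(random_state=42, ccp_alpha=ccp_alpha)
    pruned_clf.fit(X_train, y_train)
    clfs.append(pruned_clf)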

# Compute accuracy on the test set and keep the best-scoring tree
test_scores = [accuracy_score(y_test, clf.predict(X_test)) for clf in clfs]
best_clf = clfs[test_scores.index(max(test_scores))]
print("Test accuracy of the best pruned tree: {:.2f}%".format(max(test_scores) * 100))

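The code above first loads the Iris dataset and splits it into a training set and a test set. It then trains a decision tree classifier and calls the cost_complexity_pruning_path method to compute a sequence of ccp_alpha values, each of which corresponds to one subtree along the pruning path. For every ccp_alpha value it trains a new decision tree classifier, and finally it picks the tree with the highest accuracy on the test set and prints that accuracy. Because the same test set is used both to choose the tree and to report its accuracy, the printed figure is an optimistic estimate.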
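As a variation on the selection step described above, the sketch below chooses ccp_alpha by cross-validation on the training set and touches the test set only once, for the final score. This is not part of the original answer: the choice of 5 folds, the np.clip guard, and the names cv_scores, best_alpha and test_accuracy are assumptions made for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Same data and split as in the original code
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Candidate alphas from the cost-complexity pruning path on the training set
path = DecisionTreeClassifier(random_state=42).cost_complexity_pruning_path(
    X_train, y_train
)
# Assumption: clip at zero to guard against tiny negative values from rounding
ccp_alphas = np.clip(path.ccp_alphas, 0.0, None)

# Mean 5-fold cross-validated accuracy on the training set for each alpha
cv_scores = [
    cross_val_score(
        DecisionTreeClassifier(random_state=42, ccp_alpha=alpha),
        X_train, y_train, cv=5,
    ).mean()
    for alpha in ccp_alphas
]

# Refit on the full training set with the alpha that scored best in CV
best_alpha = ccp_alphas[int(np.argmax(cv_scores))]
best_clf = DecisionTreeClassifier(random_state=42, ccp_alpha=best_alpha)
best_clf.fit(X_train, y_train)

# The test set is used only here, for the final estimate
test_accuracy = best_clf.score(X_test, y_test)
print("Chosen ccp_alpha: {:.4f}".format(best_alpha))
print("Nodes in pruned tree: {}".format(best_clf.tree_.node_count))
print("Test accuracy: {:.2f}%".format(test_accuracy * 100))
```

With only 120 training samples the cross-validated scores for neighbouring alphas can be very close, so the chosen alpha may change with a different random_state or fold count.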