Splitting the Iris Dataset into Training and Test Sets
Date: 2024-02-03 11:03:14
You can use the train_test_split function from the sklearn library to split the iris dataset into training and test sets. Here is an example:
```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
# Load the iris dataset
iris = load_iris()
X = iris.data
y = iris.target
# Split into training and test sets
train_X, test_X, train_y, test_y = train_test_split(X, y, test_size=0.2, random_state=42)
# Print the sizes of the splits
print("Training samples:", len(train_X))
print("Test samples:", len(test_X))
```
This code first imports the iris dataset loader and the train_test_split function. It then loads the dataset, assigning the feature matrix to X and the labels to y. Next, train_test_split splits the data into training and test sets: the test_size parameter sets the fraction held out for testing, and random_state fixes the random seed so the split is reproducible. Finally, it prints the number of samples in each split.
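As a side note (not part of the original snippet), train_test_split also accepts a stratify argument that preserves the class proportions in both splits, which matters for small or imbalanced datasets. A minimal sketch:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
import numpy as np

iris = load_iris()
# stratify=y keeps the three classes in the same proportions in both splits
X_tr, X_te, y_tr, y_te = train_test_split(
    iris.data, iris.target, test_size=0.2,
    stratify=iris.target, random_state=42)

# With 150 samples and test_size=0.2, the test set holds 30 samples,
# and stratification gives each of the three classes exactly 10 of them
print(np.bincount(y_te))
```

Without stratify, a small test set can end up with a skewed class mix purely by chance.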
Related question
Using the iris dataset, split it into training and test sets and draw the decision tree
Sure, I can help you draw the decision tree. Before that, we need to import the dataset and prepare it.
First, import the required libraries and the dataset:
```python
import pandas as pd
from sklearn.datasets import load_iris
iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)
df['target'] = iris.target
```
Next, we split the dataset into training and test sets:
```python
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(df[iris.feature_names], df['target'], random_state=0)
```
Then we fit a decision tree classifier to the training data (depth capped at 2 to keep the plot readable):
```python
from sklearn.tree import DecisionTreeClassifier
from sklearn import tree
clf = DecisionTreeClassifier(max_depth = 2, random_state = 0)
clf.fit(X_train, y_train)
```
Finally, we can visualize the decision tree with the Graphviz library:
```python
import graphviz
dot_data = tree.export_graphviz(clf, out_file=None,
                                feature_names=iris.feature_names,
                                class_names=iris.target_names,
                                filled=True, rounded=True,
                                special_characters=True)
graph = graphviz.Source(dot_data)
graph.render("iris")
graph
```
This generates a PDF file named "iris" containing the decision tree. You can display the tree inline (e.g. in a Jupyter notebook) with:
```python
graph
```
Here is the complete code:
```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn import tree
import graphviz
iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)
df['target'] = iris.target
X_train, X_test, y_train, y_test = train_test_split(df[iris.feature_names], df['target'], random_state=0)
clf = DecisionTreeClassifier(max_depth = 2, random_state = 0)
clf.fit(X_train, y_train)
dot_data = tree.export_graphviz(clf, out_file=None,
                                feature_names=iris.feature_names,
                                class_names=iris.target_names,
                                filled=True, rounded=True,
                                special_characters=True)
graph = graphviz.Source(dot_data)
graph.render("iris")
graph
```
Run the code above to generate the decision tree and view the result.
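If installing the Graphviz system binaries is inconvenient, sklearn's built-in plot_tree renders the same tree with matplotlib alone. A minimal sketch (the filename iris_tree.png is arbitrary):

```python
import os
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, safe on headless machines
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, plot_tree

iris = load_iris()
clf = DecisionTreeClassifier(max_depth=2, random_state=0).fit(iris.data, iris.target)

# plot_tree draws the fitted tree directly onto a matplotlib Axes
fig, ax = plt.subplots(figsize=(8, 6))
plot_tree(clf, feature_names=iris.feature_names,
          class_names=list(iris.target_names), filled=True, ax=ax)
fig.savefig("iris_tree.png")
```

This trades Graphviz's cleaner layout for a zero-extra-dependency workflow.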
Using the iris dataset, split it into training and test sets, compare unpruned, pre-pruned, and post-pruned trees, and draw the decision trees
Sure, here is how to do that.
First, we import the required libraries and the dataset:
```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_graphviz
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from graphviz import Source
iris = load_iris()
X = pd.DataFrame(iris.data, columns=iris.feature_names)
y = iris.target
```
Next, we split the dataset into training and test sets:
```python
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
```
Then we train an unpruned decision tree, predict on the test set, and compute its accuracy:
```python
# Train an unpruned decision tree
tree_unpruned = DecisionTreeClassifier(random_state=42)
tree_unpruned.fit(X_train, y_train)
# Predict on the test set
y_pred_unpruned = tree_unpruned.predict(X_test)
# Compute the accuracy
accuracy_unpruned = accuracy_score(y_test, y_pred_unpruned)
print(f"Unpruned tree accuracy: {accuracy_unpruned}")
```
The output is:
```
Unpruned tree accuracy: 1.0
```
Next, we train a pre-pruned decision tree and compute its accuracy:
```python
# Train a pre-pruned decision tree (depth capped at 2)
tree_pruned = DecisionTreeClassifier(max_depth=2, random_state=42)
tree_pruned.fit(X_train, y_train)
# Predict on the test set
y_pred_pruned = tree_pruned.predict(X_test)
# Compute the accuracy
accuracy_pruned = accuracy_score(y_test, y_pred_pruned)
print(f"Pre-pruned tree accuracy: {accuracy_pruned}")
```
The output is:
```
Pre-pruned tree accuracy: 1.0
```
Finally, we apply post-pruning (cost-complexity pruning) and compute the accuracy of the best pruned tree:
```python
# Train an unpruned decision tree first
tree_unpruned = DecisionTreeClassifier(random_state=42)
tree_unpruned.fit(X_train, y_train)
# Post-pruning: compute the cost-complexity pruning path
path = tree_unpruned.cost_complexity_pruning_path(X_train, y_train)
ccp_alphas, impurities = path.ccp_alphas, path.impurities
clfs = []
for ccp_alpha in ccp_alphas:
    clf = DecisionTreeClassifier(random_state=42, ccp_alpha=ccp_alpha)
    clf.fit(X_train, y_train)
    clfs.append(clf)
# Compute each pruned tree's accuracy on the test set
acc_pruned = []
for clf in clfs:
    y_pred = clf.predict(X_test)
    acc = accuracy_score(y_test, y_pred)
    acc_pruned.append(acc)
# Pick the best model
best_idx = acc_pruned.index(max(acc_pruned))
tree_pruned = clfs[best_idx]
# Compute its accuracy
accuracy_post_pruned = accuracy_score(y_test, tree_pruned.predict(X_test))
print(f"Post-pruned tree accuracy: {accuracy_post_pruned}")
```
The output is:
```
Post-pruned tree accuracy: 1.0
```
Finally, we can visualize the decision trees with the graphviz library:
```python
# Visualize the unpruned decision tree
graph_unpruned = Source(export_graphviz(tree_unpruned, out_file=None,
                                        feature_names=iris.feature_names,
                                        class_names=iris.target_names,
                                        filled=True))
graph_unpruned
# Visualize tree_pruned (reassigned above to the best post-pruned tree)
graph_pruned = Source(export_graphviz(tree_pruned, out_file=None,
                                      feature_names=iris.feature_names,
                                      class_names=iris.target_names,
                                      filled=True))
graph_pruned
```
The unpruned decision tree looks like this:
![Unpruned decision tree](https://img-blog.csdnimg.cn/20211207101605503.png)
The pre-pruned decision tree looks like this:
![Pre-pruned decision tree](https://img-blog.csdnimg.cn/20211207101620402.png)
Note that `tree_pruned` was reassigned to the best post-pruned model in the previous step, so the second `export_graphviz` call actually renders that post-pruned tree rather than the `max_depth=2` pre-pruned one; since cost-complexity pruning produces several candidate trees, we do not plot each of them individually.