Training a decision tree classifier in Python with default parameters
Date: 2023-10-24 08:18:42
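We can train a decision tree classifier with the DecisionTreeClassifier class from scikit-learn (sklearn). Every constructor parameter has a default value, so the parameters can either be written out explicitly with those defaults, as in the code below, or simply omitted. The steps are as follows: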
1. Import the required modules and load the dataset
```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
# Load the Iris dataset
iris = load_iris()
X, y = iris.data, iris.target
# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
```
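The check below prints the array shapes to confirm what the split produced. The Iris data has 150 samples with 4 features, so `test_size=0.2` holds out 30 samples for testing and leaves 120 for training. The block repeats the step-1 code so that it runs on its own.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris = load_iris()
X, y = iris.data, iris.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# 150 samples x 4 features; test_size=0.2 holds out 30 of them
print("full:", X.shape, "train:", X_train.shape, "test:", X_test.shape)
print("features:", iris.feature_names)
print("classes:", list(iris.target_names))
```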
2. Define the model with its default parameters and train it
```python
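# Define the decision tree classifier; every value below is the library default.
# min_impurity_split is omitted: it was deprecated in scikit-learn 0.19 and removed in 1.0,
# so passing it to a current release raises a TypeError.
dtc = DecisionTreeClassifier(criterion='gini', splitter='best', max_depth=None, min_samples_split=2,
                             min_samples_leaf=1, min_weight_fraction_leaf=0.0, max_features=None,
                             random_state=None, max_leaf_nodes=None, min_impurity_decrease=0.0,
                             class_weight=None, ccp_alpha=0.0)
# Train the model
dtc.fit(X_train, y_train)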
```
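In the code above, DecisionTreeClassifier creates the classifier, and each parameter is set explicitly to its default value. For example, criterion='gini' measures the quality of a split with Gini impurity. splitter='best' picks the best split at each node rather than the best random split. max_depth=None leaves the depth unlimited, so nodes keep splitting until every leaf is pure or holds fewer than min_samples_split samples.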
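Two points from the paragraph above are easy to verify. First, spelling out the defaults should give the same configuration as calling `DecisionTreeClassifier()` with no arguments, which the sketch checks by comparing `get_params()`. Second, the Gini impurity of a set of labels is `G = 1 - sum_k p_k^2`, where `p_k` is the proportion of class `k`. The helper `gini_impurity` is a hypothetical name written for this illustration. The sketch compares its result with the impurity scikit-learn records for the root node in `tree_.impurity[0]`. It assumes scikit-learn 1.0 or later, which is why `min_impurity_split` is left out.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# 1) Writing out the defaults gives the same configuration as DecisionTreeClassifier()
explicit = DecisionTreeClassifier(
    criterion="gini", splitter="best", max_depth=None, min_samples_split=2,
    min_samples_leaf=1, min_weight_fraction_leaf=0.0, max_features=None,
    random_state=None, max_leaf_nodes=None, min_impurity_decrease=0.0,
    class_weight=None, ccp_alpha=0.0,
)
same_config = explicit.get_params() == DecisionTreeClassifier().get_params()
print("Same configuration as DecisionTreeClassifier():", same_config)

# 2) Gini impurity of a set of labels: G = 1 - sum_k p_k^2 (hypothetical helper)
def gini_impurity(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()  # class proportions p_k
    return 1.0 - np.sum(p ** 2)

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42)
explicit.fit(X_train, y_train)

manual = gini_impurity(y_train)
stored = explicit.tree_.impurity[0]  # impurity recorded for the root node
print(f"root Gini by hand: {manual:.6f}, stored by sklearn: {stored:.6f}")
```

The root node contains every training sample, each with weight 1, so both numbers describe the same class distribution and should agree.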
3. Evaluate the model
```python
# Measure accuracy on the test set
accuracy = dtc.score(X_test, y_test)
print('Accuracy:', accuracy)
```
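The score method returns the model's mean accuracy on the test set, that is, the fraction of test samples whose predicted class matches the true label.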
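For classifiers, `score` performs the same computation as `sklearn.metrics.accuracy_score` applied to the model's predictions. The sketch below compares the two values, plus a ratio computed by hand, on the same split. The `random_state=0` passed to the tree is an illustrative assumption added only to make the run repeatable. It is not part of the original example.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42)

# random_state fixed here only so the comparison is repeatable
dtc = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

y_pred = dtc.predict(X_test)
via_score = dtc.score(X_test, y_test)        # classifier's built-in score
via_metric = accuracy_score(y_test, y_pred)  # fraction of correct predictions
via_numpy = (y_pred == y_test).mean()        # the same ratio computed by hand
print(via_score, via_metric, via_numpy)
```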
The complete code is as follows:
```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
# Load the Iris dataset
iris = load_iris()
X, y = iris.data, iris.target
# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Define the decision tree classifier with every parameter at its default value
# (min_impurity_split is omitted: it was removed in scikit-learn 1.0)
dtc = DecisionTreeClassifier(criterion='gini', splitter='best', max_depth=None, min_samples_split=2,
                             min_samples_leaf=1, min_weight_fraction_leaf=0.0, max_features=None,
                             random_state=None, max_leaf_nodes=None, min_impurity_decrease=0.0,
                             class_weight=None, ccp_alpha=0.0)
# Train the model
dtc.fit(X_train, y_train)
# Measure accuracy on the test set
accuracy = dtc.score(X_test, y_test)
print('Accuracy:', accuracy)
```
The output from the original run was:
```
Accuracy: 1.0
```
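The classifier above is created with random_state=None. According to the scikit-learn documentation, the features are randomly permuted at each split even when splitter='best'. When several candidate splits improve the criterion equally, the chosen split (and therefore the tree) can change between runs. Passing an integer random_state makes fitting deterministic. The sketch below uses the arbitrary seed 0, an assumption made for illustration. It fits the tree twice and checks that the predictions match. The helper `fit_and_predict` is a hypothetical name.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42)

def fit_and_predict(seed):
    """Hypothetical helper: fit a default tree with the given random_state."""
    clf = DecisionTreeClassifier(random_state=seed).fit(X_train, y_train)
    return clf.predict(X_test), clf.score(X_test, y_test)

pred_a, acc_a = fit_and_predict(seed=0)
pred_b, acc_b = fit_and_predict(seed=0)
# Same seed -> same feature permutations -> identical tree and predictions
print("Identical predictions:", np.array_equal(pred_a, pred_b), "accuracy:", acc_a)
```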