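Using the dating experiment data, draw a decision tree with the decision tree algorithm from the sklearn.tree package. Requirement: 1) set the criterion parameter so that the Gini index is used as the basis for splitting the sample set, draw the decision tree, and predict the dating outcome for the validation data val_sample = [[24, 178, 2, 17000],[27, 176, 0, 25000],[27, 176, 0, 10000],[32, 170, 2, 42000]].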
Posted: 2023-07-02 19:09:56
First, read the dating experiment data and split it into a training set and a validation set:
```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Load the dataset
df = pd.read_csv('speed_dating.csv', encoding='ISO-8859-1')

# Select the feature columns and the target column
X = df[['age', 'height', 'race', 'income']]
y = df['match']

# Split into training and validation sets (80% / 20%)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)
```
Next, train a decision tree with `criterion='gini'` so that splits are chosen by the Gini index, and draw the tree:
```python
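from sklearn.tree import DecisionTreeClassifier, export_graphviz
import graphviz

# Create the decision tree model; criterion='gini' selects the Gini index
dtc = DecisionTreeClassifier(criterion='gini', random_state=42)

# Train the model on the training set
dtc.fit(X_train, y_train)

# Export the fitted tree to DOT format.
# class_names assumes the 'match' labels are 0 and 1 (in ascending order).
dot_data = export_graphviz(
    dtc,
    out_file=None,
    feature_names=list(X.columns),
    class_names=['0', '1'],
    filled=True,
    rounded=True,
    special_characters=True,
)
graph = graphviz.Source(dot_data)

# In a Jupyter notebook, a trailing expression displays the tree inline
graph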
```
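To make the splitting criterion concrete, here is a minimal sketch of how the Gini impurity of a node and the weighted impurity of a candidate split can be computed by hand. The function names `gini_impurity` and `weighted_split_impurity` and the toy labels are illustrative assumptions, not part of scikit-learn or the dating dataset; scikit-learn's internal implementation differs in detail.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity of a collection of class labels: 1 - sum(p_k ** 2)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def weighted_split_impurity(left, right):
    """Impurity of a split: child impurities weighted by child size."""
    n = len(left) + len(right)
    return (len(left) / n) * gini_impurity(left) + (len(right) / n) * gini_impurity(right)


# Toy labels (hypothetical): three non-matches and three matches
parent = [0, 0, 0, 1, 1, 1]
left, right = [0, 0, 0, 1], [1, 1]

print(gini_impurity(parent))                 # 0.5 for a perfectly balanced binary node
print(weighted_split_impurity(left, right))  # about 0.25, lower than the parent
```

A split is attractive when the weighted impurity of the children is lower than the impurity of the parent; with `criterion='gini'`, the tree compares candidate splits by this kind of impurity reduction.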
Finally, use the trained model to predict the outcome for the given validation samples and print the predictions:
```python
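# Validation samples given in the question: [age, height, race, income]
val_sample = [[24, 178, 2, 17000], [27, 176, 0, 25000], [27, 176, 0, 10000], [32, 170, 2, 42000]]

# Wrap the samples in a DataFrame with the training column names,
# which avoids scikit-learn's feature-name mismatch warning
val_df = pd.DataFrame(val_sample, columns=X.columns)
y_pred = dtc.predict(val_df)

# Print the prediction for each sample
for sample, pred in zip(val_sample, y_pred):
    print(f"Prediction for {sample}: {pred}")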
```
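The split above also produces `X_val` and `y_val`, which can be used to check how well the tree generalizes. The sketch below shows one way to do that with `accuracy_score`. Because the real CSV is not available here, it builds a small synthetic stand-in: the generated values and the rule that produces `match` are made up purely for illustration and say nothing about the actual dataset.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the dating data (hypothetical values, NOT the real dataset)
rng = np.random.default_rng(42)
n = 200
demo = pd.DataFrame({
    'age': rng.integers(20, 40, size=n),
    'height': rng.integers(150, 190, size=n),
    'race': rng.integers(0, 3, size=n),
    'income': rng.integers(5000, 50000, size=n),
})
# Made-up labelling rule, used only so the example has something to learn
demo['match'] = (demo['income'] > 20000).astype(int)

X_demo = demo[['age', 'height', 'race', 'income']]
y_demo = demo['match']

# Same split settings as in the answer above
X_tr, X_va, y_tr, y_va = train_test_split(X_demo, y_demo, test_size=0.2, random_state=42)

# Fit a Gini-based tree and score it on the held-out rows
clf = DecisionTreeClassifier(criterion='gini', random_state=42).fit(X_tr, y_tr)
val_accuracy = accuracy_score(y_va, clf.predict(X_va))
print(f"Validation accuracy: {val_accuracy:.3f}")
```

With the real data, the same two lines (`dtc.predict(X_val)` followed by `accuracy_score(y_val, ...)`) would apply to the `dtc`, `X_val`, and `y_val` defined earlier.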
The complete code is as follows:
```python
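import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_graphviz
import graphviz

# Load the dataset
df = pd.read_csv('speed_dating.csv', encoding='ISO-8859-1')

# Select the feature columns and the target column
X = df[['age', 'height', 'race', 'income']]
y = df['match']

# Split into training and validation sets (80% / 20%)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

# Create the decision tree model; criterion='gini' selects the Gini index
dtc = DecisionTreeClassifier(criterion='gini', random_state=42)

# Train the model on the training set
dtc.fit(X_train, y_train)

# Export the fitted tree to DOT format.
# class_names assumes the 'match' labels are 0 and 1 (in ascending order).
dot_data = export_graphviz(
    dtc,
    out_file=None,
    feature_names=list(X.columns),
    class_names=['0', '1'],
    filled=True,
    rounded=True,
    special_characters=True,
)
graph = graphviz.Source(dot_data)

# A bare `graph` expression shows nothing when run as a script, so write
# the tree to decision_tree.png (requires the Graphviz binaries)
graph.render('decision_tree', format='png', cleanup=True)

# Validation samples given in the question: [age, height, race, income]
val_sample = [[24, 178, 2, 17000], [27, 176, 0, 25000], [27, 176, 0, 10000], [32, 170, 2, 42000]]

# Wrap the samples in a DataFrame with the training column names,
# which avoids scikit-learn's feature-name mismatch warning
val_df = pd.DataFrame(val_sample, columns=X.columns)
y_pred = dtc.predict(val_df)

# Print the prediction for each sample
for sample, pred in zip(val_sample, y_pred):
    print(f"Prediction for {sample}: {pred}")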
```