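Give me example code for decision-tree machine learning. I already have a training set and a test set, but the training set has an extra label: there are two label columns, and both have the same name.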
Posted: 2024-03-03 17:52:53 Views: 118
Sure. Here is a Python decision-tree example that handles two label columns with the same name.
First, import the required libraries:
```python
import pandas as pd
from sklearn import tree
from sklearn.metrics import accuracy_score
```
Then read the training and test data, assuming they are stored in train.csv and test.csv respectively:
```python
train_data = pd.read_csv('train.csv')
test_data = pd.read_csv('test.csv')
```
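Next, the two same-named label columns need distinct names. Note that `pd.read_csv` already de-duplicates column names on load: the first column stays `label` and the second becomes `label.1`. A rename dict that repeats the key `'label'` keeps only its last entry, so the mapping below uses both source names. The rest of this answer assumes test.csv also contains both label columns; if only the training set has the extra one, the `label.1` entry is simply ignored for the test set, and the `label_2` steps for the test set should be skipped.
```python
# pd.read_csv has already renamed the second duplicate column to 'label.1'
label_names = {'label': 'label_1', 'label.1': 'label_2'}
train_data = train_data.rename(columns=label_names)
test_data = test_data.rename(columns=label_names)
```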
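To see why the rename needs two different source names, here is a minimal sketch using a made-up three-row CSV held in memory (a stand-in for train.csv; the column `f1` and the values are hypothetical). It shows how pandas renames the duplicate header on load and how a dict literal with a repeated key collapses to a single entry.

```python
import io
import pandas as pd

# Hypothetical CSV with two columns that are both named "label"
csv_text = "f1,label,label\n1,0,1\n2,1,0\n3,0,1\n"
df = pd.read_csv(io.StringIO(csv_text))

# pandas de-duplicates the header while reading
columns_after_read = list(df.columns)

# A dict literal with a repeated key keeps only the last value
mapping = {'label': 'label_1', 'label': 'label_2'}

# Renaming by the de-duplicated names gives two distinct label columns
renamed = df.rename(columns={'label': 'label_1', 'label.1': 'label_2'})

print(columns_after_read)
print(mapping)
print(list(renamed.columns))
```

Printing `df.columns` right after loading the real files is a quick way to confirm which names pandas actually assigned.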
Then separate the label columns from the feature columns and store them in their own variables:
```python
train_labels_1 = train_data.pop('label_1')
train_labels_2 = train_data.pop('label_2')
test_labels_1 = test_data.pop('label_1')
test_labels_2 = test_data.pop('label_2')
```
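If the test set turns out to have only one of the two label columns, as the question suggests, calling `pop` on the missing column raises a `KeyError`. The following is a small sketch of a more tolerant split; the helper name `split_labels` and the toy frames are assumptions made for illustration, not part of the original answer.

```python
import pandas as pd

def split_labels(df, label_cols=('label_1', 'label_2')):
    """Return (features, labels); labels holds only the label columns present in df."""
    present = [c for c in label_cols if c in df.columns]
    labels = {c: df[c] for c in present}
    features = df.drop(columns=present)
    return features, labels

# Toy data: the training frame has both labels, the test frame has only one
train = pd.DataFrame({'f1': [1, 2, 3], 'label_1': [0, 1, 0], 'label_2': [1, 1, 0]})
test = pd.DataFrame({'f1': [4, 5], 'label_1': [1, 0]})

train_X, train_y = split_labels(train)
test_X, test_y = split_labels(test)
print(list(train_y), list(test_y))
```

Unlike `pop`, this leaves the original DataFrames unmodified and returns the labels in a dict keyed by column name.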
Next, use the DecisionTreeClassifier class from sklearn to train a decision tree model on the first label:
```python
clf = tree.DecisionTreeClassifier()
clf.fit(train_data, train_labels_1)
```
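As an alternative to fitting one label at a time, scikit-learn's decision trees also accept a two-column target, so a single tree can predict both labels at once. The sketch below uses made-up toy data in which `label_1` copies `f1` and `label_2` copies `f2`; whether one joint tree or two separate trees works better depends on the real data.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Toy data (hypothetical): each label simply copies one feature
X = pd.DataFrame({'f1': [0, 0, 1, 1], 'f2': [0, 1, 0, 1]})
Y = pd.DataFrame({'label_1': [0, 0, 1, 1], 'label_2': [0, 1, 0, 1]})

# Fitting on a 2-D target trains one multi-output tree
multi_clf = DecisionTreeClassifier(random_state=0)
multi_clf.fit(X, Y)

# predict returns one column per label, in the same order as Y's columns
new_rows = pd.DataFrame({'f1': [1, 0], 'f2': [0, 1]})
preds = multi_clf.predict(new_rows)
print(preds.shape)
```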
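Finally, evaluate the model's accuracy on the test set. The same classifier object is then refit on the second label and evaluated again: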
```python
predictions_1 = clf.predict(test_data)
accuracy_1 = accuracy_score(test_labels_1, predictions_1)
print("Accuracy for label_1: ", accuracy_1)
clf.fit(train_data, train_labels_2)
predictions_2 = clf.predict(test_data)
accuracy_2 = accuracy_score(test_labels_2, predictions_2)
print("Accuracy for label_2: ", accuracy_2)
```
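Refitting the same `clf` discards the first model. If both models are needed afterwards, one option is to train a fresh classifier per label in a loop. This is a sketch on made-up toy data, with the labels held in plain dicts; the variable names are illustrative only.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Toy data (hypothetical): label_1 copies f1, label_2 copies f2
train_X = pd.DataFrame({'f1': [0, 0, 1, 1], 'f2': [0, 1, 0, 1]})
train_y = {'label_1': pd.Series([0, 0, 1, 1]), 'label_2': pd.Series([0, 1, 0, 1])}
test_X = pd.DataFrame({'f1': [1, 0], 'f2': [0, 1]})
test_y = {'label_1': pd.Series([1, 0]), 'label_2': pd.Series([0, 1])}

models, scores = {}, {}
for name, y in train_y.items():
    # A new classifier per label, so earlier models are not overwritten
    model = DecisionTreeClassifier(random_state=0)
    model.fit(train_X, y)
    models[name] = model
    scores[name] = accuracy_score(test_y[name], model.predict(test_X))
    print(f"Accuracy for {name}: {scores[name]:.3f}")
```

Setting `random_state` makes the tree construction reproducible between runs.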
The complete code is as follows:
```python
import pandas as pd
from sklearn import tree
from sklearn.metrics import accuracy_score
train_data = pd.read_csv('train.csv')
test_data = pd.read_csv('test.csv')
# pd.read_csv has already renamed the second duplicate column to 'label.1'
label_names = {'label': 'label_1', 'label.1': 'label_2'}
train_data = train_data.rename(columns=label_names)
test_data = test_data.rename(columns=label_names)
train_labels_1 = train_data.pop('label_1')
train_labels_2 = train_data.pop('label_2')
test_labels_1 = test_data.pop('label_1')
test_labels_2 = test_data.pop('label_2')
clf = tree.DecisionTreeClassifier()
clf.fit(train_data, train_labels_1)
predictions_1 = clf.predict(test_data)
accuracy_1 = accuracy_score(test_labels_1, predictions_1)
print("Accuracy for label_1: ", accuracy_1)
clf.fit(train_data, train_labels_2)
predictions_2 = clf.predict(test_data)
accuracy_2 = accuracy_score(test_labels_2, predictions_2)
print("Accuracy for label_2: ", accuracy_2)
```