Give me complete code for three-class sentiment classification with a decision tree
Date: 2024-01-02 14:03:28
Below is complete code that uses a decision tree for three-class sentiment classification:
```
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import classification_report, roc_curve, auc
# Load the review data
data = pd.read_excel('筛选_1.xlsx')
# Separate the review text from the sentiment labels
X = data['content']
y = data['label1']
# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
# Extract TF-IDF features (fit on the training set only, then reuse on the test set)
vectorizer = TfidfVectorizer()
X_train_vectorized = vectorizer.fit_transform(X_train)
X_test_vectorized = vectorizer.transform(X_test)
# Train the decision tree model
dt = DecisionTreeClassifier()
dt.fit(X_train_vectorized, y_train)
# Predict on the test set
y_pred = dt.predict(X_test_vectorized)
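# Print the classification report, accuracy, recall, and F1 score
report = classification_report(y_test, y_pred)
print(report)
accuracy = dt.score(X_test_vectorized, y_test)
# The 'weighted avg' entry is a dict, so read its fields by key instead of unpacking it
weighted = classification_report(y_test, y_pred, output_dict=True)['weighted avg']
precision, recall, f1 = weighted['precision'], weighted['recall'], weighted['f1-score']
print("Accuracy: ", accuracy)
print("Recall: ", recall)
print("F1 score: ", f1)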
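# Plot a one-vs-rest ROC curve for the class labeled 2
# Look up that class's predict_proba column instead of assuming it is column 1
pos_index = list(dt.classes_).index(2)
fpr, tpr, thresholds = roc_curve(y_test, dt.predict_proba(X_test_vectorized)[:, pos_index], pos_label=2)
roc_auc = auc(fpr, tpr)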
plt.figure()
plt.plot(fpr, tpr, color='darkorange', label='ROC curve (area = %0.2f)' % roc_auc)
plt.plot([0, 1], [0, 1], color='navy', linestyle='--')
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('Receiver operating characteristic')
plt.legend(loc="lower right")
plt.show()
```
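This code uses `DecisionTreeClassifier` to train a decision tree model and predict on the test set, and it also plots an ROC curve. Note that `roc_curve` handles only binary problems, so the curve here is one-vs-rest for a single class: `pos_label=2` treats label `2` as the positive class and the other labels as negative. The scores passed to `roc_curve` must come from the `predict_proba` column for that same class. Columns follow the order of `dt.classes_` (the sorted label values), so when the sentiment labels are numbered `1`, `2`, `3`, label `2` sits at column index 1.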
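Because `roc_curve` is binary, the plot above covers only one of the three classes. A minimal sketch of computing a one-vs-rest AUC for every class is shown here. It is an illustration under assumptions, not part of the original answer: the tiny in-memory review list with labels 1/2/3 stands in for the Excel file, and `per_class_auc` is a hypothetical helper name.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import auc, roc_curve
from sklearn.preprocessing import label_binarize
from sklearn.tree import DecisionTreeClassifier


def per_class_auc(model, X, y_true):
    """Return {class_label: one-vs-rest ROC AUC} for a fitted classifier."""
    classes = model.classes_
    # One 0/1 indicator column per class, in the same order as predict_proba columns
    y_bin = label_binarize(y_true, classes=classes)
    proba = model.predict_proba(X)
    result = {}
    for i, label in enumerate(classes):
        fpr, tpr, _ = roc_curve(y_bin[:, i], proba[:, i])
        result[label] = auc(fpr, tpr)
    return result


# Toy stand-in data (assumption): two reviews per sentiment class
texts = [
    "terrible product broke quickly", "awful service very rude",
    "average quality nothing special", "okay price normal delivery",
    "excellent phone love it", "great value fast shipping",
]
labels = [1, 1, 2, 2, 3, 3]

vectorizer = TfidfVectorizer()
X_toy = vectorizer.fit_transform(texts)
model = DecisionTreeClassifier(random_state=0).fit(X_toy, labels)

# Scored on the training texts only to keep the sketch short;
# with real data, pass the held-out test split instead
aucs = per_class_auc(model, X_toy, labels)
for label, value in aucs.items():
    print(f"class {label}: AUC = {value:.2f}")
```

With the real data, the same helper could be called as `per_class_auc(dt, X_test_vectorized, y_test)`, provided every class appears at least once in the test split.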