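Text analysis and review modeling of Haidilao's overall score, taste, service, environment, and review text: code
Date: 2023-12-25 10:02:47
The following code runs text analysis on Haidilao reviews and builds classification models for the overall score, taste, service, and environment ratings, using the review text as input: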
```
import pandas as pd
import jieba
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
from sklearn.model_selection import train_test_split

# Rating columns to predict; the '评论' (review text) column is the model input,
# so it is not used as a prediction target.
targets = ['评价均分', '口味', '服务', '环境']

# Load the dataset and drop rows with a missing review or rating
df = pd.read_csv('data.csv', encoding='utf-8')
df = df.dropna(subset=['评论'] + targets)

# Tokenize each review with jieba and join the tokens with spaces
df['分词'] = df['评论'].astype(str).apply(lambda x: " ".join(jieba.cut(x)))

# Split once into test (30%), validation (35%) and training (35%) sets,
# keeping the proportions of the original two-step split
train_df, test_df = train_test_split(df, test_size=0.3, random_state=42)
train_df, val_df = train_test_split(train_df, test_size=0.5, random_state=42)

# Feature engineering: bag-of-words counts, fitted on the training set only.
# The default token_pattern drops single-character tokens such as "好",
# so accept tokens of length one as well.
vectorizer = CountVectorizer(token_pattern=r"(?u)\b\w+\b")
X_train = vectorizer.fit_transform(train_df['分词'])
X_val = vectorizer.transform(val_df['分词'])
X_test = vectorizer.transform(test_df['分词'])

# Build and evaluate one naive Bayes model per rating column
for target in targets:
    # MultinomialNB is a classifier, so each distinct score is treated as a class label
    y_train = train_df[target].astype(str)
    y_val = val_df[target].astype(str)
    y_test = test_df[target].astype(str)

    clf = MultinomialNB()
    clf.fit(X_train, y_train)

    print(f'{target} validation accuracy:', accuracy_score(y_val, clf.predict(X_val)))
    y_pred = clf.predict(X_test)
    print(f'{target} test accuracy:', accuracy_score(y_test, y_pred))
    print('Confusion matrix:\n', confusion_matrix(y_test, y_pred))
    print('Classification report:\n', classification_report(y_test, y_pred, zero_division=0))
```
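One detail worth checking when passing jieba output to `CountVectorizer`: its documented default `token_pattern` is `(?u)\b\w\w+\b`, which only keeps tokens of two or more characters. The sketch below uses only the standard `re` module to show the effect on a space-joined, already segmented review. The sample sentence is made up for illustration and is not taken from the dataset.

```python
import re

# CountVectorizer's documented default pattern: tokens of 2+ word characters
DEFAULT_PATTERN = r"(?u)\b\w\w+\b"
# Relaxed pattern that also keeps single-character tokens
RELAXED_PATTERN = r"(?u)\b\w+\b"

# Hypothetical review, already segmented and joined with spaces
segmented = "服务 很 好 锅底 不 辣"

default_tokens = re.findall(DEFAULT_PATTERN, segmented)
relaxed_tokens = re.findall(RELAXED_PATTERN, segmented)

print(default_tokens)  # single-character words such as "好" and "不" are gone
print(relaxed_tokens)  # every segmented word survives
```

With the default pattern, sentiment-bearing single characters like "好" and negations like "不" never reach the model, so passing `token_pattern=r"(?u)\b\w+\b"` to `CountVectorizer` is a reasonable choice for Chinese text.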
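Notes:
1. First, pandas reads the dataset and jieba tokenizes each review; the space-joined tokens are stored in a new column.
2. Next, CountVectorizer converts the tokenized text into bag-of-words count vectors, which serve as the model features.
3. Then, train_test_split divides the data into training, validation, and test sets, with 30% of the rows held out for testing.
4. Finally, MultinomialNB builds a naive Bayes classifier for each target, and accuracy_score, confusion_matrix, and classification_report give the accuracy, confusion matrix, and per-class precision and recall. Separate models are built for the overall score, taste, service, and environment ratings. The review text is the input to every model, so it is not a meaningful classification target itself: every review would form its own class.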
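To make the modeling step above more concrete, here is a minimal from-scratch sketch of what a multinomial naive Bayes classifier computes from token counts: a log prior per class plus Laplace-smoothed log likelihoods per token. It is a simplified illustration, not scikit-learn's implementation. The function names `train_nb` and `predict_nb` and the four toy reviews are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs, labels, alpha=1.0):
    """Fit a multinomial naive Bayes model.

    docs is a list of token lists, labels the matching class labels,
    alpha the Laplace smoothing strength.
    """
    class_counts = Counter(labels)
    token_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in zip(docs, labels):
        token_counts[label].update(tokens)
        vocab.update(tokens)

    model = {}
    for label, n_docs_in_class in class_counts.items():
        log_prior = math.log(n_docs_in_class / len(docs))
        # Smoothed denominator: all tokens seen in this class plus alpha per vocab entry
        denom = sum(token_counts[label].values()) + alpha * len(vocab)
        log_likelihood = {
            t: math.log((token_counts[label][t] + alpha) / denom) for t in vocab
        }
        model[label] = (log_prior, log_likelihood)
    return model


def predict_nb(model, tokens):
    """Return the class with the highest log posterior score."""
    best_label, best_score = None, -math.inf
    for label, (log_prior, log_likelihood) in model.items():
        # Tokens outside the training vocabulary are ignored,
        # much as a fitted vectorizer would drop them
        score = log_prior + sum(
            log_likelihood[t] for t in tokens if t in log_likelihood
        )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy, made-up training data: token lists with a score label as a string
train_docs = [["服务", "好"], ["味道", "好"], ["服务", "差"], ["味道", "差"]]
train_labels = ["5", "5", "1", "1"]

model = train_nb(train_docs, train_labels)
print(predict_nb(model, ["环境", "好"]))
print(predict_nb(model, ["差"]))
```

Because both classes have the same prior here, the prediction is decided entirely by which class saw "好" or "差" more often during training. On real data with many distinct scores, rare classes get few token counts, which is one reason to inspect the confusion matrix and not only the accuracy.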