Sentiment Analysis Code for E-commerce Product Reviews in Jupyter
Date: 2023-07-09 07:47:08
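Below is Python code for a Jupyter Notebook that performs sentiment analysis on e-commerce product reviews, using the natural language processing library NLTK and the machine learning library Scikit-learn:
First, import the required libraries: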
```python
import pandas as pd
import numpy as np
import re
import nltk
from nltk.corpus import stopwords
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_extraction.text import TfidfTransformer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import classification_report
```
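Next, read the dataset and do some simple cleaning. Ratings above 3 are relabeled as positive (1) and all other ratings as negative (0):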
```python
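# Load the data (the code below expects 'review' and 'rating' columns)
df = pd.read_csv('reviews.csv')
# Data cleaning: drop rows with missing values and rebuild the index
df.dropna(inplace=True)
df.reset_index(drop=True, inplace=True)
# Binarize the label: ratings above 3 become 1 (positive), all others 0 (negative)
df['rating'] = np.where(df['rating'] > 3, 1, 0)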
```
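To make the cleaning step concrete, here is a minimal sketch on a made-up five-row table. The toy rows and the separate `label` column are assumptions for illustration; the original code overwrites `rating` in place.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for reviews.csv.
toy = pd.DataFrame({
    "review": ["great phone", "terrible battery", None, "okay for the price", "love it"],
    "rating": [5, 1, 4, 3, 4],
})

# Drop the row with the missing review, then rebuild a clean 0..n-1 index.
toy = toy.dropna().reset_index(drop=True)

# Ratings above 3 become 1 (positive); 3 and below become 0 (negative).
toy["label"] = np.where(toy["rating"] > 3, 1, 0)

label_counts = toy["label"].value_counts().to_dict()
print(toy)
print(label_counts)
```

Checking the label counts like this is worthwhile before training, since a heavily skewed class balance makes overall accuracy hard to interpret.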
Then, tokenize and stem the reviews, and remove stop words:
```python
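# One-time setup: the tokenizer and stop-word list need NLTK data, e.g.
# nltk.download('punkt')  ('punkt_tab' on newer NLTK releases) and nltk.download('stopwords')
stemmer = nltk.PorterStemmer()

# Tokenize, then reduce each token to its stem
def tokenize(text):
    tokens = nltk.word_tokenize(text)
    return [stemmer.stem(item) for item in tokens]

# Remove stop words ('br' is added, presumably to catch leftover <br /> HTML tags)
stop_words = set(stopwords.words('english'))
stop_words.add('br')
# Recent scikit-learn versions require stop_words to be a list, not a set;
# token_pattern=None because a custom tokenizer is supplied
cv = CountVectorizer(tokenizer=tokenize, stop_words=list(stop_words), token_pattern=None)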
```
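The effect of tokenizing, stemming, and stop-word removal is easiest to see on a couple of made-up sentences. The sketch below makes two substitutions so that it runs without downloading NLTK data: a regex split stands in for `nltk.word_tokenize`, and scikit-learn's built-in `ENGLISH_STOP_WORDS` stands in for NLTK's stop-word corpus. The names `simple_tokenize`, `demo_cv`, and `demo_docs` are hypothetical.

```python
import re

from nltk.stem import PorterStemmer
from sklearn.feature_extraction.text import CountVectorizer, ENGLISH_STOP_WORDS

stemmer = PorterStemmer()

def simple_tokenize(text):
    # Regex stand-in for nltk.word_tokenize: keep runs of letters only, then stem.
    return [stemmer.stem(tok) for tok in re.findall(r"[a-z]+", text.lower())]

# scikit-learn's built-in list (an assumption, not NLTK's list) plus 'br'.
stop_words = sorted(ENGLISH_STOP_WORDS | {"br"})

demo_cv = CountVectorizer(tokenizer=simple_tokenize, stop_words=stop_words,
                          token_pattern=None)
demo_docs = [
    "The battery is great<br />and the screen is great",
    "Batteries were terrible",
]
demo_counts = demo_cv.fit_transform(demo_docs)

vocab = sorted(demo_cv.vocabulary_)
print(vocab)                  # stemmed terms that survived stop-word removal
print(demo_counts.toarray())  # one row per document, one column per term
```

Because stop words are filtered after stemming, scikit-learn may warn that the stop-word list is inconsistent with the tokenizer: some stop words stem to forms that are not in the list. Stemming the stop-word list with the same stemmer is one way to address that.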
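Next, split the dataset into training and test sets and compute TF-IDF features, fitting the vocabulary and weights on the training set only: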
```python
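# TF-IDF transformer (fitted on the training counts below)
tfidf_transformer = TfidfTransformer()
# Split the dataset (by default 75% train, 25% test)
X_train, X_test, y_train, y_test = train_test_split(df['review'], df['rating'], random_state=0)
# Fit the vocabulary and IDF weights on the training set only
X_train_counts = cv.fit_transform(X_train)
X_train_tfidf = tfidf_transformer.fit_transform(X_train_counts)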
```
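As a worked illustration of what `TfidfTransformer` computes with its default settings, the sketch below reproduces the weights by hand on a made-up 3x3 count matrix: a smoothed IDF of ln((1 + n) / (1 + df)) + 1, followed by L2 normalization of each row. The count values are arbitrary.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfTransformer

# Toy term-count matrix: 3 documents x 3 terms (made-up numbers).
counts = np.array([[3, 0, 1],
                   [2, 0, 0],
                   [0, 1, 1]])

n_docs = counts.shape[0]
doc_freq = (counts > 0).sum(axis=0)              # documents containing each term
idf = np.log((1 + n_docs) / (1 + doc_freq)) + 1  # smoothed IDF (smooth_idf=True)

manual = counts * idf                            # raw tf * idf
manual = manual / np.linalg.norm(manual, axis=1, keepdims=True)  # L2-normalize rows

sk_tfidf = TfidfTransformer().fit_transform(counts).toarray()
matches = np.allclose(manual, sk_tfidf)
print(matches)
```

Terms that appear in fewer documents get a larger IDF, so they contribute more to a review's feature vector than terms that appear almost everywhere.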
Then, train a naive Bayes classifier and predict on the test set:
```python
# Train the model
clf = MultinomialNB().fit(X_train_tfidf, y_train)
# Predict on the test set, reusing the vocabulary and IDF weights fitted on the training set
X_test_counts = cv.transform(X_test)
X_test_tfidf = tfidf_transformer.transform(X_test_counts)
y_pred = clf.predict(X_test_tfidf)
```
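The same count, TF-IDF, and naive Bayes steps can also be chained in a scikit-learn `Pipeline`, which applies the fitted transforms automatically at prediction time. This is an alternative arrangement, not the code above; the sketch uses the default tokenizer and four made-up training reviews.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Made-up training reviews and labels (1 = positive, 0 = negative).
train_texts = [
    "great product love it",
    "excellent quality great value",
    "terrible waste of money",
    "awful quality broke quickly",
]
train_labels = [1, 1, 0, 0]

# Each step is fitted in order on the training data only.
model = Pipeline([
    ("counts", CountVectorizer()),
    ("tfidf", TfidfTransformer()),
    ("nb", MultinomialNB()),
])
model.fit(train_texts, train_labels)

# Raw strings go in; the pipeline vectorizes them with the fitted vocabulary.
new_reviews = ["great value love it", "terrible broke quickly"]
predictions = model.predict(new_reviews)
probabilities = model.predict_proba(new_reviews)
print(predictions)
print(probabilities.round(3))
```

Keeping everything in one object avoids accidentally calling `fit_transform` on the test data, and the whole pipeline can be passed to cross-validation utilities as a single estimator.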
Finally, evaluate the model's performance and print the classification report:
```python
# Print the classification report (precision, recall, and F1 per class)
print(classification_report(y_test, y_pred))
```
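The precision and recall figures in the report come straight from the confusion matrix. A minimal sketch with made-up labels (the `y_true_demo` and `y_pred_demo` values are arbitrary, not results from this model) shows the arithmetic for the positive class:

```python
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# Made-up true labels and predictions for eight reviews.
y_true_demo = [1, 1, 1, 0, 0, 1, 0, 1]
y_pred_demo = [1, 0, 1, 0, 1, 1, 0, 1]

# Rows are true classes, columns are predicted classes, in label order [0, 1].
tn, fp, fn, tp = confusion_matrix(y_true_demo, y_pred_demo).ravel()

precision = tp / (tp + fp)  # of the reviews predicted positive, how many were positive
recall = tp / (tp + fn)     # of the truly positive reviews, how many were found

print(tn, fp, fn, tp)
print(precision, precision_score(y_true_demo, y_pred_demo))
print(recall, recall_score(y_true_demo, y_pred_demo))
```

The F1 score in the report is the harmonic mean of these two values.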
That completes a simple Jupyter Notebook sentiment analysis for e-commerce product reviews.