Concrete code, especially for analyzing comments
Posted: 2024-03-10 17:46:32
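Below is a simple Chinese sentiment-analysis script that combines Jieba word segmentation with a small deep-learning model. It can be used to classify comments as positive or negative: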
```python
import jieba
import numpy as np
# Note: these import paths follow the Keras 2.x layout. Newer Keras releases
# no longer ship keras.preprocessing.text, so adjust them to your install.
from keras.models import Sequential
from keras.layers import Dense, Embedding, LSTM, Conv1D, MaxPooling1D
from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences
from sklearn.model_selection import train_test_split

# Load the data: one comment per line, with the trailing newline stripped
with open('comments.txt', 'r', encoding='utf-8') as f:
    comments = [line.strip() for line in f]

# Preprocess each comment: segment it with jieba and drop stopwords
stopwords = {'的', '了', '是', '我', '你', '他', '她', '它', '我们', '你们', '他们', '她们', '它们'}


def preprocess(comment):
    words = jieba.lcut(comment)
    # Skip whitespace-only tokens as well as stopwords
    words = [w for w in words if w.strip() and w not in stopwords]
    # Join with spaces so the Keras Tokenizer can split on them
    return ' '.join(words)


comments = [preprocess(c) for c in comments]

# Convert the comments to padded integer sequences and load the labels
tokenizer = Tokenizer(num_words=10000)
tokenizer.fit_on_texts(comments)
sequences = tokenizer.texts_to_sequences(comments)
X = pad_sequences(sequences, maxlen=100)
y = np.loadtxt('labels.txt')

# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# Build the CNN-LSTM model
model = Sequential()
model.add(Embedding(10000, 100, input_length=100))
model.add(Conv1D(filters=64, kernel_size=3, padding='same', activation='relu'))
model.add(MaxPooling1D(pool_size=2))
model.add(LSTM(100))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

# Train the model (the held-out test split doubles as validation data here)
model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=10, batch_size=128)

# Run sentiment analysis on a new comment
new_comment = '这个产品真的很好用,非常满意!'
new_comment = preprocess(new_comment)
new_comment_sequence = tokenizer.texts_to_sequences([new_comment])
new_comment_pad = pad_sequences(new_comment_sequence, maxlen=100)
prediction = model.predict(new_comment_pad)

# predict() returns an array of shape (1, 1); read the scalar probability
if prediction[0][0] > 0.5:
    print('积极')  # positive
else:
    print('消极')  # negative
```
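The tokenizer and padding step can be hard to picture, so here is a minimal pure-Python sketch of roughly what `Tokenizer` and `pad_sequences` do with their default settings. The helper names `build_word_index`, `to_sequence`, and `pad_left` are hypothetical and not part of Keras. The sketch also leaves out details such as punctuation filtering and out-of-vocabulary tokens, so treat it as an illustration rather than a drop-in replacement.

```python
from collections import Counter


def build_word_index(texts):
    """Map each word to an integer rank; 1 is the most frequent, 0 is left for padding."""
    counts = Counter(word for text in texts for word in text.split())
    # most_common() sorts by descending count; ties keep first-seen order
    return {word: rank for rank, (word, _) in enumerate(counts.most_common(), start=1)}


def to_sequence(text, word_index, num_words=None):
    """Replace known words with their index; unknown or too-rare words are dropped."""
    seq = []
    for word in text.split():
        idx = word_index.get(word)
        if idx is not None and (num_words is None or idx < num_words):
            seq.append(idx)
    return seq


def pad_left(seq, maxlen):
    """Keep the last `maxlen` items and left-pad with zeros (maxlen must be positive)."""
    seq = seq[-maxlen:]
    return [0] * (maxlen - len(seq)) + seq


# Two already-segmented comments, joined by spaces as preprocess() would return them
texts = ["产品 很 好用", "产品 很 差"]
word_index = build_word_index(texts)
padded = [pad_left(to_sequence(t, word_index), maxlen=5) for t in texts]
print(word_index)
print(padded)
```

Because padding and truncation both happen on the left, the end of a long comment is what survives when it exceeds `maxlen`.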
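Note that you need to prepare the dataset and labels yourself: `comments.txt` holds the comments, one per line, and `labels.txt` holds the sentiment label for each comment in the same order (0 for negative, 1 for positive). The model architecture and hyperparameters can also be adjusted and tuned to fit your data.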
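Since the script assumes the two files line up one-to-one, a quick check before training may save some debugging. The following is a minimal sketch with a hypothetical helper, `load_dataset`, which is not part of the original script. It assumes one comment per line and whitespace-separated 0/1 labels, and it writes two tiny sample files to a temporary directory only to show the expected format.

```python
import tempfile
from pathlib import Path


def load_dataset(comments_path, labels_path):
    """Read comments and 0/1 labels, and check that they line up one-to-one."""
    comments = Path(comments_path).read_text(encoding="utf-8").splitlines()
    values = [float(tok) for tok in Path(labels_path).read_text(encoding="utf-8").split()]
    if len(comments) != len(values):
        raise ValueError(f"{len(comments)} comments but {len(values)} labels")
    bad = sorted(set(values) - {0.0, 1.0})
    if bad:
        raise ValueError(f"unexpected label values: {bad}")
    return comments, [int(v) for v in values]


# Sample files in the assumed format (illustrative data, not a real dataset)
with tempfile.TemporaryDirectory() as tmp:
    comments_file = Path(tmp) / "comments.txt"
    labels_file = Path(tmp) / "labels.txt"
    comments_file.write_text("这个产品真的很好用\n质量太差了\n", encoding="utf-8")
    labels_file.write_text("1\n0\n", encoding="utf-8")
    sample_comments, sample_labels = load_dataset(comments_file, labels_file)

print(sample_comments, sample_labels)
```

A mismatch in counts or a label other than 0 or 1 raises a `ValueError`, which is usually easier to diagnose than a shape error inside `model.fit`.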