Complete code for Chinese sentiment classification with TensorFlow
Date: 2023-12-11 18:02:14
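The following example shows Chinese sentiment classification implemented with TensorFlow, for reference. The code assumes a UTF-8 `data.csv` with a `text` column and a binary `label` column (1 = positive, 0 = negative):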
```python
import tensorflow as tf
import numpy as np
import pandas as pd
import jieba
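# Read the dataset (assumes a 'text' column and a 0/1 'label' column)
df = pd.read_csv('data.csv', encoding='utf-8')
df = df.dropna(subset=['text', 'label'])  # jieba cannot segment missing values
# Word segmentation: jieba splits Chinese text into words, joined with spaces
def seg(text):
    return ' '.join(jieba.cut(str(text)))
df['text'] = df['text'].apply(seg)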
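# Build the vocabulary (sorted so that word ids are reproducible across runs)
vocab = set()
for text in df['text']:
    vocab |= set(text.split())
vocab = sorted(vocab)
# Map words to integer ids. pad_sequences pads with 0, so 0 is reserved for
# padding and 1 for out-of-vocabulary words; real words start at 2.
PAD_IDX, OOV_IDX = 0, 1
word2idx = {w: i + 2 for i, w in enumerate(vocab)}
idx2word = {i: w for w, i in word2idx.items()}
vocab_size = len(vocab) + 2  # Embedding input_dim must cover the reserved ids
def encode(text):
    # Unknown words (e.g. at prediction time) map to OOV_IDX instead of raising KeyError
    return [word2idx.get(word, OOV_IDX) for word in text.split()]
df['text'] = df['text'].apply(encode)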
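# Split into training and test sets (80/20). Shuffle first so the split is not
# biased by the row order of the file (e.g. all positive samples first).
df = df.sample(frac=1, random_state=42).reset_index(drop=True)
train_size = int(len(df) * 0.8)
train_df = df[:train_size]
test_df = df[train_size:]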
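# Build the model: word embeddings -> 1D convolution -> global max pooling -> sigmoid output
model = tf.keras.Sequential([
    tf.keras.Input(shape=(50,), dtype='int32'),  # padded sequences of 50 token ids
    tf.keras.layers.Embedding(vocab_size, 32),
    tf.keras.layers.Conv1D(32, 3, activation='relu'),
    tf.keras.layers.GlobalMaxPooling1D(),
    tf.keras.layers.Dense(1, activation='sigmoid')  # probability of the positive class
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.summary()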
# Train the model: pad (with 0) or truncate every sequence to 50 token ids
x_train = tf.keras.utils.pad_sequences(train_df['text'].tolist(), maxlen=50)
y_train = train_df['label'].to_numpy(dtype='float32')
model.fit(x_train, y_train, epochs=10, batch_size=32)
# Evaluate the model on the held-out test set
x_test = tf.keras.utils.pad_sequences(test_df['text'].tolist(), maxlen=50)
y_test = test_df['label'].to_numpy(dtype='float32')
loss, accuracy = model.evaluate(x_test, y_test)
print('Test accuracy:', accuracy)
# Apply the model to new text
def predict_sentiment(text):
    x = encode(seg(text))
    x = tf.keras.utils.pad_sequences([x], maxlen=50)
    y = model.predict(x, verbose=0)[0][0]
    return 'positive' if y > 0.5 else 'negative'
# "This movie is great, well worth watching!"
print(predict_sentiment('这个电影太棒了,值得一看!'))
# "This movie is terrible, not enjoyable at all."
print(predict_sentiment('这个电影太糟糕了,一点也不好看。'))
```
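Two details of the encoding step are easy to get wrong: `pad_sequences` pads with 0 by default, so 0 should not also be the id of a real word, and a word never seen when the vocabulary was built has no id at all. The pure-Python sketch below illustrates one common convention (0 for padding, 1 for unknown words, real words from 2) and mimics the `pad_sequences` defaults (`padding='pre'`, `truncating='pre'`). The helper names `build_vocab`, `encode_ids` and `pad_pre` are hypothetical, chosen only for illustration; they are not TensorFlow APIs.

```python
PAD_IDX, OOV_IDX = 0, 1  # reserved ids (an assumed convention, not a library default)

def build_vocab(tokenized_texts):
    """Map each distinct word to an id starting at 2; ids 0 and 1 stay reserved."""
    words = sorted({w for text in tokenized_texts for w in text.split()})
    return {w: i + 2 for i, w in enumerate(words)}

def encode_ids(text, word2idx):
    """Turn space-separated tokens into ids; unseen words become OOV_IDX."""
    return [word2idx.get(w, OOV_IDX) for w in text.split()]

def pad_pre(ids, maxlen):
    """Like pad_sequences' defaults: drop tokens from the front, left-pad with 0."""
    if len(ids) > maxlen:
        ids = ids[len(ids) - maxlen:]
    return [PAD_IDX] * (maxlen - len(ids)) + list(ids)

# Toy, already-segmented sentences (space-joined, like the output of seg())
word2idx = build_vocab(['电影 很 好', '电影 很 差'])
ids = encode_ids('电影 非常 好', word2idx)  # '非常' was never seen, so it maps to OOV_IDX
print(ids)
print(pad_pre(ids, 5))
```

Because no real word can receive id 0, the model can never confuse padding with a genuine token, and prediction on unseen text degrades gracefully instead of raising `KeyError`.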
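Note that the code above is only a demonstration; in a real application it needs to be adjusted and optimized for the specific task. For example, you can try a different model architecture, tune the hyperparameters (embedding size, number of filters, sequence length, epochs), or use pretrained word vectors to improve the model's performance.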
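For the pretrained word vectors mentioned above, a usual approach is to build an embedding matrix whose row *i* holds the vector of the word with id *i*, and use it to initialise the `Embedding` layer. The NumPy sketch below assumes the vectors have already been loaded into a dict mapping each word to a NumPy array (for example, parsed from a text-format Chinese word-vector file). `build_embedding_matrix` is a hypothetical helper, and giving words without a pretrained vector a small random vector is one common choice, not the only one.

```python
import numpy as np

def build_embedding_matrix(word2idx, vectors, vocab_size, dim, seed=0):
    """Return (matrix, hits): row i is the pretrained vector of the word with id i.

    Words missing from `vectors` get small random vectors; rows not used by any
    word (e.g. a reserved padding id) stay all-zero.
    """
    rng = np.random.default_rng(seed)
    matrix = np.zeros((vocab_size, dim), dtype=np.float32)
    hits = 0  # how many vocabulary words had a pretrained vector
    for word, idx in word2idx.items():
        vec = vectors.get(word)
        if vec is not None:
            matrix[idx] = vec
            hits += 1
        else:
            matrix[idx] = rng.normal(scale=0.1, size=dim)
    return matrix, hits

# Hypothetical toy vectors and vocabulary (ids 0 and 1 unused here)
vectors = {'电影': np.ones(4, dtype=np.float32), '好': np.full(4, 0.5, dtype=np.float32)}
word2idx = {'电影': 2, '好': 3, '差': 4}
matrix, hits = build_embedding_matrix(word2idx, vectors, vocab_size=5, dim=4)
print(matrix.shape, hits)  # (5, 4) 2
```

The matrix can then initialise the layer, e.g. `tf.keras.layers.Embedding(vocab_size, dim, embeddings_initializer=tf.keras.initializers.Constant(matrix), trainable=False)`, where `dim` must equal the dimension of the pretrained vectors. Freezing the layer (`trainable=False`) is often sensible for small datasets; `trainable=True` fine-tunes the vectors instead.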