Write a PyTorch-based sentiment classification model
Posted: 2023-06-30 20:12:56
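Sure. Below is an example of a sentiment classification model built with PyTorch:
1. Data preprocessing
Before training, the raw text has to be converted into a numeric form the model can consume: tokenize it, optionally remove stop words or apply stemming, and map each token to an integer id. Here we use the torchtext library to tokenize the text and build the vocabulary. Note that the example loads AG_NEWS, which is a news-topic dataset; for actual sentiment classification, substitute a dataset that carries sentiment labels.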
```python
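from torchtext.datasets import AG_NEWS
from torchtext.data.utils import get_tokenizer
from torchtext.vocab import build_vocab_from_iterator

# Define the tokenizer
tokenizer = get_tokenizer('basic_english')

# Text processing: raw string -> list of tokens
def text_transform(text):
    return tokenizer(text)

# Label processing: AG_NEWS labels are 1..4, shift them to 0..3
def label_transform(label):
    return int(label) - 1

# Load the dataset; each item is a (label, text) pair.
# Assumes torchtext 0.10 or later, whose datasets yield raw (label, text) pairs.
# AG_NEWS is a 4-class news-topic dataset used here as a stand-in;
# swap in a sentiment-labeled dataset for real sentiment classification.
train_iter, test_iter = AG_NEWS(root='./data', split=('train', 'test'))
train_dataset, test_dataset = list(train_iter), list(test_iter)

# Build the vocabulary from the training texts
vocab = build_vocab_from_iterator(
    (text_transform(text) for _, text in train_dataset), specials=["<unk>"])
vocab.set_default_index(vocab["<unk>"])

# Processing pipelines: text -> list of token ids, label -> class index
text_pipeline = lambda x: vocab(text_transform(x))
label_pipeline = lambda x: label_transform(x)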
```
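To make the preprocessing steps concrete, here is a minimal plain-Python sketch of what such a pipeline does conceptually: tokenize, drop stop words, and map tokens to integer ids. The `STOP_WORDS` set, the regular expression, and the function names are illustrative assumptions, not torchtext APIs, and torchtext's `basic_english` tokenizer differs in its details.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption, not a standard list)
STOP_WORDS = {"the", "a", "an", "is", "this"}

def tokenize(text):
    """Lowercase the text and split it into word tokens, dropping stop words."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]

def build_vocab(texts, specials=("<unk>",)):
    """Map each token to an integer id: special tokens first, then by frequency."""
    counts = Counter(tok for text in texts for tok in tokenize(text))
    itos = list(specials) + [tok for tok, _ in counts.most_common()]
    return {tok: idx for idx, tok in enumerate(itos)}

def encode(text, vocab):
    """Convert text to a list of ids; unknown tokens fall back to <unk>."""
    unk = vocab["<unk>"]
    return [vocab.get(tok, unk) for tok in tokenize(text)]

corpus = ["This movie is great", "The plot is boring", "Great acting, great plot"]
toy_vocab = build_vocab(corpus)
ids = encode("A great unseen movie", toy_vocab)
print(ids)  # [1, 0, 3]: "great" -> 1, unseen word -> <unk> (0), "movie" -> 3
```

The fallback to `<unk>` plays the same role as `vocab.set_default_index` in the torchtext version: words that never appeared in the training data still get a valid id.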
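2. Defining the model
We use a simple convolutional neural network (CNN) for the classification. CNNs are a solid baseline for text classification: filters of several widths slide over the token embeddings and act as n-gram detectors.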
```python
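import torch
import torch.nn as nn
import torch.nn.functional as F

# Define the CNN model
class TextCNN(nn.Module):
    def __init__(self, vocab_size, embed_dim, num_classes, num_filters, filter_sizes):
        super().__init__()
        # Per-token embeddings: the convolutions need the full (seq_len, embed_dim) grid
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        # One convolution per filter size, each spanning the full embedding width
        self.convs = nn.ModuleList([
            nn.Conv2d(1, num_filters, (fs, embed_dim)) for fs in filter_sizes
        ])
        self.fc = nn.Linear(num_filters * len(filter_sizes), num_classes)

    def forward(self, text):
        # text: (batch, seq_len) token ids, with seq_len >= max(filter_sizes)
        embedded = self.embedding(text)    # (batch, seq_len, embed_dim)
        embedded = embedded.unsqueeze(1)   # (batch, 1, seq_len, embed_dim)
        # Each conv output: (batch, num_filters, seq_len - fs + 1)
        conved = [F.relu(conv(embedded)).squeeze(3) for conv in self.convs]
        # Max-over-time pooling: (batch, num_filters) per filter size
        pooled = [F.max_pool1d(c, c.shape[2]).squeeze(2) for c in conved]
        cat = torch.cat(pooled, dim=1)     # (batch, num_filters * len(filter_sizes))
        return self.fc(cat)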
```
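The tensor shapes in `forward` are easy to get wrong, so the following plain-Python sketch traces them for a given input size. It assumes the usual TextCNN layout (per-token embeddings, one convolution per filter size with stride 1 and no padding, then max-over-time pooling); the function name `textcnn_shapes` is hypothetical.

```python
def textcnn_shapes(batch, seq_len, embed_dim, num_filters, filter_sizes, num_classes):
    """Return the tensor shape after each stage of a TextCNN forward pass."""
    if seq_len < max(filter_sizes):
        raise ValueError("seq_len must be at least the largest filter size")
    shapes = {
        "input_ids": (batch, seq_len),
        "embedded": (batch, 1, seq_len, embed_dim),
    }
    for fs in filter_sizes:
        # A filter of height fs fits in seq_len - fs + 1 positions
        shapes[f"conv_{fs}"] = (batch, num_filters, seq_len - fs + 1)
        # Max-over-time pooling keeps one value per filter
        shapes[f"pooled_{fs}"] = (batch, num_filters)
    shapes["concatenated"] = (batch, num_filters * len(filter_sizes))
    shapes["logits"] = (batch, num_classes)
    return shapes

shapes = textcnn_shapes(batch=64, seq_len=20, embed_dim=32,
                        num_filters=100, filter_sizes=[3, 4, 5], num_classes=4)
for name, shape in shapes.items():
    print(f"{name:>13}: {shape}")
```

The `ValueError` branch mirrors a real constraint: an input shorter than the largest filter has no valid convolution window, so short texts need to be padded before they reach the model.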
3. Training the model
With the model defined, we train it using the cross-entropy loss and a stochastic gradient descent (SGD) optimizer.
```python
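import torch
import torch.nn.functional as F
import torch.optim as optim
from torch.utils.data import DataLoader

# Hyperparameters
BATCH_SIZE = 64
EMBED_DIM = 32
NUM_FILTERS = 100
FILTER_SIZES = [3, 4, 5]
NUM_CLASSES = len(set(label for label, _ in train_dataset))

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Turn a list of (label, text) pairs into padded tensors.
# The <unk> id doubles as the padding id; every batch is padded to at least
# max(FILTER_SIZES) tokens so the widest filter always fits.
def collate_batch(batch):
    pad_id = vocab["<unk>"]
    labels = torch.tensor([label_pipeline(label) for label, _ in batch])
    ids = [text_pipeline(text) for _, text in batch]
    max_len = max(max(len(x) for x in ids), max(FILTER_SIZES))
    texts = torch.tensor([x + [pad_id] * (max_len - len(x)) for x in ids])
    return texts, labels

train_loader = DataLoader(train_dataset, batch_size=BATCH_SIZE, shuffle=True,
                          collate_fn=collate_batch)
test_loader = DataLoader(test_dataset, batch_size=BATCH_SIZE, collate_fn=collate_batch)

# Model and optimizer (the learning rate is a starting point; tune as needed)
model = TextCNN(len(vocab), EMBED_DIM, NUM_CLASSES, NUM_FILTERS, FILTER_SIZES).to(device)
optimizer = optim.SGD(model.parameters(), lr=0.1)

# Training function: one pass over the training data
def train(train_data):
    model.train()
    train_loss = 0
    train_acc = 0
    for text, label in train_data:
        optimizer.zero_grad()
        text = text.to(device)
        label = label.to(device)
        output = model(text)
        loss = F.cross_entropy(output, label)
        loss.backward()
        optimizer.step()
        train_loss += loss.item()
        train_acc += (output.argmax(1) == label).sum().item()
    # Mean loss per batch, accuracy per sample
    return train_loss / len(train_data), train_acc / len(train_data.dataset)

# Evaluation function: no gradient updates
def test(test_data):
    model.eval()
    test_loss = 0
    test_acc = 0
    with torch.no_grad():
        for text, label in test_data:
            text = text.to(device)
            label = label.to(device)
            output = model(text)
            loss = F.cross_entropy(output, label)
            test_loss += loss.item()
            test_acc += (output.argmax(1) == label).sum().item()
    return test_loss / len(test_data), test_acc / len(test_data.dataset)

# Start training
for epoch in range(10):
    train_loss, train_acc = train(train_loader)
    test_loss, test_acc = test(test_loader)
    print(f'Epoch {epoch+1}, Train Loss: {train_loss:.3f}, Train Acc: {train_acc:.3f}, Test Loss: {test_loss:.3f}, Test Acc: {test_acc:.3f}')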
```
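As a worked illustration of the two ingredients used in training, the plain-Python sketch below computes the cross-entropy loss for a single example and applies one SGD update. It is a conceptual sketch, not a description of how PyTorch implements `F.cross_entropy` or `optim.SGD` internally, and the function names are hypothetical.

```python
import math

def cross_entropy(logits, target):
    """Negative log-probability of the target class under softmax(logits)."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]

def cross_entropy_grad(logits, target):
    """Gradient of the loss w.r.t. the logits: softmax(logits) - one_hot(target)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total - (1.0 if i == target else 0.0) for i, e in enumerate(exps)]

def sgd_step(params, grads, lr):
    """Plain SGD: move each parameter against its gradient."""
    return [p - lr * g for p, g in zip(params, grads)]

logits = [2.0, 0.5, -1.0, 0.0]
target = 1
loss_before = cross_entropy(logits, target)
# Treat the logits themselves as the parameters for this toy example
logits_after = sgd_step(logits, cross_entropy_grad(logits, target), lr=0.1)
loss_after = cross_entropy(logits_after, target)
print(f"loss before: {loss_before:.4f}, after one step: {loss_after:.4f}")
```

In the real training loop the gradient flows further back into the convolution and embedding weights, but the update rule applied to each parameter has the same form.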
4. Using the model for prediction
Once training is finished, the model can be used to classify new text.
```python
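def predict(text):
    model.eval()
    with torch.no_grad():
        ids = text_pipeline(text)
        # Pad short inputs so the widest convolution filter still fits
        ids += [vocab["<unk>"]] * (max(FILTER_SIZES) - len(ids))
        text = torch.tensor(ids).unsqueeze(0).to(device)
        output = model(text)
        # Shift back to the dataset's 1-based labels
        pred = output.argmax(1).item() + 1
    return pred

text = "I love this product!"
# Returns a class id in 1..NUM_CLASSES. With AG_NEWS these ids are news topics
# (1 = World, 2 = Sports, 3 = Business, 4 = Sci/Tech); they only mean
# positive/negative once the model is trained on a sentiment-labeled dataset.
predict(text)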
```
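A bare class id is hard to read, so a prediction helper often maps it to a label name and reports a confidence. The sketch below does this in plain Python for a list of logits. The `LABELS` mapping follows the AG_NEWS class list used by the code in this example, and `describe_prediction` is a hypothetical helper, not part of the original code.

```python
import math

# Class names for AG_NEWS, keyed by the dataset's 1-based label ids
LABELS = {1: "World", 2: "Sports", 3: "Business", 4: "Sci/Tech"}

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def describe_prediction(logits, labels=LABELS):
    """Return (label_name, probability) for the highest-scoring class."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best + 1], probs[best]

name, confidence = describe_prediction([0.2, 3.1, -0.5, 0.4])
print(f"{name} ({confidence:.1%})")
```

With the trained model, the logits could be obtained as `model(text)[0].tolist()` inside `predict`; for a sentiment dataset, `LABELS` would instead map ids to names such as negative and positive.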
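That is a simple example of a PyTorch-based sentiment classification model. Its actual performance will depend on tuning and optimizing the model for your specific requirements.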