PyTorch Text Classification Example
Date: 2023-07-17 07:04:40
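Here is a simple text classification example based on PyTorch. It uses the legacy torchtext API (`Field`, `LabelField`, `BucketIterator`), which can be imported from `torchtext.data` only in torchtext 0.8 and earlier; versions 0.9 through 0.11 expose it under `torchtext.legacy`, and later releases removed it. The spaCy tokenizer used below also needs spaCy and an English model installed. First, install PyTorch and torchtext with the following commands: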
```bash
pip install torch
pip install torchtext
pip install spacy
```
Next, we classify reviews from the IMDB movie review dataset. Start by importing the required modules:
```python
import torch
import torch.nn as nn
import torch.optim as optim
from torchtext.datasets import IMDB
from torchtext.data import Field, LabelField, BucketIterator
```
Then we define a function that cleans the raw review text:
```python
def preprocess_text(text):
    text = text.lower()  # convert the text to lowercase
    text = text.replace("<br />", " ")  # replace HTML line-break tags with a space
    return text
```
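To see what this cleaning step does to a raw review, here is a small standalone check. The sample string is made up for illustration and is not taken from the dataset; the function is repeated so the snippet runs on its own.

```python
def preprocess_text(text):
    """Lowercase a review and replace IMDB-style line-break tags with spaces."""
    text = text.lower()
    text = text.replace("<br />", " ")
    return text

# Hypothetical sample review, written only for this illustration.
sample = "Great movie!<br /><br />Would WATCH again."
cleaned = preprocess_text(sample)
print(cleaned)
```

Each tag becomes a single space, so consecutive tags leave consecutive spaces behind. That is harmless here because the tokenizer splits on whitespace anyway.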
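Next, we define the text field (`Field`) and the label field (`LabelField`). Note that `Field` applies its `preprocessing` argument to the token list produced by the tokenizer, not to the raw string, so a string function such as `preprocess_text` cannot be passed there directly. Instead, we wrap the spaCy tokenizer so that the raw text is cleaned before it is tokenized:
```python
from torchtext.data.utils import get_tokenizer
spacy_tok = get_tokenizer('spacy')  # spaCy tokenizer; requires spaCy and an English model
text_field = Field(sequential=True, tokenize=lambda s: spacy_tok(preprocess_text(s)))
label_field = LabelField(dtype=torch.float)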
```
Then we load the IMDB dataset, which is downloaded on first use:
```python
train_data, test_data = IMDB.splits(text_field, label_field)
```
Next, we build the vocabulary from the training set and load the pretrained 100-dimensional GloVe word vectors:
```python
text_field.build_vocab(train_data, vectors='glove.6B.100d')
label_field.build_vocab(train_data)
```
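The idea behind building a vocabulary can be sketched in plain Python. This is a simplified illustration, not torchtext's implementation: the helper names `build_vocab` and `numericalize`, the tie-breaking rule, and the tiny corpus are all assumptions made for the example.

```python
from collections import Counter

def build_vocab(tokenized_texts, min_freq=1, specials=("<unk>", "<pad>")):
    """Map each token to an integer index; special tokens come first."""
    counts = Counter(tok for text in tokenized_texts for tok in text)
    # Most frequent tokens get the smallest indices; ties are broken alphabetically.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    itos = list(specials) + [tok for tok, freq in ordered if freq >= min_freq]
    stoi = {tok: idx for idx, tok in enumerate(itos)}
    return stoi, itos

def numericalize(tokens, stoi):
    """Replace tokens with indices, falling back to <unk> for unseen words."""
    unk = stoi["<unk>"]
    return [stoi.get(tok, unk) for tok in tokens]

# Hypothetical toy corpus of already-tokenized reviews.
corpus = [["good", "movie"], ["bad", "movie"], ["good", "good", "plot"]]
stoi, itos = build_vocab(corpus)
ids = numericalize(["good", "unseen", "movie"], stoi)
print(itos)
print(ids)
```

Words that never appeared in the training data map to the `<unk>` index, which is why the vocabulary is built from `train_data` only.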
Next, we define the model, a two-layer bidirectional LSTM followed by a linear layer:
```python
class TextClassifier(nn.Module):
    def __init__(self, vocab_size, embedding_dim, hidden_dim, output_dim):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embedding_dim)
        self.lstm = nn.LSTM(embedding_dim, hidden_dim, num_layers=2, bidirectional=True)
        # Forward and backward states are concatenated, hence hidden_dim * 2
        self.fc = nn.Linear(hidden_dim * 2, output_dim)

    def forward(self, text):
        # text: [seq_len, batch_size] token indices
        embedded = self.embedding(text)
        output, (hidden, _) = self.lstm(embedded)
        # Concatenate the top layer's final forward and backward hidden states
        hidden = torch.cat((hidden[-2, :, :], hidden[-1, :, :]), dim=1)
        return self.fc(hidden)
```
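The indexing `hidden[-2]` and `hidden[-1]` is easier to follow with concrete shapes. The following sketch runs a standalone `nn.LSTM` with the same settings on random input; the sequence length, batch size, and random tensor are arbitrary stand-ins for real embedded text.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
seq_len, batch_size, embedding_dim, hidden_dim = 7, 4, 100, 256

lstm = nn.LSTM(embedding_dim, hidden_dim, num_layers=2, bidirectional=True)
# Random stand-in for the embedding layer's output: [seq_len, batch, embedding_dim]
embedded = torch.randn(seq_len, batch_size, embedding_dim)

output, (hidden, cell) = lstm(embedded)
# hidden is [num_layers * num_directions, batch, hidden_dim]; its last two
# entries are the top layer's final forward and backward states.
final = torch.cat((hidden[-2], hidden[-1]), dim=1)

print(output.shape)  # [seq_len, batch, 2 * hidden_dim]
print(hidden.shape)  # [4, batch, hidden_dim]
print(final.shape)   # [batch, 2 * hidden_dim]
```

The forward direction finishes at the last time step and the backward direction finishes at the first, so `hidden[-2]` matches the forward half of `output[-1]` and `hidden[-1]` matches the backward half of `output[0]`.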
Next, we set the hyperparameters and instantiate the model:
```python
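vocab_size = len(text_field.vocab)
embedding_dim = 100  # must match the 100-dimensional GloVe vectors
hidden_dim = 256
output_dim = 1  # a single logit for binary classification
model = TextClassifier(vocab_size, embedding_dim, hidden_dim, output_dim)
# Initialize the embedding layer with the pretrained vectors stored in the vocabulary
model.embedding.weight.data.copy_(text_field.vocab.vectors)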
```
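Then we define the loss function and the optimizer. `BCEWithLogitsLoss` applies the sigmoid internally, so the model outputs raw logits: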
```python
criterion = nn.BCEWithLogitsLoss()
optimizer = optim.Adam(model.parameters())
```
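For intuition, the per-example quantity behind a binary cross-entropy loss on logits can be written out in plain Python. This is a scalar sketch under the standard definition of the loss, not PyTorch's actual code; both function names are made up for the illustration.

```python
import math

def bce_with_logits(logit, target):
    """Numerically stable binary cross-entropy on a raw logit (target is 0.0 or 1.0)."""
    # Algebraically equal to -[t*log(sigmoid(x)) + (1-t)*log(1-sigmoid(x))]
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))

def naive_bce(logit, target):
    """Direct formula, shown only to cross-check the stable version."""
    p = 1.0 / (1.0 + math.exp(-logit))
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))

loss_confident_right = bce_with_logits(3.0, 1.0)
loss_confident_wrong = bce_with_logits(3.0, 0.0)
loss_undecided = bce_with_logits(0.0, 1.0)
print(loss_confident_right, loss_confident_wrong, loss_undecided)
```

A confident correct prediction gives a loss near zero, a logit of 0 gives `log(2)`, and a confident wrong prediction is penalized roughly in proportion to the size of the logit.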
Next, we create the iterators. `BucketIterator` batches reviews of similar length together to reduce padding:
```python
train_iterator, test_iterator = BucketIterator.splits(
    (train_data, test_data),
    batch_size=64,
    sort_key=lambda x: len(x.text),
    device=torch.device('cuda' if torch.cuda.is_available() else 'cpu'))
```
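Why bucketing by length helps can be shown with a rough plain-Python sketch. It is a simplification of the idea, not how `BucketIterator` is implemented; the helpers `bucket_batches`, `pad_batch`, and `count_pads` and the toy sequence lengths are assumptions for this example.

```python
import random

def bucket_batches(examples, batch_size, seed=0):
    """Group token lists of similar length into batches, then shuffle the batches."""
    ordered = sorted(examples, key=len)
    batches = [ordered[i:i + batch_size] for i in range(0, len(ordered), batch_size)]
    random.Random(seed).shuffle(batches)
    return batches

def pad_batch(batch, pad_token="<pad>"):
    """Pad every sequence in a batch to the length of its longest member."""
    width = max(len(seq) for seq in batch)
    return [seq + [pad_token] * (width - len(seq)) for seq in batch]

def count_pads(batches):
    """Total number of padding tokens needed across all batches."""
    return sum(row.count("<pad>") for b in batches for row in pad_batch(b))

# Hypothetical token lists with lengths 2, 9, 3, 10, 1, 8.
examples = [["a"] * n for n in (2, 9, 3, 10, 1, 8)]
naive = [examples[i:i + 3] for i in range(0, len(examples), 3)]
bucketed = bucket_batches(examples, batch_size=3)

naive_pads = count_pads(naive)
bucketed_pads = count_pads(bucketed)
print(naive_pads, bucketed_pads)
```

Batching in the original order mixes very short and very long sequences, so most of each batch is padding; grouping by length first removes most of that wasted computation.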
Finally, we train the model:
```python
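device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = model.to(device)
criterion = criterion.to(device)
num_epochs = 10
for epoch in range(num_epochs):
    model.train()
    epoch_loss = 0.0
    for batch in train_iterator:
        optimizer.zero_grad()
        text = batch.text.to(device)      # [seq_len, batch_size]
        label = batch.label.to(device)    # [batch_size], float labels
        output = model(text).squeeze(1)   # raw logits, [batch_size]
        loss = criterion(output, label)
        loss.backward()
        optimizer.step()
        epoch_loss += loss.item()
    print(f'Epoch {epoch + 1}: mean loss {epoch_loss / len(train_iterator):.4f}')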
```
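The loop above only trains; the test iterator is never used to measure quality. One possible way to score predictions is a small accuracy helper such as the one below. The function name `binary_accuracy` and the example tensors are illustrative assumptions, not part of the original example.

```python
import torch

def binary_accuracy(logits, labels):
    """Fraction of examples whose thresholded prediction matches the 0/1 label."""
    # A sigmoid output of at least 0.5 corresponds to a logit of at least 0
    preds = (torch.sigmoid(logits) >= 0.5).float()
    return (preds == labels).float().mean().item()

# Hypothetical logits and labels standing in for one batch of model output.
logits = torch.tensor([2.0, -1.5, 0.3, -0.2])
labels = torch.tensor([1.0, 0.0, 0.0, 0.0])
acc = binary_accuracy(logits, labels)
print(acc)
```

When applying this to the real test set, call `model.eval()` and wrap the loop in `torch.no_grad()` so that no gradients are computed during evaluation.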
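This is a simple PyTorch-based text classification example. You can modify and tune the model as needed, for example by changing the hidden size, the number of LSTM layers, or the number of epochs.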