How can I fine-tune GPT-2 for a text classification task? Please show Python code.
Time: 2023-07-05 09:12:38
Sure. Below is Python code that fine-tunes GPT-2 for a text classification task:
```python
import torch
from transformers import GPT2Tokenizer, GPT2ForSequenceClassification

# Load the pretrained GPT-2 model and tokenizer
tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
model = GPT2ForSequenceClassification.from_pretrained('gpt2', num_labels=2)

# GPT-2 has no padding token by default; reuse the EOS token so that
# tokenizer(..., padding=True) and batched forward passes work
tokenizer.pad_token = tokenizer.eos_token
model.config.pad_token_id = tokenizer.pad_token_id

# Training data and labels
train_texts = ['This is a positive sentence.', 'This is a negative sentence.']
train_labels = [1, 0]

# Tokenize and pad the training data
train_encodings = tokenizer(train_texts, truncation=True, padding=True)

# Wrap the training data in a PyTorch DataLoader
train_dataset = torch.utils.data.TensorDataset(
    torch.tensor(train_encodings['input_ids']),
    torch.tensor(train_encodings['attention_mask']),
    torch.tensor(train_labels))
train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=2, shuffle=True)

# Define the optimizer; no separate loss function is needed, because the
# model computes the cross-entropy loss internally when labels are passed
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

# Fine-tune
model.train()
for epoch in range(3):
    for batch in train_loader:
        optimizer.zero_grad()
        input_ids, attention_mask, labels = batch
        outputs = model(input_ids, attention_mask=attention_mask, labels=labels)
        loss = outputs.loss
        loss.backward()
        optimizer.step()
    print('Epoch:', epoch + 1, 'Loss:', loss.item())

# Save the fine-tuned model and tokenizer
model.save_pretrained('gpt2_classifier')
tokenizer.save_pretrained('gpt2_classifier')
```
This code uses the `GPT2ForSequenceClassification` model to classify two example sentences, with `train_texts` holding the training data and `train_labels` the labels. First the texts are tokenized and padded (note that GPT-2 ships without a padding token, so the EOS token is reused for padding) and wrapped in a PyTorch DataLoader. Then an optimizer is defined; since the model computes the classification loss internally when labels are passed, no separate loss function is required. The model is fine-tuned for a few epochs, and finally the fine-tuned model and tokenizer are saved.