Please give me very detailed Python code that uses GPT-2 for text classification and checks the model's accuracy
Posted: 2024-02-13 22:02:25
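Sure. Below is a code example that uses Python and GPT-2 to classify text and measure the model's accuracy.
First, install the required packages (the leading `!` is Jupyter notebook syntax; drop it when running the commands in a terminal):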
```
!pip install transformers
!pip install torch
!pip install scikit-learn
```
Next, import the required libraries and load the tokenizer and model:
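```python
import torch
from transformers import GPT2Tokenizer, GPT2ForSequenceClassification
from sklearn.metrics import accuracy_score
tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
# GPT2LMHeadModel only predicts the next token; for classification use the variant
# with a classification head (here 3 classes: negative, positive, neutral)
model = GPT2ForSequenceClassification.from_pretrained('gpt2', num_labels=3)
# GPT-2 defines no padding token; reuse <|endoftext|> so the model can locate
# the last real (non-padding) token of each sequence
model.config.pad_token_id = tokenizer.eos_token_id
```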
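GPT-2 produces one hidden vector per token, but a classifier needs one vector per sentence. Because GPT-2 attends only to earlier tokens, the last real token is the only position that has seen the whole sentence; Hugging Face's `GPT2ForSequenceClassification` therefore takes the hidden state at the last non-padding token (found via `config.pad_token_id`) and applies a linear layer to it. The pooling step can be sketched in plain PyTorch as below. This is an illustration, not the library's code: `last_token_pool` is a hypothetical helper, the tensors are made-up numbers, and right padding with a 0/1 attention mask is assumed.

```python
import torch

def last_token_pool(hidden_states: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    """Return the hidden state of the last real (non-padding) token of each sequence.

    hidden_states: (batch, seq_len, hidden); attention_mask: (batch, seq_len), 1 = real token.
    Assumes right padding, so the last real token sits at index (number of real tokens - 1).
    """
    last_idx = attention_mask.sum(dim=1) - 1          # (batch,)
    batch_idx = torch.arange(hidden_states.size(0))   # (batch,)
    return hidden_states[batch_idx, last_idx]         # (batch, hidden)

# Toy input: 2 sequences, length 4, hidden size 3
hidden = torch.arange(24, dtype=torch.float32).view(2, 4, 3)
mask = torch.tensor([[1, 1, 1, 0],   # 3 real tokens -> take position 2
                     [1, 1, 1, 1]])  # 4 real tokens -> take position 3
pooled = last_token_pool(hidden, mask)
# Hypothetical classification head: hidden size 3 -> 3 classes
head = torch.nn.Linear(3, 3)
logits = head(pooled)
print(pooled)        # rows [6, 7, 8] and [21, 22, 23]
print(logits.shape)  # torch.Size([2, 3])
```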
Then load the dataset and preprocess it:
```python
# Load dataset (toy data: 0 = negative, 1 = positive, 2 = neutral)
dataset = [
    {'text': 'This is a positive review', 'label': 1},
    {'text': 'This is a negative review', 'label': 0},
    {'text': 'I loved this product', 'label': 1},
    {'text': 'I hated this product', 'label': 0},
    {'text': 'This is a neutral review', 'label': 2},
]
# Create input tensors (one list of token ids per text)
inputs = []
for data in dataset:
    inputs.append(tokenizer.encode(data['text'], add_special_tokens=True))
# Right-pad every sequence to the same length. Pad with <|endoftext|>, not 0:
# id 0 is an ordinary GPT-2 token ("!"). Since GPT-2 only attends to earlier
# tokens, padding after the text does not change the last real token's state.
max_length = max(len(i) for i in inputs)
for i in range(len(inputs)):
    inputs[i] = inputs[i] + [tokenizer.eos_token_id] * (max_length - len(inputs[i]))
inputs = torch.tensor(inputs)
labels = torch.tensor([data['label'] for data in dataset])
```
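Padding needs a pad id that cannot be mistaken for text (GPT-2's id 0 is the ordinary token `!`) and, ideally, a mask recording which positions are real. A minimal sketch of right padding that returns both, using plain lists of token ids so it runs without downloading a tokenizer. `pad_batch` and the token ids are hypothetical; 50256 is GPT-2's `<|endoftext|>` id. Hugging Face tokenizers do the same when called as `tokenizer(texts, padding=True, return_tensors='pt')`, provided `tokenizer.pad_token` has been set (for example to `tokenizer.eos_token`).

```python
import torch

def pad_batch(sequences, pad_id):
    """Right-pad lists of token ids to a common length and build a 0/1 attention mask."""
    max_len = max(len(seq) for seq in sequences)
    input_ids, attention_mask = [], []
    for seq in sequences:
        n_pad = max_len - len(seq)
        input_ids.append(seq + [pad_id] * n_pad)
        attention_mask.append([1] * len(seq) + [0] * n_pad)  # 1 = real token, 0 = padding
    return torch.tensor(input_ids), torch.tensor(attention_mask)

# Hypothetical token ids for two texts of different length
ids, mask = pad_batch([[101, 102, 103], [201, 202]], pad_id=50256)
print(ids)   # [[101, 102, 103], [201, 202, 50256]]
print(mask)  # [[1, 1, 1], [1, 1, 0]]
```

The mask can be passed to the model as `attention_mask=mask`; it matters most for models that pool over every position rather than only the last real token.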
Next, split the dataset into a training set and a test set:
```python
# Split train and test datasets
train_inputs, test_inputs = inputs[:3], inputs[3:]
train_labels, test_labels = labels[:3], labels[3:]
```
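Slicing with `[:3]` and `[3:]` puts the only neutral example (label 2) in the test set, so the model is scored on a class it never saw during training. A stratified split keeps every class that has at least two examples in both sets. A minimal pure-Python sketch; `stratified_split`, the 0.4 test fraction, the seed, and the ten labels are assumptions for illustration:

```python
import random
from collections import defaultdict

def stratified_split(labels, test_fraction=0.4, seed=0):
    """Split example indices per class so each class with >= 2 examples lands in both splits."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train_idx, test_idx = [], []
    for idx in by_class.values():
        rng.shuffle(idx)
        # At least 1 test example, but always keep at least 1 for training
        n_test = min(len(idx) - 1, max(1, round(len(idx) * test_fraction)))
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)

# Hypothetical labels for 10 examples (0 = negative, 1 = positive, 2 = neutral)
labels = [1, 0, 1, 0, 2, 2, 1, 0, 2, 1]
train_idx, test_idx = stratified_split(labels)
print(train_idx, test_idx)
```

The returned index lists can index the tensors directly, e.g. `inputs[train_idx]`. scikit-learn's `train_test_split(..., stratify=labels)` does the same job, but it raises an error when a class has only one example, as label 2 does in the five-example dataset above.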
Then train the model:
```python
# Train model
optimizer = torch.optim.Adam(model.parameters(), lr=5e-5)
loss_fn = torch.nn.CrossEntropyLoss()
model.train()
for epoch in range(5):
    optimizer.zero_grad()
    # logits has shape (num_train_examples, num_labels)
    logits = model(train_inputs).logits
    # Compare the class logits with the class labels (not with the input tokens)
    loss = loss_fn(logits, train_labels)
    loss.backward()
    optimizer.step()
    print(f"Epoch {epoch + 1}: loss = {loss.item():.4f}")
```
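For classification, `torch.nn.CrossEntropyLoss` takes logits of shape `(batch_size, num_classes)` and a 1-D tensor of integer class ids (not one-hot vectors, and not the input token ids); it applies log-softmax internally. The check below, with made-up logits for three examples, shows that it equals the mean negative log-probability of each example's true class:

```python
import torch

# Made-up logits for 3 examples over 3 classes (negative, positive, neutral)
logits = torch.tensor([[2.0, 0.5, -1.0],
                       [0.1, 1.5, 0.3],
                       [-0.5, 0.0, 2.5]])
labels = torch.tensor([0, 1, 2])  # one integer class id per example

loss = torch.nn.CrossEntropyLoss()(logits, labels)

# Same value by hand: mean over examples of -log softmax(logits)[true class]
log_probs = torch.log_softmax(logits, dim=-1)
manual = -log_probs[torch.arange(len(labels)), labels].mean()
print(loss.item(), manual.item())
```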
Finally, evaluate the model on the test set and compute its accuracy:
```python
# Test model
model.eval()
with torch.no_grad():
    test_logits = model(test_inputs).logits  # (num_test_examples, num_labels)
test_predictions = torch.argmax(test_logits, dim=-1).tolist()
# Compute accuracy
accuracy = accuracy_score(test_labels.tolist(), test_predictions)
print("Accuracy: {:.2f}%".format(accuracy * 100))
```
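`accuracy_score` returns the fraction of predictions that equal the true label. A plain-Python sketch of the same computation, plus a per-class recall that exposes classes the model never gets right, which overall accuracy can hide. `simple_accuracy`, `per_class_recall`, and the label lists are hypothetical; scikit-learn's `classification_report` gives a fuller breakdown.

```python
from collections import Counter

def simple_accuracy(y_true, y_pred):
    """Fraction of positions where the prediction equals the label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

def per_class_recall(y_true, y_pred):
    """For each true class, the share of its examples that were predicted correctly."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {c: hits[c] / totals[c] for c in sorted(totals)}

y_true = [0, 2, 1, 1, 0]  # hypothetical test labels
y_pred = [0, 1, 1, 1, 2]  # hypothetical predictions
print(simple_accuracy(y_true, y_pred))   # 0.6
print(per_class_recall(y_true, y_pred))  # {0: 0.5, 1: 1.0, 2: 0.0}
```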
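This example shows how to use a GPT-2 model for text classification and measure its accuracy: load the model and dataset, preprocess the text, split the data into training and test sets, train with CrossEntropyLoss as the loss function, and finally evaluate on the test set and compute the accuracy. With only five examples the resulting accuracy is not meaningful; a real evaluation needs a larger labeled dataset.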