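Give me a text classification model using the transformers package
Date: 2024-03-02 09:53:42 Views: 93
Sure. Here is example code that builds a text classification model with the transformers package: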
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification, Trainer, TrainingArguments
import torch
# Load the pretrained tokenizer and model
model_name = "distilbert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
# num_labels=2 matches the binary labels (0 and 1) used below
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)
# Define the dataset for the text classification task
train_texts = ["This is a positive example.", "This is a negative example."]
train_labels = [1, 0]
train_encodings = tokenizer(train_texts, truncation=True, padding=True)


class TextClassificationDataset(torch.utils.data.Dataset):
    """Wraps tokenizer encodings and labels so the Trainer can index them."""

    def __init__(self, encodings, labels):
        self.encodings = encodings
        self.labels = labels

    def __getitem__(self, idx):
        # Convert one example's encoded fields and its label to tensors
        item = {key: torch.tensor(val[idx]) for key, val in self.encodings.items()}
        item['labels'] = torch.tensor(self.labels[idx])
        return item

    def __len__(self):
        return len(self.labels)


train_dataset = TextClassificationDataset(train_encodings, train_labels)
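# Define the training arguments
training_args = TrainingArguments(
    output_dir='./results',
    num_train_epochs=1,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    # No warmup: this toy run has a single optimizer step, so a long
    # warmup would keep the learning rate at zero. Raise it for real datasets.
    warmup_steps=0,
    weight_decay=0.01,
    logging_dir='./logs',
    logging_steps=10,
)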
# Train the model
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
)
trainer.train()
```
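This example uses the DistilBERT model; you can substitute a different pretrained model to suit your task. The training set contains only two samples, one positive and one negative, so replace it with your own data. The training arguments, such as the number of epochs and the batch size, can also be adjusted as needed.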
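To make "replace it with your own data" more concrete, here is a minimal sketch of one way to prepare `train_texts` and `train_labels` from a CSV file. It is an illustration under assumptions, not part of the original answer: the column names `text` and `label`, the file name `reviews.csv`, and the helper names `load_labeled_texts` and `train_eval_split` are all hypothetical. It uses only the Python standard library and reads from an in-memory string so it runs as written.

```python
import csv
import io
import random


def load_labeled_texts(csv_file):
    """Read rows with 'text' and 'label' columns (assumed names).

    Returns the texts, integer label ids, and the label-name-to-id mapping.
    """
    reader = csv.DictReader(csv_file)
    texts, raw_labels = [], []
    for row in reader:
        texts.append(row["text"].strip())
        raw_labels.append(row["label"].strip())
    # Assign ids in sorted order so the mapping is the same on every run.
    label2id = {name: i for i, name in enumerate(sorted(set(raw_labels)))}
    labels = [label2id[name] for name in raw_labels]
    return texts, labels, label2id


def train_eval_split(texts, labels, eval_fraction=0.2, seed=0):
    """Shuffle indices with a fixed seed and hold out a fraction for evaluation."""
    indices = list(range(len(texts)))
    random.Random(seed).shuffle(indices)
    n_eval = max(1, int(len(indices) * eval_fraction))
    eval_idx, train_idx = indices[:n_eval], indices[n_eval:]

    def pick(idx):
        return [texts[i] for i in idx], [labels[i] for i in idx]

    return pick(train_idx), pick(eval_idx)


# In-memory stand-in for a real file, e.g.
# open("reviews.csv", newline="", encoding="utf-8")  (hypothetical file name)
sample_csv = io.StringIO(
    "text,label\n"
    "I loved this movie.,positive\n"
    "The plot made no sense.,negative\n"
    "Great acting and music.,positive\n"
    "I want my money back.,negative\n"
    "A pleasant surprise.,positive\n"
)

texts, labels, label2id = load_labeled_texts(sample_csv)
(train_texts, train_labels), (eval_texts, eval_labels) = train_eval_split(texts, labels)
print(label2id, len(train_texts), len(eval_texts))
```

The resulting `train_texts` and `train_labels` can take the place of the two hard-coded samples above, and `len(label2id)` gives the number of classes to pass as `num_labels` when loading the model. The held-out pair could be tokenized the same way and supplied to `Trainer` as `eval_dataset` if you want evaluation during training.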