huggingface transformers实战
时间: 2023-06-23 13:09:06 浏览: 264
Hugging Face Transformers 是一个基于 PyTorch 和 TensorFlow 的自然语言处理(NLP)库,它提供了用于训练、微调和使用最先进的预训练模型的工具和接口。以下是使用 Hugging Face Transformers 进行实战的一些示例。
1. 文本分类
文本分类是将文本分为不同的类别或标签的任务。在这个示例中,我们将使用 Hugging Face Transformers 中的 DistilBERT 模型来训练一个情感分析分类器,以将电影评论分为正面或负面。
```python
from transformers import DistilBertTokenizer, DistilBertForSequenceClassification
import torch
tokenizer = DistilBertTokenizer.from_pretrained('distilbert-base-uncased')
model = DistilBertForSequenceClassification.from_pretrained('distilbert-base-uncased')
# 训练数据
train_texts = ["I really liked this movie", "The plot was boring and predictable"]
train_labels = [1, 0]
# 将文本编码为输入张量
train_encodings = tokenizer(train_texts, truncation=True, padding=True)
# 将标签编码为张量
train_labels = torch.tensor(train_labels)
# 训练模型
model.train()
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
for epoch in range(3):
optimizer.zero_grad()
outputs = model(**train_encodings, labels=train_labels)
loss = outputs.loss
loss.backward()
optimizer.step()
# 预测新的评论
texts = ["This is a great movie", "I hated this movie"]
encodings = tokenizer(texts, truncation=True, padding=True)
model.eval()
with torch.no_grad():
outputs = model(**encodings)
predictions = torch.argmax(outputs.logits, dim=1)
print(predictions)
```
2. 问答系统
问答系统是回答用户提出的问题的模型。在这个示例中,我们将使用 Hugging Face Transformers 中的 DistilBERT 模型和 SQuAD 数据集来训练一个简单的问答系统。
```python
from transformers import DistilBertTokenizer, DistilBertForQuestionAnswering
import torch
tokenizer = DistilBertTokenizer.from_pretrained('distilbert-base-uncased')
model = DistilBertForQuestionAnswering.from_pretrained('distilbert-base-uncased')
# 加载 SQuAD 数据集
from transformers import squad_convert_examples_to_features, SquadExample, SquadFeatures, squad_processors
processor = squad_processors['squad']
examples = processor.get_train_examples('data')
features = squad_convert_examples_to_features(examples=examples, tokenizer=tokenizer, max_seq_length=384, doc_stride=128, max_query_length=64, is_training=True)
# 训练模型
model.train()
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
for epoch in range(3):
for feature in features:
optimizer.zero_grad()
outputs = model(input_ids=torch.tensor([feature.input_ids]), attention_mask=torch.tensor([feature.attention_mask]), start_positions=torch.tensor([feature.start_position]), end_positions=torch.tensor([feature.end_position]))
loss = outputs.loss
loss.backward()
optimizer.step()
# 预测新的问题
text = "What is the capital of France?"
question = "What country's capital is Paris?"
inputs = tokenizer.encode_plus(question, text, add_special_tokens=True, return_tensors='pt')
model.eval()
with torch.no_grad():
start_scores, end_scores = model(**inputs)
start_index = torch.argmax(start_scores)
end_index = torch.argmax(end_scores)
answer = tokenizer.convert_tokens_to_string(tokenizer.convert_ids_to_tokens(inputs['input_ids'][0][start_index:end_index+1]))
print(answer)
```
3. 文本生成
文本生成是使用预训练模型生成自然语言文本的任务。在这个示例中,我们将使用 Hugging Face Transformers 中的 GPT-2 模型生成一些小说的开头。
```python
from transformers import GPT2Tokenizer, GPT2LMHeadModel
import torch
tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
model = GPT2LMHeadModel.from_pretrained('gpt2')
# 生成新的文本
seed_text = "In a hole in the ground there lived a hobbit."
encoded = tokenizer.encode(seed_text, return_tensors='pt')
model.eval()
with torch.no_grad():
output = model.generate(encoded, max_length=100, do_sample=True)
generated = tokenizer.decode(output[0], skip_special_tokens=True)
print(generated)
```
这些示例只是 Hugging Face Transformers 库的一部分功能。您可以通过访问 Hugging Face Transformers 官方文档来了解更多信息。
阅读全文