Implementing Text Sentiment Analysis in Python with a Pretrained BERT Model
Using a pretrained BERT (Bidirectional Encoder Representations from Transformers) model in Python for text sentiment analysis is a common natural language processing task. BERT is a powerful Transformer architecture: by encoding context bidirectionally, it captures deep dependencies between words. The basic steps are as follows:
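As a quick sanity check before the manual steps, the high-level `pipeline` API can run sentiment analysis in a few lines. This is a minimal sketch; the checkpoint name below is an assumed example from the Hugging Face Hub, not part of the steps that follow:
```python
from transformers import pipeline

# Minimal sketch: a ready-made Chinese sentiment classifier from the Hub.
# The model name is an assumption; substitute any Chinese sentiment checkpoint.
classifier = pipeline("sentiment-analysis",
                      model="uer/roberta-base-finetuned-dianping-chinese")
print(classifier("这部电影真的让我惊艳"))  # prints a label and a confidence score
```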
1. **Install the libraries**: First, install the required libraries such as `transformers` and `torch` with pip:
```bash
pip install transformers torch
```
2. **Load a pretrained model**: Download a pretrained BERT checkpoint from the Hugging Face Hub, e.g. `bert-base-chinese` for Chinese (or `bert-base-uncased` for English):
```python
from transformers import BertTokenizer, BertForSequenceClassification

# num_labels=2 makes the binary (positive/negative) classification head explicit
model = BertForSequenceClassification.from_pretrained('bert-base-chinese', num_labels=2)
tokenizer = BertTokenizer.from_pretrained('bert-base-chinese')
```
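To see what the tokenizer produces: `bert-base-chinese` splits text at the character level and wraps each sequence in `[CLS]`/`[SEP]` special tokens:
```python
# Inspect the tokenization of a short Chinese sentence
encoded = tokenizer("这是一部好电影")
print(tokenizer.convert_ids_to_tokens(encoded["input_ids"]))
# ['[CLS]', '这', '是', '一', '部', '好', '电', '影', '[SEP]']
```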
3. **Prepare the data**: Tokenize the texts and build the dataset, which typically consists of input sequences (token ids, attention masks) and integer labels for the sentiment classes:
```python
def encode_text(text):
    # Adds [CLS]/[SEP] special tokens and returns PyTorch tensors
    return tokenizer.encode_plus(text, add_special_tokens=True, return_tensors='pt')

# Sample data: (text, label) pairs; labels must be mapped to integer ids
texts_and_labels = [("这是一部好电影", "positive"), ("我不喜欢这本书", "negative")]
label2id = {"negative": 0, "positive": 1}
encodings = [encode_text(text) for text, label in texts_and_labels]
labels = [label2id[label] for text, label in texts_and_labels]
```
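The training loop in step 4 below iterates over a `train_dataloader`. Here is a minimal sketch of one way to build it with a PyTorch `Dataset`; the class name, `max_length`, and batch size are illustrative choices, not requirements:
```python
import torch
from torch.utils.data import Dataset, DataLoader

class SentimentDataset(Dataset):
    """Wraps (text, label) pairs and tokenizes each item on access."""
    def __init__(self, pairs, tokenizer, max_length=128):
        self.pairs = pairs
        self.tokenizer = tokenizer
        self.max_length = max_length

    def __len__(self):
        return len(self.pairs)

    def __getitem__(self, idx):
        text, label = self.pairs[idx]
        enc = self.tokenizer(text,
                             truncation=True,
                             padding="max_length",
                             max_length=self.max_length,
                             return_tensors="pt")
        return {
            "input_ids": enc["input_ids"].squeeze(0),
            "attention_mask": enc["attention_mask"].squeeze(0),
            "labels": torch.tensor(label),
        }

pairs = [(text, label2id[label]) for text, label in texts_and_labels]
train_dataloader = DataLoader(SentimentDataset(pairs, tokenizer),
                              batch_size=2, shuffle=True)
```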
4. **Train the model**: Feed batches from the `train_dataloader` built above into the model, with an optimizer and learning rate configured. `BertForSequenceClassification` computes the cross-entropy loss internally when labels are supplied:
```python
import torch
from torch.optim import AdamW

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)
optimizer = AdamW(model.parameters(), lr=1e-5)
num_epochs = 3  # example value

model.train()
for epoch in range(num_epochs):
    for batch in train_dataloader:
        # Move input_ids, attention_mask and labels to the device
        batch = {key: val.to(device) for key, val in batch.items()}
        # The model computes cross-entropy loss internally when 'labels' is present
        outputs = model(**batch)
        loss = outputs.loss
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```
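An optional refinement when fine-tuning BERT is a learning-rate schedule with warmup. Here is a sketch using the `get_linear_schedule_with_warmup` helper from `transformers`; the 10% warmup fraction is a common heuristic, not a requirement:
```python
from transformers import get_linear_schedule_with_warmup

num_training_steps = num_epochs * len(train_dataloader)
scheduler = get_linear_schedule_with_warmup(
    optimizer,
    num_warmup_steps=int(0.1 * num_training_steps),  # 10% warmup (heuristic)
    num_training_steps=num_training_steps,
)
# Inside the training loop, call scheduler.step() right after optimizer.step()
```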
5. **Predict and evaluate**: At test time, run sentiment prediction on new text. Note that the predicted class id is mapped back to a label name, not decoded with the tokenizer:
```python
id2label = {0: "negative", 1: "positive"}

def predict_sentiment(text):
    # encode_plus with return_tensors='pt' already includes a batch dimension
    inputs = encode_text(text)
    inputs = {key: val.to(device) for key, val in inputs.items()}
    model.eval()
    with torch.no_grad():
        logits = model(**inputs).logits
    # Map the predicted class id back to its label name
    prediction = torch.argmax(logits, dim=1).item()
    return id2label[prediction]

new_text = "这部电影真的让我惊艳"
sentiment = predict_sentiment(new_text)
print(sentiment)
```
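Once fine-tuned, the model and tokenizer can be saved for later reuse (the directory name below is arbitrary):
```python
model.save_pretrained("bert-chinese-sentiment")
tokenizer.save_pretrained("bert-chinese-sentiment")

# Reload later with:
# model = BertForSequenceClassification.from_pretrained("bert-chinese-sentiment")
# tokenizer = BertTokenizer.from_pretrained("bert-chinese-sentiment")
```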