Code for converting a txt file into word vectors with BERT and then performing traditional text classification

Date: 2024-02-01 18:15:55 Views: 71

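The following is a Python example that uses a BERT model for text classification:

```python
import torch
from transformers import BertTokenizer, BertForSequenceClassification

# Load the pretrained BERT model and its tokenizer.
# Note: the classification head added on top of 'bert-base-uncased' is
# randomly initialized, so predictions are only meaningful after fine-tuning.
model = BertForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=2)
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model.eval()  # disable dropout for inference

# Load the text data: one sample per line, blank lines skipped
with open('text_file.txt', 'r', encoding='utf-8') as f:
    texts = [line.strip() for line in f if line.strip()]

# Tokenize each text into token IDs and an attention mask
input_ids = []
attention_masks = []
for text in texts:
    encoded_dict = tokenizer(
        text,
        add_special_tokens=True,
        max_length=128,
        padding='max_length',  # replaces the deprecated pad_to_max_length=True
        truncation=True,       # cut texts longer than max_length
        return_attention_mask=True,
        return_tensors='pt'
    )
    input_ids.append(encoded_dict['input_ids'])
    attention_masks.append(encoded_dict['attention_mask'])

# Stack the per-text tensors into one batch of shape (num_texts, 128)
input_ids = torch.cat(input_ids, dim=0)
attention_masks = torch.cat(attention_masks, dim=0)

# Run the BERT model to classify the texts
with torch.no_grad():
    outputs = model(input_ids, attention_mask=attention_masks)
logits = outputs.logits

# Print the predicted class index for each text
predictions = torch.argmax(logits, dim=-1)
print(predictions)
```

In the code above, we first load the pretrained BERT model and its tokenizer, then read the texts to be classified. Next, the tokenizer converts each text into token IDs and an attention mask (these are not yet word vectors; the embeddings are computed inside the model), and the tensors for all texts are concatenated into a single input batch. Finally, we run the BERT model and take the argmax over the logits as the predicted class. Because the classification head is randomly initialized when loading `bert-base-uncased`, the model needs to be fine-tuned on labeled data before these predictions carry any meaning.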
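The title asks for word vectors followed by a traditional classifier, whereas the code above classifies end to end with `BertForSequenceClassification`. A minimal sketch of the feature-extraction route might look like the following. It rests on several assumptions that are not in the original: a hypothetical training file with one `label<TAB>text` pair per line, the vector at the `[CLS]` position as the sentence representation, and scikit-learn's `LogisticRegression` as the traditional classifier. The helper names `load_labeled_lines`, `embed_texts`, and `train_classifier` are made up for illustration.

```python
from pathlib import Path


def load_labeled_lines(path):
    """Read 'label<TAB>text' lines (a hypothetical file layout) into two lists."""
    labels, texts = [], []
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        label, text = line.split("\t", 1)  # split on the first tab only
        labels.append(int(label))
        texts.append(text)
    return labels, texts


def embed_texts(texts, model_name="bert-base-uncased", batch_size=16, max_length=128):
    """Return one [CLS] vector per text as an array of shape (len(texts), hidden_size)."""
    # Heavy dependencies are imported here so the parsing helper works without them.
    import torch
    from transformers import BertModel, BertTokenizer

    tokenizer = BertTokenizer.from_pretrained(model_name)
    model = BertModel.from_pretrained(model_name)  # encoder only, no classification head
    model.eval()

    chunks = []
    with torch.no_grad():
        for start in range(0, len(texts), batch_size):
            batch = tokenizer(
                texts[start:start + batch_size],
                padding=True,          # pad to the longest text in this batch
                truncation=True,
                max_length=max_length,
                return_tensors="pt",
            )
            hidden = model(**batch).last_hidden_state  # (batch, seq_len, hidden_size)
            chunks.append(hidden[:, 0, :])             # vector at the [CLS] position
    return torch.cat(chunks, dim=0).numpy()


def train_classifier(features, labels):
    """Fit a logistic regression classifier on the BERT features."""
    from sklearn.linear_model import LogisticRegression

    clf = LogisticRegression(max_iter=1000)
    clf.fit(features, labels)
    return clf
```

With a labeled file at a hypothetical path such as `train.tsv`, usage would be along the lines of `labels, texts = load_labeled_lines("train.tsv")`, then `clf = train_classifier(embed_texts(texts), labels)`, and finally `clf.predict(embed_texts(new_texts))` for unseen texts. Mean pooling over the token vectors or a different classifier such as an SVM are equally reasonable choices; the `[CLS]` vector and logistic regression are used here only to keep the sketch short.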
