Downloading GloVe pretrained word vectors and saving them as a .vec file
Date: 2024-06-08 09:09:06
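You can download the GloVe pretrained word vectors from the following sites:
1. Stanford NLP Group: https://nlp.stanford.edu/projects/glove/
2. GitHub: https://github.com/stanfordnlp/GloVe
Choose the vectors you need. For example, glove.6B was trained on 6 billion tokens and covers a vocabulary of 400,000 words; the archive contains 50-, 100-, 200-, and 300-dimensional versions. After downloading, extract the archive and save the file you need in .vec format. Here is an example: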
``` python
import urllib.request
import zipfile

import numpy as np

# Download the GloVe pretrained word vectors (a large archive, so this takes a while)
url = 'http://nlp.stanford.edu/data/glove.6B.zip'
urllib.request.urlretrieve(url, 'glove.6B.zip')

# Extract the archive
with zipfile.ZipFile('glove.6B.zip', 'r') as zip_ref:
    zip_ref.extractall('glove.6B')


def save_word_vectors_to_file(file_path, word_vectors):
    """Write one "word v1 v2 ... vn" line per word."""
    with open(file_path, 'w', encoding='utf-8') as f_out:
        for word, vector in word_vectors.items():
            vector_str = ' '.join(str(val) for val in vector)
            f_out.write(f"{word} {vector_str}\n")


def load_word_vectors_from_file(file_path):
    """Read a GloVe text file into a dict mapping word -> NumPy vector."""
    word_vectors = {}
    with open(file_path, 'r', encoding='utf-8') as f_in:
        for line in f_in:
            # Split on single spaces only, so unusual whitespace inside a token is kept
            parts = line.rstrip().split(' ')
            if len(parts) < 2:  # skip blank or malformed lines
                continue
            word_vectors[parts[0]] = np.array([float(val) for val in parts[1:]])
    return word_vectors


# Save the word vectors as a .vec file
glove_file = 'glove.6B/glove.6B.300d.txt'
word_vectors = load_word_vectors_from_file(glove_file)
save_word_vectors_to_file('glove.6B/glove.6B.300d.vec', word_vectors)
```
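This code downloads the GloVe.6B pretrained word vectors from the Stanford NLP Group website, extracts the archive, and saves the 300-dimensional vectors in .vec format. Change the file paths and the vector file to suit your needs.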
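Some tools that read .vec files, such as fastText's text vectors and gensim's word2vec text loader, expect the first line to be a header giving the vocabulary size and the vector dimension, which the GloVe .txt files do not have. The sketch below is one possible way to add that header while streaming the file line by line instead of holding every vector in memory. The function name `glove_txt_to_vec` is hypothetical, and the code assumes that every line has the same dimension and that no token contains a space.

```python
def glove_txt_to_vec(txt_path, vec_path):
    """Copy a GloVe .txt file to vec_path, prefixed with a "<vocab_size> <dim>" header.

    Hypothetical helper: assumes every line has the same dimension and that
    no token contains a space.
    """
    # First pass: count the vectors and read the dimension from the first line.
    vocab_size, dim = 0, 0
    with open(txt_path, 'r', encoding='utf-8') as f_in:
        for line in f_in:
            if not line.strip():
                continue  # ignore blank lines
            if vocab_size == 0:
                dim = len(line.rstrip().split(' ')) - 1
            vocab_size += 1

    # Second pass: write the header, then copy the vector lines unchanged.
    with open(txt_path, 'r', encoding='utf-8') as f_in, \
            open(vec_path, 'w', encoding='utf-8') as f_out:
        f_out.write(f"{vocab_size} {dim}\n")
        for line in f_in:
            if line.strip():
                f_out.write(line.rstrip('\n') + '\n')
    return vocab_size, dim
```

With the files from the example above, this could be called as `glove_txt_to_vec('glove.6B/glove.6B.300d.txt', 'glove.6B/glove.6B.300d.vec')`. Reading the file twice trades some extra I/O for a much smaller memory footprint.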
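To work with a different dimension without unpacking the whole archive, one option is to extract only the file you need. The following is a minimal sketch using the standard library's `zipfile` module. The helper name `extract_glove_member` is hypothetical, and the member naming `glove.6B.<dim>d.txt` is an assumption based on the file name used in the example above.

```python
import zipfile


def extract_glove_member(zip_path, dim, out_dir):
    """Extract only glove.6B.<dim>d.txt from the archive and return its path.

    Hypothetical helper; the member naming is an assumption based on the
    file name used in the example above.
    """
    member = f"glove.6B.{dim}d.txt"
    with zipfile.ZipFile(zip_path, 'r') as zip_ref:
        names = zip_ref.namelist()
        if member not in names:
            raise ValueError(f"{member} not found; available: {names}")
        # extract() returns the path of the file it created
        return zip_ref.extract(member, path=out_dir)
```

For instance, `extract_glove_member('glove.6B.zip', 100, 'glove.6B')` would unpack only the 100-dimensional file, assuming the archive follows that naming.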