Example code for evaluating word vectors on the WordSim353 evaluation dataset
Date: 2024-02-13 14:00:06
The following example code uses Python and the Gensim library to evaluate GloVe word vectors on the WordSim353 dataset:
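```python
import csv

import numpy as np
from gensim.models import KeyedVectors
from scipy.stats import spearmanr

# Load the GloVe vectors. The raw GloVe .txt file has no word2vec header line,
# so no_header=True is required (gensim 4.x).
glove_model = KeyedVectors.load_word2vec_format(
    "glove.6B.300d.txt", binary=False, no_header=True
)

# Read the WordSim353 dataset (columns: word1, word2, human score)
word_pairs = []
similarity_scores = []
with open("wordsim353.csv", newline="", encoding="utf-8") as f:
    reader = csv.reader(f)
    next(reader)  # skip the header row
    for word1, word2, score in reader:
        # glove.6B is uncased, so lowercase the dataset words to match
        word_pairs.append((word1.lower(), word2.lower()))
        similarity_scores.append(float(score))

# Compute the cosine similarity of each pair; pairs containing an
# out-of-vocabulary word are skipped instead of being scored as 0
human_scores = []
cosine_similarities = []
for (word1, word2), score in zip(word_pairs, similarity_scores):
    if word1 in glove_model and word2 in glove_model:
        vector1 = glove_model[word1]
        vector2 = glove_model[word2]
        cosine = vector1.dot(vector2) / (np.linalg.norm(vector1) * np.linalg.norm(vector2))
        cosine_similarities.append(cosine)
        human_scores.append(score)

# Spearman rank correlation between human scores and model similarities
correlation, pvalue = spearmanr(human_scores, cosine_similarities)
print("Pairs evaluated:", len(human_scores), "of", len(word_pairs))
print("Spearman correlation:", correlation)
```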
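Here, Gensim loads the GloVe word vectors, and the `spearmanr` function from SciPy computes the Spearman rank correlation between the human similarity scores and the model's cosine similarities. The closer the coefficient is to 1, the more closely the vectors' similarity ranking agrees with human judgments.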
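To make the two computations in this pipeline concrete, here is a minimal pure-Python sketch (no Gensim or SciPy needed) of cosine similarity and of Spearman's rank correlation. Spearman's rho is computed as the Pearson correlation of the two rank vectors, with tied values given their average rank, which is the definition `scipy.stats.spearmanr` uses. The helper names and the four-word vectors and human scores are hypothetical toy data, not taken from WordSim353 or GloVe.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Hypothetical toy data: 3-d "word vectors" and made-up human scores (0-10 scale)
vectors = {
    "cat": [0.9, 0.1, 0.0],
    "dog": [0.8, 0.2, 0.1],
    "car": [0.1, 0.9, 0.2],
    "truck": [0.2, 0.8, 0.3],
}
pairs = [("cat", "dog", 8.0), ("car", "truck", 8.5), ("cat", "car", 1.5), ("dog", "truck", 2.0)]

human = [score for _, _, score in pairs]
model = [cosine_similarity(vectors[a], vectors[b]) for a, b, _ in pairs]
print("Spearman correlation:", spearman(human, model))
```

With this toy data the vectors rank cat/dog slightly above car/truck, while the made-up human scores rank them the other way round. One swapped adjacent pair out of four gives rho = 1 - 6·2/(4·15) = 0.8 rather than 1.0. Only the ordering matters, not the absolute cosine values, which is why Spearman rather than Pearson correlation is the usual metric for WordSim353.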