Computing text similarity with a Word2Vec model in Python

Posted: 2024-05-16 22:17:24 Views: 167

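First, install the `gensim` library, which provides an implementation of the Word2Vec model along with related utility functions. Next, the text has to be turned into word vectors: train a model with the `gensim.models.Word2Vec` class, then look up a word's vector with `model.wv[word]`. A text can then be represented as the mean of its word vectors, and two texts compared by the cosine similarity of those means. Here is an example:

```python
from gensim.models import Word2Vec
import numpy as np

# Train the Word2Vec model
sentences = [["this", "is", "a", "sentence"], ["this", "is", "another", "sentence"]]
# gensim >= 4.0 names this parameter vector_size; older releases called it size
model = Word2Vec(sentences, vector_size=100, window=5, min_count=1, workers=4)

# Compute text similarity
def text_similarity(text1, text2):
    # Represent each text as the mean of its word vectors
    vector1 = np.mean([model.wv[word] for word in text1], axis=0)
    vector2 = np.mean([model.wv[word] for word in text2], axis=0)
    # Cosine similarity of the two mean vectors
    similarity = np.dot(vector1, vector2) / (np.linalg.norm(vector1) * np.linalg.norm(vector2))
    return similarity

# Test
text1 = ["this", "is", "a", "sentence"]
text2 = ["this", "is", "another", "sentence"]
similarity = text_similarity(text1, text2)
print("Text similarity:", similarity)
```

In this example, a Word2Vec model is trained on two sentences and the similarity of those same two sentences is computed. To compare your own texts, replace `text1` and `text2` with tokenized word lists whose words all appear in the model's vocabulary. A two-sentence corpus is only enough to show the API; useful vectors need a much larger training corpus.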
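Looking up `model.wv[word]` raises a `KeyError` for any word the model never saw during training, so the averaging step fails on unseen text. The following is a minimal sketch of one way to guard against that, written in plain Python so the arithmetic is visible. The `TOY_VECTORS` table and the names `mean_vector`, `cosine`, and `safe_text_similarity` are hypothetical and not part of gensim, and returning 0.0 when a text has no known words is an assumed convention, not a standard one.

```python
import math

# Hypothetical toy embedding table standing in for model.wv (values are made up)
TOY_VECTORS = {
    "this": [1.0, 0.0, 0.0],
    "is": [0.0, 1.0, 0.0],
    "a": [0.0, 0.0, 1.0],
    "sentence": [1.0, 1.0, 0.0],
}

def mean_vector(tokens, vectors):
    """Average the vectors of in-vocabulary tokens; None if no token is known."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return None
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]

def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def safe_text_similarity(tokens1, tokens2, vectors):
    """Cosine similarity of mean vectors, skipping out-of-vocabulary tokens."""
    v1 = mean_vector(tokens1, vectors)
    v2 = mean_vector(tokens2, vectors)
    if v1 is None or v2 is None:
        return 0.0  # assumed convention: no known words means no measurable similarity
    return cosine(v1, v2)

print(safe_text_similarity(["this", "is", "unseen"], ["this", "is"], TOY_VECTORS))
```

Since `mean_vector` only relies on `in` and indexing, a trained model's `model.wv` could likely be passed in place of the toy dictionary, though that is not verified here.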
