What if I want to wrap this code in a method that I can call?
Time: 2024-01-16 10:02:55
Sure. You can wrap the functionality above in a function like this:
```python
import jieba
from gensim.models import Word2Vec


def get_most_similar_sentences(sentences, model_path):
    # Load a pre-trained Word2Vec model (one saved with Word2Vec.save())
    model = Word2Vec.load(model_path)

    # Segment each sentence with jieba and drop stopwords.
    # `sentences` is left untouched so the results are the original strings.
    stopwords = {'的', '了', '是', '我', '你', '他', '她', '它', '有', '在', '和', '就', '这', '也'}
    tokenized = [[word for word in jieba.cut(sentence) if word not in stopwords]
                 for sentence in sentences]

    # Represent each sentence as the mean of its in-vocabulary word vectors.
    # `word in model.wv` works in gensim 3 and 4 (`model.wv.vocab` was removed in 4.0).
    # Sentences with no known words get None so indices stay aligned with `sentences`.
    sentence_vectors = []
    for words in tokenized:
        vectors = [model.wv[word] for word in words if word in model.wv]
        sentence_vectors.append(sum(vectors) / len(vectors) if vectors else None)

    # For each sentence, rank every *other* sentence by cosine similarity
    # (the sentence itself is excluded) and keep the top three.
    results = []
    for i, vector in enumerate(sentence_vectors):
        if vector is None:
            results.append([])
            continue
        scored = [(model.wv.cosine_similarities(vector, [other])[0], j)
                  for j, other in enumerate(sentence_vectors)
                  if j != i and other is not None]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        results.append([sentences[j] for _, j in scored[:3]])
    return results
```
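To check the ranking logic without jieba or a trained model, the same steps (average the word vectors of each sentence, compute cosine similarity, keep the top k other sentences) can be sketched with the standard library alone. This is a minimal illustration under stated assumptions: a plain dict stands in for `model.wv`, the sentences are already tokenized, and the names `sentence_vector`, `cosine`, `top_k_similar` and `SAMPLE_VECTORS` (with its made-up numbers) are hypothetical, not part of gensim.

```python
import math

# Hypothetical toy vectors standing in for model.wv; the values are made up.
SAMPLE_VECTORS = {
    "cat": [1.0, 0.0],
    "dog": [0.9, 0.1],
    "car": [0.0, 1.0],
    "train": [0.1, 0.9],
}


def sentence_vector(words, vectors):
    """Mean of the vectors of in-vocabulary words, or None if no word is known."""
    known = [vectors[w] for w in words if w in vectors]
    if not known:
        return None
    dim = len(known[0])
    return [sum(v[k] for v in known) / len(known) for k in range(dim)]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k_similar(tokenized, vectors, k=3):
    """For each sentence, return the indices of the k most similar *other* sentences."""
    vecs = [sentence_vector(words, vectors) for words in tokenized]
    results = []
    for i, vi in enumerate(vecs):
        if vi is None:  # no known words: nothing to compare
            results.append([])
            continue
        scored = [(cosine(vi, vj), j) for j, vj in enumerate(vecs)
                  if j != i and vj is not None]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        results.append([j for _, j in scored[:k]])
    return results


if __name__ == "__main__":
    toy = [["cat"], ["dog"], ["car"], ["train"], ["unknown"]]
    print(top_k_similar(toy, SAMPLE_VECTORS, k=2))
    # [[1, 3], [0, 3], [3, 1], [2, 1], []]
```

Returning indices rather than strings keeps the sketch independent of how the sentences are stored; mapping them back is a single list lookup, as in the function above.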
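This function takes two arguments: the list of sentences to compare, and the path to a pre-trained Word2Vec model (one saved with `Word2Vec.save()`). It returns a list in which each element holds the (up to) three sentences most similar to the corresponding sentence in the input list. You can call it like this: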
```python
sentences = ['第一句话', '第二句话', '第三句话', '第四句话', '第五句话']
model_path = 'your_model_path'
results = get_most_similar_sentences(sentences, model_path)
for result in results:
    print(result)
```
Replace `sentences` and `model_path` with your own data. Running this code prints, for each sentence, the three sentences most similar to it.