Code for K-means clustering of sentence vectors
Date: 2023-10-31 09:07:46
Below is a simple code example of K-means clustering on sentence vectors, implemented with the gensim and sklearn libraries:
```python
from gensim.models import Word2Vec
from sklearn.cluster import KMeans
import numpy as np

# Load a trained Word2Vec model
model = Word2Vec.load('model.bin')

# Define the list of sentences
sentences = [
    'I love machine learning',
    'I hate my job',
    'I want to travel the world',
    'I need a break'
]

# Vectorize each sentence by averaging the vectors of its in-vocabulary words
sentence_vectors = []
for sentence in sentences:
    # Use the model's own dimensionality rather than hard-coding 100
    sentence_vector = np.zeros(model.vector_size)
    n_words = 0
    for word in sentence.split():
        # Word vectors are accessed through model.wv (required in gensim 4.x)
        if word in model.wv:
            sentence_vector += model.wv[word]
            n_words += 1
    # Sentences with no known words keep the zero vector
    if n_words > 0:
        sentence_vector /= n_words
    sentence_vectors.append(sentence_vector)

# Cluster the sentence vectors with K-means
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(np.array(sentence_vectors))
cluster_labels = kmeans.labels_

# Print the clustering result
for i, sentence in enumerate(sentences):
    print(f'"{sentence}" belongs to cluster {cluster_labels[i]}')
```
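In the example above, we first load a trained Word2Vec model and define a list of a few sentences. We then vectorize each sentence by averaging the vectors of its in-vocabulary words, and use the K-means algorithm from sklearn to split the sentences into two clusters. Finally, we print the cluster index assigned to each sentence.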
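If you do not have a trained `model.bin` at hand, the same averaging-then-clustering flow can be tried on made-up data. The following is only a minimal sketch under stated assumptions: `toy_vectors` is a hypothetical hand-written dictionary of 2-dimensional vectors standing in for `model.wv` (it is not real Word2Vec output), `mean_sentence_vector` is an illustrative helper name, and lowercasing the tokens is an assumption that may not match how a real model's vocabulary was built.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical toy embeddings standing in for model.wv (not real Word2Vec output)
toy_vectors = {
    "love": np.array([1.0, 0.9]),
    "learning": np.array([0.9, 1.0]),
    "machine": np.array([1.0, 1.0]),
    "hate": np.array([-1.0, -0.9]),
    "job": np.array([-0.9, -1.0]),
    "break": np.array([-1.0, -1.0]),
}

def mean_sentence_vector(sentence, vectors, dim):
    """Average the vectors of in-vocabulary words; return zeros if none are known."""
    known = [vectors[w] for w in sentence.lower().split() if w in vectors]
    if not known:
        return np.zeros(dim)
    return np.mean(known, axis=0)

toy_sentences = [
    "I love machine learning",
    "machine learning",
    "I hate my job",
    "I need a break",
]

# One row per sentence, one column per embedding dimension
X = np.array([mean_sentence_vector(s, toy_vectors, dim=2) for s in toy_sentences])

# Cluster the sentence vectors into two groups
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
for s, label in zip(toy_sentences, labels):
    print(f'"{s}" belongs to cluster {label}')
```

Because the cluster indices returned by K-means are arbitrary, it is safer to compare which sentences share a label than to rely on a specific index such as 0 or 1.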