Write Python code that implements unsupervised text clustering
Time: 2024-03-02 07:22:03
Sure, I can write that for you. Below is a simple example that uses the K-Means algorithm to cluster text:
```python
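import matplotlib.pyplot as plt
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.feature_extraction.text import TfidfVectorizer

# Sample documents to cluster
documents = [
    "I love coding in python",
    "Python is a popular programming language",
    "Machine learning is the future of AI",
    "Natural language processing is a challenging topic in AI",
    "Coding in Java is also fun",
    "I am a big fan of AI and its applications",
]

# Convert the documents into a TF-IDF matrix (English stop words removed).
# toarray() returns a plain ndarray; todense() returns np.matrix, which
# newer scikit-learn releases no longer accept.
vectorizer = TfidfVectorizer(stop_words='english')
X = vectorizer.fit_transform(documents).toarray()

# Fit K-Means; n_init and random_state make the run reproducible
num_clusters = 2
model = KMeans(n_clusters=num_clusters, init='k-means++', max_iter=100,
               n_init=10, random_state=42)
model.fit(X)

# Cluster label assigned to each document
clusters = model.labels_.tolist()

# Print the clustering result
results = pd.DataFrame({'document': documents, 'cluster': clusters})
print(results)

# Visualize the result: project the TF-IDF vectors to 2D with PCA
# (plotting only the first two TF-IDF columns would show just two words)
colors = {0: 'red', 1: 'blue', 2: 'green', 3: 'purple'}
points = PCA(n_components=2).fit_transform(X)
plt.scatter(points[:, 0], points[:, 1], c=[colors[x] for x in clusters])
plt.show()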
```
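The code uses a TF-IDF vectorizer to turn the documents into a feature matrix, then uses K-Means to partition them into a predefined number of clusters (`num_clusters`). Finally, it collects the cluster assignments in a pandas DataFrame and visualizes them with a scatter plot.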