Python LDA code
Date: 2023-02-20 18:17:25
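LDA (Latent Dirichlet Allocation) is a widely used topic model for analyzing and clustering text. It treats each document as a mixture of topics and each topic as a probability distribution over words.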
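To make the idea concrete, here is a minimal standard-library sketch of the generative story that LDA assumes for a single document. It is only an illustration: the two topics in `TOPICS`, their word probabilities, and the `alpha` values are made up for this example, and the helper names `sample_dirichlet` and `generate_document` are hypothetical. In the full model the topics themselves are also drawn from a Dirichlet prior; they are fixed by hand here to keep the sketch short.

```python
import random

# Hypothetical hand-written topics: each maps words to probabilities.
TOPICS = [
    {"user": 0.4, "system": 0.4, "interface": 0.2},
    {"graph": 0.5, "trees": 0.3, "minors": 0.2},
]

def sample_dirichlet(alpha, rng):
    """Draw one Dirichlet sample by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def generate_document(topics, alpha, length, rng):
    """Sample topic proportions, then one topic and one word per position."""
    theta = sample_dirichlet(alpha, rng)  # this document's topic mixture
    words = []
    for _ in range(length):
        # Pick a topic for this word position according to theta.
        z = rng.choices(range(len(topics)), weights=theta)[0]
        # Pick a word from the chosen topic's word distribution.
        vocab, probs = zip(*topics[z].items())
        words.append(rng.choices(vocab, weights=probs)[0])
    return theta, words

rng = random.Random(0)
theta, words = generate_document(TOPICS, alpha=[0.5, 0.5], length=8, rng=rng)
print(theta)
print(words)
```

Training an LDA model runs this story in reverse: given only the words, it estimates the topic mixtures and the per-topic word distributions.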
Below is a simple Python example that trains an LDA model with the Gensim library:
```
import gensim
from gensim import corpora

# Load the text data
documents = ["Human machine interface for lab abc computer applications",
             "A survey of user opinion of computer system response time",
             "The EPS user interface management system",
             "System and human system engineering testing of EPS",
             "Relation of user perceived response time to error measurement",
             "The generation of random binary unordered trees",
             "The intersection graph of paths in trees",
             "Graph minors IV Widths of trees and well quasi ordering",
             "Graph minors A survey"]

# Tokenize the text (lowercase, split on whitespace) and remove stop words
stoplist = set('for a of the and to in'.split())
texts = [[word for word in document.lower().split() if word not in stoplist]
         for document in documents]

# Build the dictionary, which maps each distinct token to an integer id
dictionary = corpora.Dictionary(texts)

# Convert each document to its bag-of-words representation: (token_id, count) pairs
corpus = [dictionary.doc2bow(text) for text in texts]

# Train the LDA model with 2 topics and 50 passes over the corpus
lda_model = gensim.models.ldamodel.LdaModel(corpus, num_topics=2, id2word=dictionary, passes=50)

# Print the topics as (topic_id, weighted word list) pairs
print(lda_model.print_topics())
```
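The code first loads the text data, then tokenizes it and removes stop words, builds a dictionary, converts each document to a bag-of-words representation, and finally trains the LDA model and prints the topics.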
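The preprocessing steps can also be sketched in plain Python to show what the dictionary and bag-of-words stages produce. This is a rough standard-library approximation, not Gensim's implementation: the helper names `tokenize`, `build_vocabulary`, and `to_bow` are hypothetical, and the ids are assigned in order of first appearance, which is an assumption of this sketch and may differ from the ids that `corpora.Dictionary` assigns.

```python
from collections import Counter

STOPLIST = set("for a of the and to in".split())

def tokenize(document, stoplist=STOPLIST):
    """Lowercase, split on whitespace, and drop stop words."""
    return [word for word in document.lower().split() if word not in stoplist]

def build_vocabulary(texts):
    """Give each distinct token an integer id in order of first appearance."""
    token2id = {}
    for text in texts:
        for token in text:
            token2id.setdefault(token, len(token2id))
    return token2id

def to_bow(text, token2id):
    """Return sorted (token_id, count) pairs, skipping unknown tokens."""
    counts = Counter(token2id[t] for t in text if t in token2id)
    return sorted(counts.items())

documents = [
    "The EPS user interface management system",
    "System and human system engineering testing of EPS",
]
texts = [tokenize(d) for d in documents]
token2id = build_vocabulary(texts)
corpus = [to_bow(t, token2id) for t in texts]

print(token2id)
print(corpus)
# corpus[1] is [(0, 1), (4, 2), (5, 1), (6, 1), (7, 1)]: "system" (id 4) occurs twice
```

Each document becomes a list of `(token_id, count)` pairs, so word order is discarded and only frequencies remain. This is the form of input that the LDA training step consumes.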