Creating natural language models (e.g. LSI, TF-IDF) with gensim: code and output
Date: 2024-12-14 15:07:36
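Below is an example that uses the gensim library to build natural language models (TF-IDF and LSI), together with its output.
First, make sure gensim is installed. If it is not, install it with: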
```bash
pip install gensim
```
Next, here is the example code:
```python
from pprint import pprint

from gensim import corpora
from gensim.models import LsiModel, TfidfModel

# Sample documents
documents = [
    "Human machine interface for lab abc computer applications",
    "A survey of user opinion of computer system response time",
    "The EPS user interface management system",
    "System and human system engineering testing of EPS",
    "Relation of user perceived response time to error measurement",
    "The generation of random binary unordered trees",
    "The intersection graph of paths in trees",
    "Graph minors IV Widths of trees and well quasi ordering",
    "Graph minors A survey",
]

# Preprocess: lowercase each document and split it on whitespace
texts = [document.lower().split() for document in documents]

# Build the dictionary (maps each unique token to an integer id)
dictionary = corpora.Dictionary(texts)

# Build the bag-of-words corpus: one list of (token_id, count) pairs per document
corpus = [dictionary.doc2bow(text) for text in texts]

# Train the TF-IDF model on the bag-of-words corpus
tfidf = TfidfModel(corpus)

# Apply the TF-IDF model to the corpus
corpus_tfidf = tfidf[corpus]

# Train a two-topic LSI model on the TF-IDF-weighted corpus
lsi = LsiModel(corpus_tfidf, id2word=dictionary, num_topics=2)

# Show the LSI topics
pprint(lsi.print_topics())

# Show the TF-IDF vector of each document
for doc in corpus_tfidf:
    pprint(doc)
```
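The LSI topics are printed in the following form. The per-document TF-IDF vectors printed by the final loop are not shown here, and the exact weights and signs can differ between runs and gensim versions, so treat the numbers as illustrative: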
```
[(0,
'0.543*"system" + 0.438*"user" + 0.331*"eps" + 0.331*"interface" + '
'0.331*"management" + 0.331*"human" + 0.331*"testing" + 0.331*"engineering" '
'+ 0.331*"of" + 0.331*"the"'),
(1,
'0.707*"graph" + 0.707*"minors" + 0.408*"trees" + 0.408*"of" + 0.408*"in" + '
'0.408*"paths" + 0.408*"intersection" + 0.408*"the" + 0.408*"well" + '
'0.408*"quasi"')]
```
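Each topic string is a weighted combination of terms. Conceptually, LSI obtains these weights from a truncated singular value decomposition of the term-document matrix. The following is a minimal NumPy sketch of that idea, not gensim's implementation (gensim uses its own SVD algorithm); the matrix `X`, its values, and the four-term vocabulary are made up for illustration.

```python
import numpy as np

# Hypothetical toy term-document matrix (rows = terms, columns = documents).
# In the example above, this role is played by the TF-IDF-weighted corpus.
terms = ["graph", "trees", "system", "user"]
X = np.array([
    [1.0, 1.0, 0.0, 0.0],
    [1.0, 0.5, 0.0, 0.0],
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 0.5, 1.0],
])

num_topics = 2

# Thin SVD: X = U @ diag(s) @ Vt, singular values sorted in descending order
U, s, Vt = np.linalg.svd(X, full_matrices=False)

# Keep the first num_topics left singular vectors: one row of term weights per topic
topic_term = U[:, :num_topics].T      # shape (num_topics, n_terms)

# Project every document onto the topics: shape (n_docs, num_topics)
doc_topic = X.T @ U[:, :num_topics]

# Print each topic in a format similar to print_topics()
for k, row in enumerate(topic_term):
    ranked = sorted(zip(terms, row), key=lambda tw: -abs(tw[1]))
    print(k, " + ".join(f'{w:.3f}*"{t}"' for t, w in ranked))
```

Because the rows of `topic_term` are singular vectors, each has unit length, and the sign of a whole topic is arbitrary: flipping every weight in a topic describes the same direction.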
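In this example, we first import the required libraries and define a few sample documents. We then preprocess the text by lowercasing it and splitting on whitespace, and build the dictionary and the bag-of-words corpus. Next, we train the TF-IDF model and apply it to the corpus. Finally, we train a two-topic LSI model on the TF-IDF-weighted corpus and print its topics.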
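To make the TF-IDF step more concrete, here is a minimal pure-Python sketch of the weighting, assuming gensim's default settings: raw term count multiplied by log2(N / df), with zero-weight terms dropped and each document vector scaled to unit length. The function name `tfidf_vector` and the three-document toy corpus (lowercased sentences from the example) are illustrative choices, not part of gensim.

```python
import math
from collections import Counter

# Three lowercased documents taken from the example above
docs = [
    "graph minors a survey",
    "the intersection graph of paths in trees",
    "the eps user interface management system",
]
texts = [d.split() for d in docs]
n_docs = len(texts)

# Document frequency: in how many documents each term appears
df = Counter(term for text in texts for term in set(text))


def tfidf_vector(tokens):
    """Return {term: weight} for one document (hypothetical helper)."""
    tf = Counter(tokens)
    # Raw count times log2(N / df); terms present in every document get weight 0
    weights = {t: c * math.log2(n_docs / df[t]) for t, c in tf.items()}
    weights = {t: w for t, w in weights.items() if w > 0.0}
    # Scale to unit (L2) length so document length does not dominate
    norm = math.sqrt(sum(w * w for w in weights.values()))
    return {t: w / norm for t, w in weights.items()} if norm else {}


vectors = [tfidf_vector(text) for text in texts]
for doc, vec in zip(docs, vectors):
    print(doc)
    print({t: round(w, 3) for t, w in vec.items()})
```

In the first document, "graph" appears in two of the three documents and so receives a lower weight than "minors", which appears in only one. This is the effect TF-IDF is meant to have: terms shared by many documents contribute less.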