Code for implementing TF-IDF with jieba
Date: 2023-10-26 10:07:33 Views: 122
Sure, here is code that extracts keywords with jieba and ranks them by TF-IDF:
```python
import jieba.analyse

# Read the text to analyze
with open('test.txt', 'r', encoding='utf-8') as f:
    text = f.read()

# Load a custom stop-word list (one word per line)
jieba.analyse.set_stop_words('stopword.txt')

# extract_tags segments the text and scores each word by TF-IDF:
# the word's frequency in `text` multiplied by an IDF value taken from
# jieba's built-in IDF table. The returned weights are therefore already
# TF-IDF values, so no separate IDF step is needed here.
tags = jieba.analyse.extract_tags(
    text,
    topK=200,
    withWeight=True,
    allowPOS=('n', 'ns', 'nt', 'nz', 'nl', 'vn', 'v'),  # keep nouns and verbs only
)

# Map each keyword to its TF-IDF weight
tf_idf = dict(tags)

# Print the top 10 keywords and their TF-IDF values
for word, value in sorted(tf_idf.items(), key=lambda item: item[1], reverse=True)[:10]:
    print(word, value)
```
Note: prepare the text to analyze (`test.txt`) and the stop-word list (`stopword.txt`) in advance.
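
If you would rather compute the IDF from your own documents than rely on jieba's built-in table, the calculation can be sketched by hand. The following is a minimal sketch, not jieba's implementation: the `tf_idf` helper and the toy corpus are hypothetical, the token lists are assumed to be already segmented (for example with `jieba.lcut`), and the unsmoothed formula `tf * log(N / df)` is just one common variant.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {word: score} dict per document.

    docs: list of token lists (assumed to be pre-segmented text).
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each word appears
    df = Counter(word for doc in docs for word in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        scores.append({
            # TF = share of the document's tokens; IDF = log(N / df), unsmoothed
            word: (count / total) * math.log(n_docs / df[word])
            for word, count in counts.items()
        })
    return scores


# Hypothetical toy corpus, already segmented into words
docs = [
    ['我', '喜欢', '自然', '语言', '处理'],
    ['我', '喜欢', '机器', '学习'],
    ['机器', '学习', '处理', '数据'],
]
scores = tf_idf(docs)

# Top 3 keywords of the first document
for word, value in sorted(scores[0].items(), key=lambda item: item[1], reverse=True)[:3]:
    print(word, value)
```

With this variant, a word that appears in every document gets a score of 0, so an IDF only becomes meaningful once the corpus contains more than one document.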