Python TF-IDF code
Date: 2023-10-12 16:22:08
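For reference, one common definition of tf-idf, and the one the code below follows, is given here. The symbols are notation introduced for this note, not names from the code: $f_{t,d}$ is the number of times term $t$ occurs in document $d$, $N$ is the number of documents, and $n_t$ is the number of documents that contain $t$. The base-10 logarithm matches the `math.log10` call in the code.

$$
\mathrm{tf}(t,d) = \frac{f_{t,d}}{\sum_{t'} f_{t',d}}, \qquad
\mathrm{idf}(t) = \log_{10}\frac{N}{n_t}, \qquad
\text{tf-idf}(t,d) = \mathrm{tf}(t,d)\cdot\mathrm{idf}(t)
$$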
The following is a Python code example that implements tf-idf:
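```python
import math


def tf(word, doc):
    # Term frequency: occurrences of `word` divided by the number of tokens in `doc`.
    words = doc.split()
    return words.count(word) / len(words)


def idf(word, docs):
    # Inverse document frequency: log10(N / number of documents containing `word`).
    # Compare against whole tokens; a substring test (`word in doc`) would
    # count "is" as present in any document that contains "This".
    doc_count = sum(1 for doc in docs if word in doc.split())
    return math.log10(len(docs) / doc_count)


def tf_idf(word, doc, docs):
    return tf(word, doc) * idf(word, docs)


def get_tfidf(docs):
    # Build one {word: score} dictionary per document.
    tfidf_docs = []
    for doc in docs:
        tfidf_scores = {}
        for word in doc.split():
            tfidf_scores[word] = tf_idf(word, doc, docs)
        tfidf_docs.append(tfidf_scores)
    return tfidf_docs


# Example
docs = [
    "This is the first document.",
    "This document is the second document.",
    "And this is the third one.",
    "Is this the first document?",
]

tfidf_docs = get_tfidf(docs)
for i, tfidf_scores in enumerate(tfidf_docs):
    print("Document", i + 1)
    for word, score in tfidf_scores.items():
        print(f"{word}: {score:.4f}")
    print()
```
The output is:
```
Document 1
This: 0.0602
is: 0.0250
the: 0.0000
first: 0.0602
document.: 0.0602

Document 2
This: 0.0502
document: 0.1003
is: 0.0208
the: 0.0000
second: 0.1003
document.: 0.0502

Document 3
And: 0.1003
this: 0.0502
is: 0.0208
the: 0.0000
third: 0.1003
one.: 0.1003

Document 4
Is: 0.1204
this: 0.0602
the: 0.0000
first: 0.0602
document?: 0.1204
```
As shown, the code computes and prints a tf-idf score for every word in each document. For example, "first" is one of the five tokens in Document 1 and appears in two of the four documents, so its score is (1/5) × log10(4/2) ≈ 0.0602. A word that occurs in all four documents, such as "the", scores 0 because log10(4/4) = 0. Because the text is split on whitespace only, "This" and "this" count as different words, as do "document.", "document" and "document?".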
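The example splits on whitespace only, so capitalization and attached punctuation produce separate entries for what a reader would consider the same word. A minimal sketch of one possible refinement follows, assuming that lowercasing and keeping only runs of word characters is acceptable for the text at hand. The names `tokenize` and `get_tfidf_normalized` are hypothetical helpers introduced for illustration; they are not part of the original answer.

```python
import math
import re
from collections import Counter


def tokenize(doc):
    """Lowercase the text and keep only runs of word characters (assumed normalization)."""
    return re.findall(r"\w+", doc.lower())


def get_tfidf_normalized(docs):
    """Return one {word: tf-idf score} dict per document, using normalized tokens."""
    tokenized = [tokenize(doc) for doc in docs]
    # Document frequency: number of documents in which each word occurs at least once.
    doc_freq = Counter(word for tokens in tokenized for word in set(tokens))
    n_docs = len(tokenized)
    results = []
    for tokens in tokenized:
        counts = Counter(tokens)
        results.append({
            word: (count / len(tokens)) * math.log10(n_docs / doc_freq[word])
            for word, count in counts.items()
        })
    return results


docs = [
    "This is the first document.",
    "This document is the second document.",
    "And this is the third one.",
    "Is this the first document?",
]

scores = get_tfidf_normalized(docs)
for word, score in scores[1].items():
    print(f"{word}: {score:.4f}")
```

With this normalization, "document" in the second document is counted twice among six tokens and is found in three of the four documents, so its score is (2/6) × log10(4/3) ≈ 0.0416. Counting document frequencies once up front also avoids re-splitting every document for each word.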