Writing TF-IDF in Python
Time: 2024-05-09 20:19:01 Views: 12
As an AI assistant, I can offer some Python code that implements the basic TF-IDF algorithm:
```python
import math


def tf(word, doc):
    """Term frequency: share of the document's tokens that equal `word`."""
    words = doc.split()
    return words.count(word) / len(words)


def idf(word, docs):
    """Inverse document frequency: log(N / number of documents containing `word`)."""
    # Compare against whole tokens; `word in doc` on a string would also match substrings.
    count = sum(1 for doc in docs if word in doc.split())
    if count == 0:
        return 0.0  # word absent from the collection; avoid division by zero
    return math.log(len(docs) / count)


def tfidf(word, doc, docs):
    """TF-IDF score of `word` in `doc`, relative to the collection `docs`."""
    return tf(word, doc) * idf(word, docs)


# Sample document collection
docs = [
    "Python is an interpreted high-level programming language for general-purpose programming.",
    "Created by Guido van Rossum and first released in 1991, Python has a design philosophy that emphasizes code readability, notably using significant whitespace.",
    "It provides constructs that enable clear programming on both small and large scales.",
    "Python features a dynamic type system and automatic memory management.",
    "It supports multiple programming paradigms, including object-oriented, imperative, functional and procedural, and has a large and comprehensive standard library.",
]

# Compute TF-IDF and print the three highest-scoring words per document
for i, doc in enumerate(docs):
    words = doc.split()
    scores = {word: tfidf(word, doc, docs) for word in words}
    sorted_words = sorted(scores.items(), key=lambda x: x[1], reverse=True)
    print(f"Top words in document {i+1}: ", end="")
    for word, score in sorted_words[:3]:
        print(f"{word} ({score:.2f})", end=" ")
    print()
```
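For reference, the three functions correspond to the usual textbook definitions. The symbols here are chosen for illustration and do not appear in the code: $t$ is a word, $d$ a document, $D$ the document collection, $f_{t,d}$ the number of times $t$ occurs in $d$, and $|d|$ the number of words in $d$. The logarithm is the natural log, as with `math.log`.

$$
\mathrm{tf}(t, d) = \frac{f_{t,d}}{|d|}, \qquad
\mathrm{idf}(t, D) = \log\frac{|D|}{\left|\{d' \in D : t \in d'\}\right|}, \qquad
\mathrm{tfidf}(t, d, D) = \mathrm{tf}(t, d) \cdot \mathrm{idf}(t, D)
$$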
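This code computes the TF-IDF value of every word in each document of the sample collection and prints the three words with the highest TF-IDF value per document. Feel free to adapt it to your own needs.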
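One possible adaptation: splitting on whitespace keeps punctuation attached and is case-sensitive, so `programming.` and `programming` count as different words. Below is a minimal sketch of a variant that normalizes the text first and counts document frequencies once with `Counter`. The names `tokenize` and `tfidf_scores` and the regex are assumptions made for illustration, not part of the code above.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Assumption: a word is a lowercase run of letters/digits,
    # optionally joined by single hyphens or apostrophes.
    return re.findall(r"[a-z0-9]+(?:[-'][a-z0-9]+)*", text.lower())


def tfidf_scores(docs):
    """Return one {word: tf-idf} dict per document."""
    tokenized = [tokenize(doc) for doc in docs]
    # Document frequency: in how many documents each word appears.
    df = Counter(word for tokens in tokenized for word in set(tokens))
    n_docs = len(docs)
    results = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        results.append({
            word: (count / total) * math.log(n_docs / df[word])
            for word, count in counts.items()
        })
    return results


sample = ["Python is fun.", "Python is fast!", "Cats sleep."]
scores = tfidf_scores(sample)
for i, doc_scores in enumerate(scores):
    top = sorted(doc_scores.items(), key=lambda kv: kv[1], reverse=True)[:3]
    print(f"Top words in document {i + 1}:", top)
```

Counting document frequencies once avoids rescanning the whole collection for every word, which helps as the collection grows.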