
How to implement TF-IDF-based keyword extraction in Python

Posted: 2024-05-09 08:17:33

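You can implement TF-IDF-based keyword extraction with Python's scikit-learn library. Here is example code:

```
from sklearn.feature_extraction.text import TfidfVectorizer

corpus = [
    'This is the first document.',
    'This is the second second document.',
    'And the third one.',
    'Is this the first document?',
]

# Learn the vocabulary and IDF weights, then build the TF-IDF matrix
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(corpus)  # sparse matrix, shape (n_docs, n_terms)

# get_feature_names() was removed in scikit-learn 1.2; use get_feature_names_out()
keywords = vectorizer.get_feature_names_out()

for idx, doc in enumerate(corpus):
    # Pair every vocabulary term with its TF-IDF score in this document
    scores = [(keywords[i], X[idx, i]) for i in range(X.shape[1])]
    sorted_scores = sorted(scores, key=lambda x: x[1], reverse=True)
    print("Keywords for document {}: {}".format(idx + 1, [x[0] for x in sorted_scores[:3]]))
```

This code works on a list containing several documents. First, we instantiate a TfidfVectorizer object and fit it to the text with the fit_transform() method, which returns a sparse matrix with one row per document and one column per term. Next, get_feature_names_out() returns the names of all features, that is, the candidate keywords. (The original get_feature_names() method was deprecated in scikit-learn 1.0 and removed in 1.2.) Finally, we pair each term with its score in each document and sort by score to find the top three keywords per document. By default TfidfVectorizer does not remove stop words, so common words such as "this" or "is" can still appear in the results; passing stop_words='english' to the constructor filters them out.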
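To make the scoring itself visible, here is a minimal pure-Python sketch of the calculation. It assumes scikit-learn's documented default settings (smoothed IDF, `ln((1 + n) / (1 + df)) + 1`, followed by L2 normalization of each document vector) and a tokenizer that keeps lowercase word tokens of two or more characters. The helper names `tokenize`, `tfidf_scores`, and `top_keywords` are hypothetical and not part of any library; treat this as an illustration rather than a replacement for TfidfVectorizer.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and keep word tokens of 2+ characters
    # (assumption: same pattern as TfidfVectorizer's default tokenizer).
    return re.findall(r"\b\w\w+\b", text.lower())


def tfidf_scores(corpus):
    """Return one {term: score} dict per document (hypothetical helper)."""
    docs = [Counter(tokenize(doc)) for doc in corpus]
    n_docs = len(docs)

    # Document frequency: in how many documents each term appears.
    df = Counter()
    for counts in docs:
        df.update(counts.keys())

    # Smoothed IDF, assumed to match scikit-learn's default (smooth_idf=True).
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1 for t, c in df.items()}

    result = []
    for counts in docs:
        raw = {t: tf * idf[t] for t, tf in counts.items()}
        # L2-normalize so documents of different lengths are comparable.
        norm = math.sqrt(sum(v * v for v in raw.values())) or 1.0
        result.append({t: v / norm for t, v in raw.items()})
    return result


def top_keywords(doc_scores, k=3):
    # Sort by score (descending); break ties alphabetically for stable output.
    ranked = sorted(doc_scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


corpus = [
    "This is the first document.",
    "This is the second second document.",
    "And the third one.",
    "Is this the first document?",
]

scores = tfidf_scores(corpus)
for idx, doc_scores in enumerate(scores, start=1):
    print("Keywords for document {}: {}".format(idx, top_keywords(doc_scores)))
```

A term that occurs in every document, such as "the", gets the minimum IDF of 1, while a term confined to one document, such as "second", gets a higher weight. That is why rarer terms rise to the top of each document's keyword list.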
