TF-IDF keyword extraction code
Time: 2023-09-07 11:15:35
Below is example Python code that uses the scikit-learn library to compute tf-idf weights for keyword extraction:
```python
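from sklearn.feature_extraction.text import TfidfVectorizer

# Define the list of documents
text_list = ['This is the first document.',
             'This is the second second document.',
             'And the third one.',
             'Is this the first document?']

# Create the TfidfVectorizer, then fit it and transform the documents
tfidf_vectorizer = TfidfVectorizer()
tfidf_matrix = tfidf_vectorizer.fit_transform(text_list)

# Print each document's terms and their tf-idf weights.
# get_feature_names_out() (scikit-learn >= 1.0) replaces get_feature_names(),
# which was removed in scikit-learn 1.2.
feature_names = tfidf_vectorizer.get_feature_names_out()
for i in range(len(text_list)):
    print("Document ", i+1, ":")
    # .indices holds the column indices of the non-zero entries in row i
    for j in tfidf_matrix[i].indices:
        print(" ", feature_names[j], ":", tfidf_matrix[i, j])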
```
The output looks like the following (weights are rounded to four decimal places here; the script prints them at full precision):
```
Document  1 :
  document : 0.4388
  first : 0.5420
  is : 0.4388
  the : 0.3587
  this : 0.4388
Document  2 :
  document : 0.2723
  is : 0.2723
  second : 0.8532
  the : 0.2226
  this : 0.2723
Document  3 :
  and : 0.5528
  one : 0.5528
  the : 0.2885
  third : 0.5528
Document  4 :
  document : 0.4388
  first : 0.5420
  is : 0.4388
  the : 0.3587
  this : 0.4388
```
The terms within each document appear in the order in which they are stored in the sparse matrix row, which can vary with the scikit-learn and SciPy versions; they are not sorted by tf-idf value. To pick out keywords, sort each document's terms by weight in descending order and keep the top few.
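To make the ranking step concrete, here is a minimal from-scratch sketch that needs only the standard library. It assumes scikit-learn's default settings as documented: lowercasing, tokens of two or more word characters, smoothed idf `ln((1 + n) / (1 + df)) + 1`, and L2 normalization of each document's weights. The helper names `tokenize`, `tfidf_rows`, and `top_keywords` are hypothetical and not part of scikit-learn.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and keep tokens of two or more word characters,
    # mirroring TfidfVectorizer's default token pattern (assumption).
    return re.findall(r"\b\w\w+\b", text.lower())


def tfidf_rows(docs):
    """Return one {term: weight} dict per document, L2-normalized."""
    counts = [Counter(tokenize(doc)) for doc in docs]
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term
    df = Counter(term for c in counts for term in c)
    # Smoothed idf: ln((1 + n) / (1 + df)) + 1
    idf = {t: math.log((1 + n_docs) / (1 + d)) + 1 for t, d in df.items()}
    rows = []
    for c in counts:
        raw = {t: tf * idf[t] for t, tf in c.items()}
        # Guard against empty documents to avoid dividing by zero
        norm = math.sqrt(sum(w * w for w in raw.values())) or 1.0
        rows.append({t: w / norm for t, w in raw.items()})
    return rows


def top_keywords(docs, k=3):
    """Return the k highest-weighted (term, weight) pairs of each document."""
    result = []
    for row in tfidf_rows(docs):
        # Sort by weight descending; break ties alphabetically
        ranked = sorted(row.items(), key=lambda item: (-item[1], item[0]))
        result.append(ranked[:k])
    return result


text_list = ['This is the first document.',
             'This is the second second document.',
             'And the third one.',
             'Is this the first document?']

for i, keywords in enumerate(top_keywords(text_list), start=1):
    print(f"Document {i}:", [(t, round(w, 4)) for t, w in keywords])
```

With this corpus, "second" ranks highest in the second document because it occurs twice there and in no other document, while "the" ranks lowest everywhere because it appears in all four documents. For larger corpora, the scikit-learn matrix can be ranked the same way by sorting each row's non-zero entries by weight.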