sklearn的tf-idf
时间: 2023-12-16 13:24:35 浏览: 77
TF-IDF是一种常用的文本特征提取方法,它可以用于文本分类、聚类、信息检索等任务。TF-IDF的全称是Term Frequency-Inverse Document Frequency,即词频-逆文档频率。它的主要思想是:如果某个词或短语在一篇文章中出现的频率TF高,并且在其他文章中很少出现,那么认为这个词或短语具有很好的类别区分能力,适合用来分类。
在sklearn中,可以使用TfidfVectorizer类来实现TF-IDF特征提取。下面是一个简单的例子:
```python
from sklearn.feature_extraction.text import TfidfVectorizer
corpus = [
'This is the first document.',
'This is the second second document.',
'And the third one.',
'Is this the first document?',
]
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(corpus)
print(vectorizer.get_feature_names())
print(X.toarray())
```
输出结果如下:
```
['and', 'document', 'first', 'is', 'one', 'second', 'the', 'third', 'this']
[[0. 0.46979139 0.58028582 0.38408524 0. 0.
0.38408524 0. 0.38408524]
[0. 0.6876236 0. 0.28108867 0. 0.53864762
0.28108867 0. 0.28108867]
[0.5 0. 0. 0. 0.5 0.
0. 0.5 0. ]
[0. 0.46979139 0.58028582 0.38408524 0. 0.
0.38408524 0. 0.38408524]]
```
其中,get_feature_names()方法可以获取所有特征的名称,toarray()方法可以将稀疏矩阵转换为密集矩阵。
阅读全文
相关推荐
![-](https://img-home.csdnimg.cn/images/20241231045053.png)
![-](https://img-home.csdnimg.cn/images/20241231044955.png)
![-](https://img-home.csdnimg.cn/images/20241231045053.png)
![](https://csdnimg.cn/download_wenku/file_type_ask_c1.png)
![](https://csdnimg.cn/download_wenku/file_type_ask_c1.png)
![](https://csdnimg.cn/download_wenku/file_type_ask_c1.png)
![zip](https://img-home.csdnimg.cn/images/20241231045053.png)
![zip](https://img-home.csdnimg.cn/images/20241231045053.png)
![zip](https://img-home.csdnimg.cn/images/20241231045053.png)
![zip](https://img-home.csdnimg.cn/images/20241231045053.png)
![-](https://img-home.csdnimg.cn/images/20241226111658.png)
![](https://csdnimg.cn/download_wenku/file_type_ask_c1.png)
![](https://csdnimg.cn/download_wenku/file_type_ask_c1.png)
![](https://csdnimg.cn/download_wenku/file_type_ask_c1.png)
![](https://csdnimg.cn/download_wenku/file_type_ask_c1.png)
![](https://csdnimg.cn/download_wenku/file_type_ask_c1.png)
![zip](https://img-home.csdnimg.cn/images/20241231045053.png)
![](https://csdnimg.cn/download_wenku/file_type_ask_c1.png)
![](https://csdnimg.cn/download_wenku/file_type_ask_c1.png)