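Computing tf-idf: code requested for 北太天元
Date: 2024-02-17 18:30:27 Views: 64
Below is example code that computes tf-idf with Python and the sklearn library, using sentences about 北太天元 as the sample corpus: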
```python
from sklearn.feature_extraction.text import TfidfVectorizer
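# Define the corpus (five short Chinese sentences)
corpus = [
    '北太天元是一个高端住宅小区',
    '北太天元地理位置优越',
    '北太天元的物业服务很好',
    '北太天元的环境优美',
    '北太天元的房价较高'
]
# Create the TfidfVectorizer with default settings.
# Note: the default token_pattern only splits on whitespace and punctuation,
# so each unsegmented Chinese sentence is treated as a single token.
vectorizer = TfidfVectorizer()
# Fit the vocabulary and compute the (sparse) tf-idf matrix
tfidf = vectorizer.fit_transform(corpus)
# Print the dense matrix and the vocabulary
print(tfidf.toarray())
# get_feature_names() was removed in scikit-learn 1.2; use get_feature_names_out()
print(vectorizer.get_feature_names_out().tolist())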
```
The output is:
```
[[0. 1. 0. 0. 0.]
 [1. 0. 0. 0. 0.]
 [0. 0. 0. 1. 0.]
 [0. 0. 0. 0. 1.]
 [0. 0. 1. 0. 0.]]
['北太天元地理位置优越', '北太天元是一个高端住宅小区', '北太天元的房价较高', '北太天元的物业服务很好', '北太天元的环境优美']
```
The result is a matrix with 5 rows and 5 columns: each row holds the tf-idf values of one document, and each column corresponds to one vocabulary term. Because the default tokenizer of TfidfVectorizer does not segment Chinese text, each whole sentence becomes a single term, so every row contains exactly one weight, equal to 1.0 after L2 normalization. To get word-level weights, where each column is a word and each value expresses how important that word is in that document, the sentences must first be segmented into words (for example with a Chinese word segmenter, then joined with spaces) before they are vectorized.
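To make the weighting itself concrete, here is a minimal pure-Python sketch of the tf-idf computation on word-level tokens. It assumes the five sentences have already been segmented by hand (the token lists are an illustrative assumption, not the output of any particular segmenter), and the helper name `tfidf_rows` is hypothetical. It uses raw term counts, the smoothed idf `ln((1 + n) / (1 + df)) + 1`, and L2 row normalization, which are the defaults TfidfVectorizer applies.

```python
import math
from collections import Counter

# Hand-segmented tokens for the same five sentences
# (this segmentation is an assumption made for illustration).
docs = [
    ["北太天元", "是", "一个", "高端", "住宅", "小区"],
    ["北太天元", "地理", "位置", "优越"],
    ["北太天元", "的", "物业", "服务", "很好"],
    ["北太天元", "的", "环境", "优美"],
    ["北太天元", "的", "房价", "较高"],
]


def tfidf_rows(docs):
    """Return (vocabulary, rows) using raw counts, smoothed idf and L2 normalization."""
    vocab = sorted({tok for doc in docs for tok in doc})
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(tok for doc in docs for tok in set(doc))
    # Smoothed idf: ln((1 + n) / (1 + df)) + 1
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1.0 for t in vocab}
    rows = []
    for doc in docs:
        counts = Counter(doc)
        raw = [counts[t] * idf[t] for t in vocab]
        # Scale each row to unit Euclidean length.
        norm = math.sqrt(sum(w * w for w in raw))
        rows.append([w / norm for w in raw])
    return vocab, rows


vocab, rows = tfidf_rows(docs)
for row in rows:
    # Show only the non-zero weights of each document.
    print({t: round(w, 3) for t, w in zip(vocab, row) if w > 0})
```

Since 北太天元 occurs in all five documents, its idf is 1, the smallest value the formula can produce, so it receives the lowest weight in every row, while words that occur in only one document receive the highest.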