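What does the popularity weight file weight.txt look like? Please give an example with a complete Python code demonstration
Posted: 2024-03-10 16:49:21
Suppose weight.txt holds one popularity weight per line, where line i is the weight of the i-th document in the corpus: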
```
0.5
0.8
1.2
```
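Because each line has to line up with one document, it can help to check the file before using it. The following is a minimal sketch, not part of the original answer: `load_weights` and its `expected_count` parameter are hypothetical names introduced here for illustration. It writes the three-line example file and reads it back, rejecting a file whose line count does not match the corpus.

```python
from pathlib import Path


def load_weights(path, expected_count):
    """Read one float per non-empty line and check the count matches the corpus.

    Hypothetical helper for illustration; the original answer reads the file inline.
    """
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    weights = [float(line) for line in lines if line.strip()]
    if len(weights) != expected_count:
        raise ValueError(
            f"expected {expected_count} weights, found {len(weights)}"
        )
    return weights


# Write the three-line example file, then read it back.
Path("weight.txt").write_text("0.5\n0.8\n1.2\n", encoding="utf-8")
print(load_weights("weight.txt", expected_count=3))  # prints [0.5, 0.8, 1.2]
```

Blank lines are skipped, so a trailing newline at the end of the file does no harm.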
The complete Python code is as follows:
```
import math
from collections import defaultdict

corpus = ["二价 二价 二价 四价 预约", "四价 四价 四价 九价 预约", "九价 九价 九价 九价 预约"]

# Tokenize each document by splitting on whitespace
words = [sentence.strip().split() for sentence in corpus]


# Count word frequencies, one dict per document
def count_words(words):
    word_count = []
    for sentence in words:
        word_dict = defaultdict(int)
        for word in sentence:
            word_dict[word] += 1
        word_count.append(word_dict)
    return word_count


word_count = count_words(words)

# Read the popularity weights: one float per line, line i belongs to document i
with open('weight.txt', 'r', encoding='utf-8') as f:
    weight = [float(line) for line in f if line.strip()]
if len(weight) != len(corpus):
    raise ValueError("weight.txt must contain one weight per document")


# Count the documents that contain the word
def count_sentence(word, word_count):
    return sum(1 for d in word_count if d.get(word))


# Sum the popularity weights of the documents that contain the word
def weighted_count_sentence(word, word_count, weight):
    # zip pairs each document with its own weight; list.index() would return
    # the first equal dict and pick the wrong weight for duplicate documents
    return sum(w for d, w in zip(word_count, weight) if d.get(word))


# Weighted IDF: log10(total weight / (weight of documents containing the word + 1)).
# The +1 in the denominator makes the value negative for very common words.
def idf(word, word_count, weight):
    if count_sentence(word, word_count) == 0:
        return 0
    return math.log10(sum(weight) / (weighted_count_sentence(word, word_count, weight) + 1))


# TF: occurrences of the word divided by the document length
def tf(word, word_dict):
    return word_dict[word] / sum(word_dict.values())


# TF-IDF
def tfidf(word, word_dict, word_count, weight):
    return tf(word, word_dict) * idf(word, word_count, weight)


# Print the results, formatted to four decimal places
for p, word_dict in enumerate(word_count, start=1):
    print("part:{}".format(p))
    for word in word_dict:
        print("word: {} ---- TF-IDF:{:.4f}".format(word, tfidf(word, word_dict, word_count, weight)))
        print("word: {} ---- TF:{:.4f}".format(word, tf(word, word_dict)))
        print("word: {} ---- IDF:{:.4f}".format(word, idf(word, word_count, weight)))
        print("word: {} ---- count_sentence:{}".format(word, count_sentence(word, word_count)))
```
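Written as formulas, the quantities the code computes are the following. The symbols are notation introduced here, not taken from the original answer: n_{t,d} is the number of times word t occurs in document d, and w_d is the weight on line d of weight.txt.

```latex
\mathrm{tf}(t, d) = \frac{n_{t,d}}{\sum_{t'} n_{t',d}}, \qquad
\mathrm{idf}(t) = \log_{10} \frac{\sum_{d} w_d}{1 + \sum_{d \,:\, t \in d} w_d}, \qquad
\text{tf-idf}(t, d) = \mathrm{tf}(t, d) \cdot \mathrm{idf}(t)
```

As a worked case, 二价 appears only in the first document (weight 0.5), so idf = log10(2.5 / 1.5) ≈ 0.2218, and with tf = 3/5 in that document the TF-IDF is about 0.1331. 预约 appears in every document, so idf = log10(2.5 / 3.5) ≈ -0.1461: the +1 in the denominator pushes the ratio below 1 and the logarithm below zero.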
With the weight.txt above, the output is as follows (values shown to four decimal places):
```
part:1
word: 二价 ---- TF-IDF:0.1331
word: 二价 ---- TF:0.6000
word: 二价 ---- IDF:0.2218
word: 二价 ---- count_sentence:1
word: 四价 ---- TF-IDF:0.0072
word: 四价 ---- TF:0.2000
word: 四价 ---- IDF:0.0362
word: 四价 ---- count_sentence:2
word: 预约 ---- TF-IDF:-0.0292
word: 预约 ---- TF:0.2000
word: 预约 ---- IDF:-0.1461
word: 预约 ---- count_sentence:3
part:2
word: 四价 ---- TF-IDF:0.0217
word: 四价 ---- TF:0.6000
word: 四价 ---- IDF:0.0362
word: 四价 ---- count_sentence:2
word: 九价 ---- TF-IDF:-0.0158
word: 九价 ---- TF:0.2000
word: 九价 ---- IDF:-0.0792
word: 九价 ---- count_sentence:2
word: 预约 ---- TF-IDF:-0.0292
word: 预约 ---- TF:0.2000
word: 预约 ---- IDF:-0.1461
word: 预约 ---- count_sentence:3
part:3
word: 九价 ---- TF-IDF:-0.0633
word: 九价 ---- TF:0.8000
word: 九价 ---- IDF:-0.0792
word: 九价 ---- count_sentence:2
word: 预约 ---- TF-IDF:-0.0292
word: 预约 ---- TF:0.2000
word: 预约 ---- IDF:-0.1461
word: 预约 ---- count_sentence:3
```
```
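With these weights, the formula log10(total weight / (weighted document count + 1)) yields a negative IDF for any word whose documents carry more than (total weight - 1) of the weight, which is the case for 九价 and 预约 here. If negative scores are unwanted, one possible adjustment is to add 1 inside the logarithm. The sketch below is an assumption for comparison only, not the original answer's method; `idf_nonnegative` is a hypothetical name, and the documents are reduced to word sets for brevity.

```python
import math

weights = [0.5, 0.8, 1.2]
# The three example documents, reduced to the set of words each contains.
docs = [
    {"二价", "四价", "预约"},
    {"四价", "九价", "预约"},
    {"九价", "预约"},
]


def weighted_df(word):
    """Sum of the weights of the documents that contain the word."""
    return sum(w for doc, w in zip(docs, weights) if word in doc)


def idf_original(word):
    """The formula used in the answer's code; can go below zero."""
    return math.log10(sum(weights) / (weighted_df(word) + 1))


def idf_nonnegative(word):
    """Hypothetical variant: adding 1 inside the log keeps the value above zero."""
    return math.log10(1 + sum(weights) / (weighted_df(word) + 1))


for word in ["二价", "四价", "九价", "预约"]:
    print(word, round(idf_original(word), 4), round(idf_nonnegative(word), 4))
```

The variant preserves the ordering of the words (rarer or less popular words still score higher) while keeping every score positive; whether that trade-off suits a given ranking task is a judgment call.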