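Use Python to count the occurrences and frequency of each Chinese character in a txt file, and compute the file's information entropy from those frequencies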

Date: 2024-01-09 22:04:24

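The Python code below counts how many times each Chinese character appears in a txt file, derives each character's frequency, and computes the file's information entropy from those frequencies:

```python
import math


def count_chars(filepath):
    """Count how often each Chinese character occurs in a UTF-8 text file."""
    with open(filepath, 'r', encoding='utf-8') as f:
        text = f.read()
    freq_dict = {}
    total_chars = 0
    for char in text:
        # Only count Chinese characters (CJK Unified Ideographs, U+4E00 to U+9FFF);
        # skip punctuation, whitespace, digits, and Latin letters.
        if not '\u4e00' <= char <= '\u9fff':
            continue
        freq_dict[char] = freq_dict.get(char, 0) + 1
        total_chars += 1
    # Sort (character, count) pairs by count, highest first.
    freq_list = sorted(freq_dict.items(), key=lambda x: x[1], reverse=True)
    return freq_list, total_chars


def calc_entropy(freq_list, total_chars):
    """Return the Shannon entropy in bits per character."""
    entropy = 0.0
    for char, freq in freq_list:
        p = freq / total_chars
        entropy -= p * math.log2(p)
    return entropy


if __name__ == '__main__':
    filepath = 'test.txt'  # replace with the path to your txt file
    freq_list, total_chars = count_chars(filepath)
    print('Count and frequency of each Chinese character:')
    for char, freq in freq_list:
        print(char, ':', freq, ', frequency:', freq / total_chars)
    entropy = calc_entropy(freq_list, total_chars)
    print('Entropy:', entropy)
```

`count_chars` takes the path to a txt file and counts the occurrences of each Chinese character, meaning each character in the range U+4E00 to U+9FFF; everything else is skipped. It returns a list of `(character, count)` tuples sorted by count from highest to lowest, together with the total number of Chinese characters.

`calc_entropy` takes that list and the total, converts each count to a frequency `p = count / total`, and returns the entropy, the sum of `-p * log2(p)` over all characters, in bits per character.

The `if __name__ == '__main__':` block calls both functions and prints each character's count and frequency, followed by the entropy of the file. Replace `test.txt` with the path to your own txt file and run the script with Python.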
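As a variation on the answer above, the counting step can also be written with `collections.Counter` from the standard library. The sketch below is only an illustration: it works on an in-memory string rather than a file so the result can be checked by hand, the function name `hanzi_entropy` and the sample string are made up for this example, and it assumes that "Chinese character" means the basic CJK Unified Ideographs block (U+4E00 to U+9FFF).

```python
import math
from collections import Counter


def hanzi_entropy(text):
    """Return (counts, entropy in bits) for the Chinese characters in text.

    Hypothetical helper for illustration; only U+4E00..U+9FFF is counted.
    """
    counts = Counter(ch for ch in text if '\u4e00' <= ch <= '\u9fff')
    total = sum(counts.values())
    if total == 0:
        # No Chinese characters at all: define the entropy as 0.
        return counts, 0.0
    entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
    return counts, entropy


sample = "中文, 中文字!"  # made-up sample; punctuation and spaces are ignored
counts, entropy = hanzi_entropy(sample)
total = sum(counts.values())
for ch, n in counts.most_common():
    print(ch, n, round(n / total, 3))
print("entropy (bits):", round(entropy, 4))
```

In the sample, 中 and 文 each appear twice and 字 once, so the frequencies are 0.4, 0.4, and 0.2, and the entropy is 2 × 0.4 × log2(1/0.4) + 0.2 × log2(1/0.2) ≈ 1.52 bits per character. To use it on a file, read the file's contents with `open(path, encoding='utf-8')` and pass the resulting string to the function.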