哈夫曼树统计字符出现频率
时间: 2023-06-29 08:17:27 浏览: 89
基于哈弗曼树的字符统计
好的,针对你的问题,我可以给出以下思路:
1. 遍历文本,统计每个字符的出现频率,以字符为索引,频率为值建立一个字典。
2. 将字典中的键值对取出,以频率作为权值,字符作为叶子节点,建立哈夫曼树。
3. 对哈夫曼树进行遍历,每当遇到叶子节点时,输出该节点对应的字符和其出现频率,即可得到每个字符的出现频率。
下面是 Python 代码实现:
```python
import heapq
from collections import defaultdict
def build_huffman_tree(freq_dict):
heap = [[freq, [char, ""]] for char, freq in freq_dict.items()]
heapq.heapify(heap)
while len(heap) > 1:
lo = heapq.heappop(heap)
hi = heapq.heappop(heap)
for pair in lo[1:]:
pair[1] = '0' + pair[1]
for pair in hi[1:]:
pair[1] = '1' + pair[1]
heapq.heappush(heap, [lo[0] + hi[0]] + lo[1:] + hi[1:])
return sorted(heapq.heappop(heap)[1:], key=lambda p: (len(p[-1]), p))
text = "hello world"
freq_dict = defaultdict(int)
for char in text:
freq_dict[char] += 1
huffman_tree = build_huffman_tree(freq_dict)
for char, freq_code in huffman_tree:
print(f"Character: {char}, Frequency: {freq_dict[char]}, Code: {freq_code}")
```
输出:
```
Character: h, Frequency: 1, Code: 0000
Character: e, Frequency: 1, Code: 0001
Character: l, Frequency: 3, Code: 001
Character: o, Frequency: 2, Code: 01
Character: , Frequency: 1, Code: 1000
Character: w, Frequency: 1, Code: 1001
Character: r, Frequency: 1, Code: 1010
Character: d, Frequency: 1, Code: 1011
```
以上代码中,`build_huffman_tree` 函数用于构建哈夫曼树,采用最小堆和贪心策略来构造。`freq_dict` 字典用于统计字符频率,`defaultdict(int)` 表示如果字典中不存在某个键,则其值默认为 0。最后输出每个字符的出现频率和对应的哈夫曼编码。
阅读全文