首页哈夫曼编码统计字符频率

哈夫曼编码统计字符频率

时间: 2023-12-22 16:27:53 浏览: 99

哈夫曼编码是一种可变长度编码，它通过统计字符出现的频率来构建一棵前缀编码树，从而实现对每个字符的编码。下面是统计字符频率并构建哈夫曼编码的Python代码示例： ```python import heapq from collections import defaultdict def huffman_encoding(data): # 统计字符频率 freq = defaultdict(int) for char in data: freq[char] += 1 # 构建哈夫曼编码树 heap = [[weight, [char, ""]] for char, weight in freq.items()] heapq.heapify(heap) while len(heap) > 1: lo = heapq.heappop(heap) hi = heapq.heappop(heap) for pair in lo[1:]: pair[1] = '0' + pair[1] for pair in hi[1:]: pair[1] = '1' + pair[1] heapq.heappush(heap, [lo[0] + hi[0]] + lo[1:] + hi[1:]) # 构建编码表 encoding_table = dict(heapq.heappop(heap)[1:]) return encoding_table # 示例 data = "hello world" encoding_table = huffman_encoding(data) print(encoding_table) ``` 上述代码中，我们使用了Python的heapq和collections模块来实现哈夫曼编码的构建。首先，我们使用defaultdict(int)来统计字符频率，然后使用heapq.heapify()将字符频率转换为堆，接着使用堆来构建哈夫曼编码树。最后，我们使用字典来存储每个字符的编码，即为编码表。

阅读全文