哈夫曼编码和解码器的实现
时间: 2023-07-19 11:37:38 浏览: 41
哈夫曼编码和解码器是一种常用的无损数据压缩算法,下面是Python语言的实现:
1. 哈夫曼编码:
```python
import heapq
from collections import defaultdict
# 计算字符出现次数
def count_freq(text):
freq = defaultdict(int)
for char in text:
freq[char] += 1
return freq
# 构建哈夫曼树
def build_huffman_tree(freq):
heap = [[count, [char, ""]] for char, count in freq.items()]
heapq.heapify(heap)
while len(heap) > 1:
lo = heapq.heappop(heap)
hi = heapq.heappop(heap)
for pair in lo[1:]:
pair[1] = '0' + pair[1]
for pair in hi[1:]:
pair[1] = '1' + pair[1]
heapq.heappush(heap, [lo[0] + hi[0]] + lo[1:] + hi[1:])
return sorted(heapq.heappop(heap)[1:], key=lambda p: (len(p[-1]), p))
# 哈夫曼编码
def huffman_encoding(text):
freq = count_freq(text)
if len(freq) == 1:
return {'0': text}
huff_tree = build_huffman_tree(freq)
codes = {}
for char, code in huff_tree:
codes[char] = code
encoded_text = "".join([codes[char] for char in text])
return encoded_text, codes
# 测试
text = "hello world"
encoded_text, codes = huffman_encoding(text)
print(encoded_text) # 0100110111111110011000001100110100
print(codes) # {'h': '00', 'e': '101', 'l': '111', 'o': '100', ' ': '011', 'w': '0100', 'r': '0101', 'd': '1100'}
```
2. 哈夫曼解码:
```python
# 哈夫曼解码
def huffman_decoding(encoded_text, codes):
rev_dict = {v: k for k, v in codes.items()}
curr_code = ""
decoded_text = ""
for bit in encoded_text:
curr_code += bit
if curr_code in rev_dict:
decoded_text += rev_dict[curr_code]
curr_code = ""
return decoded_text
# 测试
decoded_text = huffman_decoding(encoded_text, codes)
print(decoded_text) # hello world
```
以上代码中,`count_freq()` 函数用于计算字符出现次数,`build_huffman_tree()` 函数用于构建哈夫曼树,`huffman_encoding()` 函数用于对文本进行哈夫曼编码,`huffman_decoding()` 函数用于对哈夫曼编码后的文本进行解码。