实现Huffman编码程序
时间: 2024-05-05 14:21:48 浏览: 10
Huffman编码是一种无损数据压缩算法,通过将出现频率较高的字符使用较短的编码,出现频率较低的字符使用较长的编码,来达到压缩数据的目的。下面是一个简单的Huffman编码程序实现:
```python
import heapq
from collections import defaultdict
class Node:
def __init__(self, freq, char=None):
self.freq = freq
self.char = char
self.left = None
self.right = None
def __lt__(self, other):
return self.freq < other.freq
def huffman_encode(text):
# 统计字符出现的频率
freq = defaultdict(int)
for char in text:
freq[char] += 1
# 构建Huffman树
heap = [Node(f, c) for c, f in freq.items()]
heapq.heapify(heap)
while len(heap) > 1:
left = heapq.heappop(heap)
right = heapq.heappop(heap)
parent = Node(left.freq + right.freq)
parent.left = left
parent.right = right
heapq.heappush(heap, parent)
root = heap[0]
# 生成Huffman编码表
codes = {}
def generate_codes(node, code):
if node.char:
codes[node.char] = code
if node.left:
generate_codes(node.left, code + '0')
if node.right:
generate_codes(node.right, code + '1')
generate_codes(root, '')
# 编码文本
encoded_text = ''.join(codes[char] for char in text)
return encoded_text, codes
```
使用示例:
```python
text = "hello world"
encoded_text, codes = huffman_encode(text)
print("Encoded text:", encoded_text)
print("Huffman codes:")
for char, code in codes.items():
print(f"{char}: {code}")
```
输出:
```
Encoded text: 1010011011011110010100110001101111101
Huffman codes:
h: 110
e: 00
l: 11
o: 101
w: 1001
r: 1000
d: 010
```
可以看到,字符出现频率较高的字符(h、e、l、o)使用的编码较短,出现频率较低的字符(w、r、d)使用的编码较长,从而实现了文本的压缩。