Arithmetic Coding: From Theory to Practice


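"This document is a detailed technical report on arithmetic coding, published by the Sable Research Group at McGill University. Arithmetic coding is a data compression algorithm that encodes information efficiently by exploiting a probability model. The report was written by Eric Bodden, Malte Clasen, and Joachim Kneis and aims to cover arithmetic coding comprehensively, from theory to practice."

Arithmetic coding is a data compression method whose core idea is to map a message, under a probability model, to a single real number, which yields an efficient code. The method was first proposed in the 1970s and has since been adopted in compression standards such as JPEG and MPEG.

1. Motivation and history: arithmetic coding was motivated by the need to compress data more effectively, especially for communication and storage. Compared with earlier entropy coders such as Huffman coding, it adapts more precisely to the probability distribution of the data, in particular when symbol probabilities are not powers of 1/2 and a whole number of bits per symbol is wasteful.
2. Basic concepts: arithmetic coding builds on information entropy, the average information content of the data. Entropy measures the uncertainty of the data and is computed from the probability distribution. It sets the upper limit on compression efficiency.
3. Encoding and decoding:
   - **Encoding**: the encoder maps each symbol to a probability interval and narrows the current interval step by step according to the symbol probabilities, ending with one real number that represents the whole input sequence.
   - **Decoding**: the decoder reverses the process, using the same interval subdivision and probabilities to determine each symbol.
4. Real-number representation:
   - **Interval creation**: encoding usually starts with the full interval [0, 1). Each symbol corresponds to a sub-interval whose size reflects the probability of that symbol.
   - **Upper and lower bounds**: while encoding, the sub-interval that matches the next input symbol is selected and the interval bounds are updated.
   - **Uniqueness**: arithmetic coding guarantees a unique code, meaning that distinct input sequences are mapped to distinct output numbers.
5. Bit-sequence representation:
   - **Motivation**: for storage and transmission, the real number has to be converted into a finite sequence of bits.
   - **Abstraction**: a series of bit operations, such as shifts and comparisons, turns the real number into a bit stream while keeping decoding correct.
6. Summary: the document explains how arithmetic coding works, from the theoretical foundations to the practical procedure, including the concrete encoding and decoding steps and how uniqueness and efficiency are guaranteed. It also discusses how the encoded real number is converted into a bit sequence for storage and transmission. Arithmetic coding compresses data effectively through precise probability modelling and interval operations, and it plays an important role in modern data communication and storage systems.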
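The interval narrowing summarized in items 3 and 4 can be sketched in C++ as follows. This is a minimal illustration under stated assumptions, not the report's reference implementation: it uses `double` arithmetic, which is only precise enough for short inputs, it hands the message length to the decoder instead of using an end-of-message symbol, and the names `Model`, `encode`, and `decode` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical static model: symbols[i] occurs with probability probs[i].
struct Model {
    std::string symbols;
    std::vector<double> probs;  // same order as symbols, sums to 1
};

// Start with [0, 1) and narrow the interval once per input symbol.
// Returns the midpoint of the final interval as the code value.
double encode(const Model& m, const std::string& input) {
    assert(m.symbols.size() == m.probs.size());
    double low = 0.0, range = 1.0;
    for (char s : input) {
        std::size_t k = m.symbols.find(s);
        assert(k != std::string::npos);
        double cum = 0.0;  // total probability of the symbols before s
        for (std::size_t i = 0; i < k; ++i) cum += m.probs[i];
        low += range * cum;
        range *= m.probs[k];
    }
    return low + range / 2.0;
}

// Repeat the same subdivision and pick, at each step, the sub-interval
// that contains the code value. The length is assumed to be known.
std::string decode(const Model& m, double value, std::size_t length) {
    assert(m.symbols.size() == m.probs.size());
    std::string out;
    double low = 0.0, range = 1.0;
    for (std::size_t n = 0; n < length; ++n) {
        double cum = 0.0;
        for (std::size_t i = 0; i < m.symbols.size(); ++i) {
            double next = cum + m.probs[i];
            bool last = (i + 1 == m.symbols.size());
            if (value < low + range * next || last) {
                out += m.symbols[i];
                low += range * cum;
                range *= m.probs[i];
                break;
            }
            cum = next;
        }
    }
    return out;
}
```

With a model such as `Model m{"abcd", {0.5, 0.25, 0.125, 0.125}}`, `decode(m, encode(m, "abaabcda"), 8)` reproduces the input string. Longer inputs exhaust the precision of `double`, which is why a practical coder works on a bit-sequence representation instead.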

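When encoding this sequence, we can do so in a very naive way by simply using 2 bits per symbol, {00, 01, 10, 11}, which leads to overall costs of 8 · 2 bits = 16 bits. So what about the entropy H(S)?

\[
\begin{aligned}
H(S) &= \sum_{s \in \{a,b,c,d\}} P(s)\,\mathrm{ld}\,\frac{1}{P(s)} \\
&= \left(0{,}5 \cdot \mathrm{ld}\,\frac{1}{0{,}5}\right) + \left(0{,}25 \cdot \mathrm{ld}\,\frac{1}{0{,}25}\right) + \left(0{,}125 \cdot \mathrm{ld}\,\frac{1}{0{,}125}\right) + \left(0{,}125 \cdot \mathrm{ld}\,\frac{1}{0{,}125}\right) \\
&= 0{,}5 \cdot \mathrm{ld}\,2 + 0{,}25 \cdot \mathrm{ld}\,4 + 0{,}125 \cdot \mathrm{ld}\,8 + 0{,}125 \cdot \mathrm{ld}\,8 \\
&= 0{,}5 + 0{,}5 + 0{,}375 + 0{,}375 \\
&= 1{,}75 \ \text{[Bits/Symbol]}
\end{aligned}
\]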

Note that this is given in [Bits/Symbol], which means that we need a minimum of 8 · 1,75 = 14 bits to encode the whole input sequence. We cannot do any better. This gives a saving of 16 − 14 = 2 bits.
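The calculation above can be checked with a few lines of C++. This is a minimal sketch; the function names `entropy` and `min_total_bits` are hypothetical and chosen only for illustration, and `std::log2` plays the role of ld.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Entropy in bits per symbol: sum over all symbols of p * ld(1/p).
double entropy(const std::vector<double>& probs) {
    double h = 0.0;
    for (double p : probs) {
        assert(p >= 0.0 && p <= 1.0);
        if (p > 0.0) {  // a symbol that never occurs contributes nothing
            h += p * std::log2(1.0 / p);
        }
    }
    return h;
}

// Lower bound on the total number of bits for `length` symbols.
double min_total_bits(const std::vector<double>& probs, std::size_t length) {
    return static_cast<double>(length) * entropy(probs);
}
```

For the probabilities 0.5, 0.25, 0.125, and 0.125, `entropy` evaluates to 1.75 and `min_total_bits` with a length of 8 evaluates to 14, matching the hand calculation.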

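However, what would have happened if we had not been so lucky as to guess the correct probability distribution in advance? Have a look at the following model M with $P_M(a) = 0{,}125$, $P_M(b) = 0{,}125$, $P_M(c) = 0{,}5$, $P_M(d) = 0{,}25$. The entropy under M calculates to:

\[
\begin{aligned}
H_M(S) &= \sum_{s \in \{a,b,c,d\}} P(s)\,\mathrm{ld}\,\frac{1}{P_M(s)} \\
&= \left(0{,}5 \cdot \mathrm{ld}\,\frac{1}{0{,}125}\right) + \left(0{,}25 \cdot \mathrm{ld}\,\frac{1}{0{,}125}\right) + \left(0{,}125 \cdot \mathrm{ld}\,\frac{1}{0{,}5}\right) + \left(0{,}125 \cdot \mathrm{ld}\,\frac{1}{0{,}25}\right) \\
&= 0{,}5 \cdot \mathrm{ld}\,8 + 0{,}25 \cdot \mathrm{ld}\,8 + 0{,}125 \cdot \mathrm{ld}\,2 + 0{,}125 \cdot \mathrm{ld}\,4 \\
&= 1{,}5 + 0{,}75 + 0{,}125 + 0{,}25 \\
&= 2{,}625 \ \text{[Bits/Symbol]}
\end{aligned}
\]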

We should see this example as a warning not to confuse the notion of coding with that of compression. Under the model M, we would be required to use 2,625 · 8 = 21 bits to encode the input sequence. This would be no compression at all, since our naive encoding with 2 bits per symbol needed only 16 bits altogether. We can also conclude that the compression ratio can only be as good as the underlying model allows: the better the model matches reality, the better the compression will be.

However, in the following chapters we will prove that, given any particular model (which on its own may be as good as it can be), Arithmetic Coding achieves the absolutely best compression ratio, meaning that no other algorithm could do any better under the very same model.
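The effect of a mismatched model can be reproduced numerically. The following is a minimal sketch, assuming that the cost of coding under a model is the sum of P(s) · ld(1/P_M(s)) as in the calculation above; the names `bits_per_symbol` and `model_compresses` are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Average bits per symbol when the data follow `actual` but code lengths
// are assigned according to `model`: sum of actual[i] * ld(1 / model[i]).
double bits_per_symbol(const std::vector<double>& actual,
                       const std::vector<double>& model) {
    assert(actual.size() == model.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < actual.size(); ++i) {
        if (actual[i] > 0.0) {  // symbols that never occur cost nothing
            sum += actual[i] * std::log2(1.0 / model[i]);
        }
    }
    return sum;
}

// True if coding `length` symbols under `model` needs fewer bits than a
// fixed-length code that spends `fixed_bits` on every symbol.
bool model_compresses(const std::vector<double>& actual,
                      const std::vector<double>& model,
                      std::size_t length, double fixed_bits) {
    double n = static_cast<double>(length);
    return n * bits_per_symbol(actual, model) < n * fixed_bits;
}
```

With the actual probabilities 0.5, 0.25, 0.125, 0.125 and the model probabilities 0.125, 0.125, 0.5, 0.25, `bits_per_symbol` yields 2.625, so `model_compresses` with a length of 8 and 2 bits per symbol returns `false`. Passing the actual probabilities as the model yields 1.75 and the function returns `true`.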

Note that we do not prove here that the entropy is the measure of optimality. This fact is commonly known as the Shannon Theorem [WS49].

Now that we have stirred up your interest, we are going to describe the actual encoding and decoding algorithms.

2.3 Encoder and decoder

DEFINITION 6 (ENCODER & DECODER)

An algorithm which encodes a sequence is called an ENCODER. The corresponding algorithm that decodes the sequence again is called a DECODER.

In contrast to the input sequence S, we refer to the encoded sequence, which is the output of the encoder and the input of the decoder, by Code(S), or C(S) for short. The application of the two algorithms is referred to as ENCODING and DECODING, respectively.

We want to emphasize that we use the notion of an algorithm in its most natural way, meaning a general sequence of steps performed by an arbitrary computer. On purpose, we do not limit ourselves to a certain implementation at this stage. An encoder could be any algorithm that transforms the input in such a way that there is a decoder to reproduce the raw input data. However, at the end of this paper we present the full C++ source code of an encoder/decoder pair (also referred to as a CODEC) which employs Arithmetic Coding. The following code examples are taken from this reference implementation.
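To make Definition 6 concrete, here is a minimal sketch of an encoder/decoder pair for the naive 2 bits per symbol code over {a, b, c, d}. It is not taken from the reference implementation and does not use Arithmetic Coding. The assignment a = 00, b = 01, c = 10, d = 11 and the representation of C(S) as a string of '0' and '1' characters are assumptions made for readability.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Encoder: maps each symbol to a fixed 2-bit codeword
// (assumed assignment: a=00, b=01, c=10, d=11).
std::string encode(const std::string& input) {
    std::string code;
    for (char s : input) {
        assert(s >= 'a' && s <= 'd');
        int index = s - 'a';
        code += (index & 2) ? '1' : '0';  // high bit
        code += (index & 1) ? '1' : '0';  // low bit
    }
    return code;
}

// Decoder: reads the code two bits at a time and restores the symbols.
std::string decode(const std::string& code) {
    assert(code.size() % 2 == 0);
    std::string output;
    for (std::size_t i = 0; i < code.size(); i += 2) {
        int index = (code[i] - '0') * 2 + (code[i + 1] - '0');
        output += static_cast<char>('a' + index);
    }
    return output;
}
```

For any input S over {a, b, c, d}, `decode(encode(S))` returns S again, which is exactly the property required of an encoder/decoder pair. An input of 8 symbols produces 16 bits.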

In the theory of data compression one often distinguishes between lossy and lossless compression algorithms. Analog signals in particular are often encoded in a lossy way, because such data is ultimately meant to be interpreted by a human sense organ (eye, ear, ...), and these organs are limited in the sense that they simply do not perceive certain levels of noise or distortion at all. Of course, lossy compression algorithms can reach better compression ratios by giving up some accuracy. However, we are not going to consider lossy compression in this article and rather concentrate on lossless compression, which can be applied to all kinds of data in general. Thus we are only going to consider codecs that are able to reproduce the input data down to the last symbol. In a nutshell, our resulting Code(S) will be proven lossless and optimal.

2.4 The notions of uniqueness and efficiency

DEFINITION 7 (UNIQUE DECODABILITY)

We call a code UNIQUELY DECODABLE if any sequence is mapped to its code in an injective way. If this is the case, one can determine the unique input sequence for any given code.

A special class of uniquely decodable codes are the so-called prefix codes. These are characterized by the property that no codeword is a prefix of any other codeword:

DEFINITION 8 (PREFIX CODE)

We call a given code C a PREFIX CODE if for no pair (x, y) of distinct symbols of the alphabet C(x) is a prefix of C(y).
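Definition 8 translates directly into a check over all pairs of codewords. The following is a minimal sketch in which codewords are represented as strings of '0' and '1' characters; the name `is_prefix_code` is hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Returns true if no codeword is a prefix of a codeword at another index.
bool is_prefix_code(const std::vector<std::string>& codewords) {
    for (std::size_t i = 0; i < codewords.size(); ++i) {
        for (std::size_t j = 0; j < codewords.size(); ++j) {
            if (i == j) continue;
            const std::string& x = codewords[i];
            const std::string& y = codewords[j];
            // x is a prefix of y if y starts with all characters of x.
            if (x.size() <= y.size() && y.compare(0, x.size(), x) == 0) {
                return false;
            }
        }
    }
    return true;
}
```

The code {0, 10, 110, 111} passes this check, whereas {0, 01, 11} does not, because 0 is a prefix of 01. Two identical codewords also fail the check, which is consistent with the requirement that the code be injective.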

Prefix codes have the big advantage that as soon as the decoder has read C(x) for a certain x, it knows at once that the codeword is complete and that symbol x was encoded. With an arbitrary code, the decoder might have to read on in order to see whether C(x) was perhaps only the prefix of another codeword C(y). Thus, prefix codes are known to be a class
