H.264 AVC 视频压缩标准中的上下文自适应二进制算术编码

需积分: 9 177 浏览量更新于2024-08-02 收藏 806KB PDF 举报

"Context-Based Adaptive Binary Arithmetic Coding在H.264 AVC视频压缩标准中的应用" 在视频编码领域，H.264/AVC（Advanced Video Coding）标准因其高效的压缩性能而广受赞誉。其中，Context-Based Adaptive Binary Arithmetic Coding（CABAC）是该标准的一个核心组成部分，它在减少冗余并提升编码效率方面发挥了重要作用。本文由Detlev Marpe、Heiko Schwarz和Thomas Wiegand撰写，发表于2003年7月的《IEEE Transactions on Circuits and Systems for Video Technology》杂志上，深入探讨了CABAC的原理和实现方法。 CABAC是一种基于上下文的自适应二进制算术编码技术。其基本思想是结合上下文建模和自适应算术编码，以实现高度的自适应性和冗余数据的减少。在H.264/AVC标准中，CABAC用于熵编码，这一阶段的目标是对视频编码过程中的语法元素进行高效编码，如运动矢量、宏块模式、量化步长等。 CABAC的工作流程主要包括以下几个步骤： 1. 上下文建模：编码器根据已编码的数据来确定当前比特的上下文。这个上下文会影响编码的概率模型，从而影响编码决策。 2. 概率估计：CABAC使用一种低复杂度的方法估计每个比特被编码为0或1的概率。这些概率会随着编码过程动态更新，以反映数据流的变化。 3. 二进制算术编码：算术编码是一种熵编码技术，它将连续概率分布转换为连续的编码值。CABAC的自适应性体现在它可以根据上下文调整概率分布，以优化编码效率。 4. 低复杂度实现：CABAC设计时考虑了硬件和软件实现的效率，因此其算法复杂度相对较低，这使得在实际设备中进行快速且高效的解码成为可能。文章指出，CABAC相比于H.264/AVC的基线熵编码方法（例如，Context-Adaptive Variable Length Coding, CA-VLC），在各种应用场景下都能提供显著的性能提升。对于代表广播应用中典型素材的一系列测试序列，当视频质量介于30到38分贝之间时，平均比特率节省达到了9%到14%。关键词：二进制算术编码，上下文适应，视频压缩，H.264/AVC标准，熵编码，冗余减少，硬件实现，软件实现。

MARPE et al.: CABAC IN THE H.264/AVC VIDEO COMPRESSION STANDARD 623

frequently observed bins can be treated using a joint, typically

zero-order, probability model. Compared to the conventional

approach of using context models in the original domain of the

source with typically large alphabet size (like e.g., components

of motion vector differences or transform coefficient levels) this

additional freedom in the design offers a flexible instrument for

using higher order conditional probabilities without suffering

from context “dilution” effects. These effects are often observed

in cases, where a large number of conditional probabilities have

to be adaptively estimated on a relatively small (coding) time

interval, such that there are not enough samples to reach a reli-

able estimate for each model.

For instance, when operating in the original alphabet domain,

aquitemoderatelychosensecond-order model for agivensyntax

element alphabet of size

will result in the intractably

large number of

symbol probabilities to

be estimated for that particular syntax element only. Even for a

zero-order model, the task of tracking 255 individual probability

estimates according to the previous example is quite demanding.

However, typically measured probability density functions (pdf)

of prediction residuals or transformed prediction errors can be

modeled by highly peaked Laplacian, generalized Gaussian or

geometric distributions [28], where it is reasonable to restrict the

estimationofindividualsymbolstatisticstotheareaofthelargest

statistical variations at the peak of the pdf. Thus, if, for instance,

a binary tree resulting from a Huffman code design would be

chosen as a binarization for such a source and its related pdf,

only the nodes located in the vicinity of the root node would

be natural candidates for being modeled individually, whereas a

joint model would be assigned to all nodes on deeper tree levels

corresponding to the “tail” of the pdf. Note that this design is

different from the example given in Fig. 2, where each (internal)

node has its own model.

In the CABAC framework, typically only the root node would

be modeled using higher order conditional probabilities. In the

above example of a second-order model, this would result in

only four different binary probability models instead of

dif-

ferent

-ary probability models with .

2) Design of CABAC Binarization Schemes: As already

indicated above, a binary representation for a given nonbinary

valued syntax element provided by the binarization process

should be close to a minimum-redundancy code. On the one

hand, this allows easy access to the most probable symbols by

means of the binary decisions located at or close to the root

node for the subsequent modeling stage. On the other hand,

such a code tree minimizes the number of binary symbols to

encode on the average, hence minimizing the computational

workload induced by the binary arithmetic coding stage.

However, instead of choosing a Huffman tree for a given

training sequence, the design of binarization schemes in

CABAC (mostly) relies on a few basic code trees, whose struc-

ture enables a simple on-line computation of all code words

without the need for storing any tables. There are four such

basic types: the unary code, the truncated unary code, the

order Exp-Golomb code, and the fixed-length code. In addition,

there are binarization schemes based on a concatenation of

A more rigorous treatment of that problem can be found in [23] and [24]

Fig. 3. Construction of

th order EGk code for a given unsigned integer

symbol

these elementary types. As an exception of these structured

types, there are five specific, mostly unstructured binary trees

that have been manually chosen for the coding of macroblock

types and submacroblock types. Two examples of such trees

are shown in Fig. 2.

In the remaining part of this section, we explain in more detail

the construction of the four basic types of binarization and its

derivatives.

Unary and Truncated Unary Binarization Scheme: For each

unsigned integer valued symbol

, the unary code word

in CABAC consists of

“1” bits plus a terminating “0” bit. The

truncated unary (TU) code is only definedfor

with ,

where for

the code is given by the unary code, whereas

for

the terminating “0” bit is neglected such that the TU

code of

is given by a codeword consisting of “1” bits

only.

th order Exp-Golomb Binarization Scheme: Exponential

Golomb codes were first proposed by Teuhola [29] in the

context of run-length coding schemes. This parameterized

family of codes is a derivative of Golomb codes, which have

been proven to be optimal prefix-free codes for geometrically

distributed sources [30]. Exp-Golomb codes are constructed by

a concatenation of a prefix and a suffix code word. Fig. 3 shows

the construction of the

th order Exp-Golomb (EGk) code

word for a given unsigned integer symbol

. The prefix part of

the EGk code word consists of a unary code corresponding to

the value of

The EGk suffix part is computed as the binary representation

using significant bits, as can be

seen from the pseudo-C code in Fig. 3.

Consequently, for the EGk binarization, the number of sym-

bols having the same code length of

is ge-

ometrically growing. By inverting Shannon’s relationship be-

tween ideal code length and symbol probability, we can, e.g.,

easily deduce that EG0 is the optimal code for a pdf

with . This implies that for an appropri-

ately chosen parameter

, the EGk code represents a fairly good

first-order approximation of the ideal prefix-free code for tails

of typically observed pdf’s, at least for syntax elements that are

representing prediction residuals.

Fixed-Length (FL) Binarization Scheme: For the application

of FL binarization, a finite alphabet of values of the corre-

sponding syntax element is assumed. Let

denote a given

value of such a syntax element, where

. Then, the

剩余16页未读，继续阅读

扮猪喂老虎

粉丝: 9
资源: 3

H.264 AVC 视频压缩标准中的上下文自适应二进制算术编码

context-Based Adaptive Binary Arithmetic Coding in the H.264/AVC Video

Context-based_adaptive_binary_arithmetic_coding.ra_264_H.264_H.2

The H.264 Advanced Video Compression Standard

H.264.And.MPEG-4.Video.Compression.Video.Coding.For.Next.Generation.Multimedia

H.264 And MPEG-4 Video Compression Video Coding For Next Generation Multimedia

白皮书h.264 英文版

H.264/MPEG-4 AVC视频压缩标准解析

H.264与MPEG-4视频编码技术解析

H.264与MPEG-4视频压缩技术解析

H.264与MPEG-4视频压缩技术对比

最新资源