PRE-PUBLICATION DRAFT, TO APPEAR IN IEEE TRANS. ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, DEC. 2012 3
This design strategy does not easily extend to larger transform
sizes, such as 16- and 32-point. HEVC thus takes a different
approach and simply defines transforms (of size 4×4, 8×8,
16×16, and 32×32) as straightforward fixed-point matrix
multiplications. The matrix multiplications for the vertical and
horizontal component of the inverse transform are shown in
(1) and (2), respectively.
Y = s(C^T · T)    (1)

R = Y^T · T    (2)
where s() is a scaling and saturating function that guarantees
that values of Y can be represented using 16 bits. Each factor
in the transform matrix T is represented using signed 8-
bit numbers. Operations are defined such that 16-bit signed
coefficients C are multiplied with the factors and hence greater
than 16-bit accumulation is required. As the transforms are
integer approximations of a discrete cosine transform (DCT),
they retain the symmetry properties thereof, thereby enabling
a “partial butterfly” implementation. For the 4-point transform,
an alternative transform approximating a discrete sine trans-
form (DST) is also defined.
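As an illustration, one column of the 4-point inverse transform can be sketched in C using the partial-butterfly structure; the matrix entries (64, 83, 36) and the first-stage shift of 7 follow the HEVC 4-point DCT approximation, but the function names and layout here are ours:

```c
#include <stdint.h>

/* Clip to the signed 16-bit range: plays the role of the scaling and
   saturating function s() in (1). */
static int16_t clip16(int v) {
    return (int16_t)(v < -32768 ? -32768 : v > 32767 ? 32767 : v);
}

/* One column of the 4-point inverse transform, written as a "partial
   butterfly": the even/odd decomposition below exploits the symmetry
   of the DCT approximation. */
static void inv_transform4_col(const int16_t src[4], int16_t dst[4]) {
    const int shift = 7, add = 1 << (shift - 1);
    int e0 = 64 * src[0] + 64 * src[2];   /* even part */
    int e1 = 64 * src[0] - 64 * src[2];
    int o0 = 83 * src[1] + 36 * src[3];   /* odd part */
    int o1 = 36 * src[1] - 83 * src[3];
    /* the accumulators are wider than 16 bits, as required */
    dst[0] = clip16((e0 + o0 + add) >> shift);
    dst[1] = clip16((e1 + o1 + add) >> shift);
    dst[2] = clip16((e1 - o1 + add) >> shift);
    dst[3] = clip16((e0 - o0 + add) >> shift);
}
```

Only six distinct products are needed instead of the sixteen of a direct matrix multiplication; the same even/odd split recurses for the 8-, 16-, and 32-point cases.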
Although there has been some concern about the imple-
mentation complexity of the 32-point transform, data given in
[7] indicates 158 cycles for an 8×8 inverse transform, 861
cycles for a 16×16 inverse transform, and 4696 cycles for a
32×32 inverse transform on an Intel processor. Normalizing
these values by the associated block sizes yields 2.47,
3.36, and 4.59 cycles per sample, respectively. The time cost
per sample of a 32×32 inverse transform is thus less than
twice that of an 8×8 inverse transform. Furthermore, the cycle
count for larger transforms may often be reduced by taking
advantage of the fact that most high-frequency coefficients
are typically zero. Determining which bounding subblock of
coefficients is nonzero is facilitated by using a 4×4 coding
structure for the entropy coding of transform coefficients. The
bounding subblock may thus be determined at a reasonable
granularity (4×4) without having to consider the position of
each nonzero coefficient.
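A minimal sketch of how such a bounding region might be derived from per-subblock information is given below; the flag array and function names are hypothetical, not taken from the standard:

```c
#include <stdint.h>

/* Hypothetical helper: given per-4x4-subblock "coded" flags for an
   N x N transform block (as produced by the entropy decoder), compute
   the width and height, in samples, of the bounding region containing
   all nonzero coefficients, at 4x4 granularity.  n_sub is the number
   of subblocks per row/column (e.g., 8 for a 32x32 block). */
static void bounding_region(const uint8_t *coded_sub, int n_sub,
                            int *out_w, int *out_h) {
    int w = 0, h = 0;
    for (int y = 0; y < n_sub; y++)
        for (int x = 0; x < n_sub; x++)
            if (coded_sub[y * n_sub + x]) {
                if (x + 1 > w) w = x + 1;
                if (y + 1 > h) h = y + 1;
            }
    *out_w = 4 * w;
    *out_h = 4 * h;
}
```

The inverse transform then only needs to process the rows and columns that fall inside the returned region.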
It should also be noted that the transform order is changed
with respect to H.264/AVC. HEVC defines a column-row order
for the inverse transform. Due to the regular uniform structure
of the matrix multiplication and partial butterfly designs, this
approach may be preferred in both hardware and software. In
software it is preferable to transform rows, as one entire row
of coefficients may easily be held in registers (a row of thirty-
two 32-bit accumulators requires eight 128-bit registers, which
is feasible on several architectures without register
spilling). This property is not necessarily maintained with
more irregular but fully decomposed transform designs, which
look attractive in terms of primitive operation counts, but
require a greater number of registers and software operations
to implement. As can be seen from (1), applying the transpose
to the coefficients C allows implementations to transform
rows only. Note that the transpose can be integrated in the
inverse scan without adding complexity.
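The row-only structure implied by (1) and (2) can be sketched as follows; the row kernel, matrix, and shift value here are illustrative placeholders rather than the actual HEVC transform definitions:

```c
#include <stdint.h>

#define N 4  /* transform size; the same structure applies to 8, 16, 32 */

/* Illustrative row kernel: out = in_row · T, with rounding and a
   right-shift standing in for the scaling function s(). */
static void row_times_T(const int16_t in[N], int16_t out[N],
                        const int16_t T[N][N], int shift) {
    int add = 1 << (shift - 1);
    for (int j = 0; j < N; j++) {
        int acc = add;                 /* wider-than-16-bit accumulation */
        for (int k = 0; k < N; k++)
            acc += in[k] * T[k][j];
        out[j] = (int16_t)(acc >> shift);
    }
}

static void transpose(int16_t m[N][N]) {
    for (int i = 0; i < N; i++)
        for (int j = i + 1; j < N; j++) {
            int16_t t = m[i][j]; m[i][j] = m[j][i]; m[j][i] = t;
        }
}

/* Column-row inverse transform built from the row kernel only; blk is
   assumed to hold C^T, i.e., the transpose has already been folded into
   the inverse scan as described in the text. */
static void inv_transform_2d(int16_t blk[N][N],
                             const int16_t T[N][N], int shift) {
    int16_t tmp[N];
    for (int i = 0; i < N; i++) {        /* Y = s(C^T · T) */
        row_times_T(blk[i], tmp, T, shift);
        for (int j = 0; j < N; j++) blk[i][j] = tmp[j];
    }
    transpose(blk);                      /* Y -> Y^T */
    for (int i = 0; i < N; i++) {        /* R = Y^T · T */
        row_times_T(blk[i], tmp, T, shift);
        for (int j = 0; j < N; j++) blk[i][j] = tmp[j];
    }
}
```

Both passes reuse the same row kernel, which is the property that makes an entire row easy to hold in registers.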
E. Entropy coding
Unlike the H.264/AVC specification, which features both the
CAVLC and CABAC [8] entropy coders, HEVC defines CABAC as
the single entropy coding method. CABAC incorporates three
stages: binarization of syntax elements, context modeling, and
binary arithmetic coding. While the acronym and the core
arithmetic coding engine remain the same as in H.264/AVC,
there are a number of differences in context modeling and
binarization as described below.
In the development of HEVC, a substantial amount of effort
has been devoted to reducing the number of contexts. While
version 1.0 of the HM featured in excess of 700 contexts,
version 8.0 has only 172. This number compares favorably to
H.264/AVC, where 299 contexts are used, assuming support
for frame coding in the 4:2:0 color format (Progressive High
profile). Of these 299 contexts, 237 are involved in residual
signal coding whereas HEVC uses 127 of the 172 for this
purpose. Comparing the 46% reduction for residual coding
with the 27% reduction for the remaining syntax elements
makes clear that most effort has been put into
reducing the number of contexts associated with the residual
syntax. This reduction in the number of contexts helps lower
the amount of memory required by the entropy decoder
and the cost of initializing the engine. Initialization values of
the states are defined with 8 bits per context, reduced from 16
in H.264/AVC, thereby further reducing memory requirements.
One widely used method for determining contexts in
H.264/AVC is to use spatial neighborhood relationships. For
example, the values above and to the left of the current block
may be used to derive a context. In HEVC such spatial dependencies
have been mostly avoided so as to reduce the number of
line buffers.
Substantial effort has also been devoted to enabling parallel
context processing, where a decoder has the ability
to derive multiple context indices in parallel. These tech-
niques apply mostly to transform coefficient coding, which
becomes the entropy decoding bottleneck at high bit rates.
One example is the modification of the significance map
coding. In H.264/AVC, two interleaved flags are used to signal
whether the current coefficient has a non-zero value
(significant_coeff_flag) and whether it is the last one in coding
order (last_significant_coeff_flag). This makes it impossible
to derive the significant_coeff_flag and last_significant_coeff_flag
contexts in parallel. HEVC breaks this dependency by ex-
plicitly signaling the horizontal and vertical offset of the last
significant coefficient in the current block before parsing the
significant_coeff_flags [9].
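The effect on parsing can be illustrated with a toy routine; the scan tables and function name below are hypothetical, intended only to show that the number of flags to parse, and hence their context indices, is known as soon as the last position has been decoded:

```c
#include <stdint.h>

/* Illustrative only: given the explicitly signaled (last_x, last_y)
   position of the last significant coefficient, return how many
   significant_coeff_flags precede it in scan order.  Because this
   count is known before any flag is parsed, the context indices for
   several flags can be derived in parallel; no interleaved
   last_significant_coeff_flag is needed.  scan_x/scan_y form a
   hypothetical scan table (scan position -> coordinates). */
static int flags_to_parse(const uint8_t scan_x[], const uint8_t scan_y[],
                          int n, int last_x, int last_y) {
    for (int i = 0; i < n; i++)
        if (scan_x[i] == last_x && scan_y[i] == last_y)
            return i;  /* flags are parsed for scan positions 0..i-1 */
    return 0;          /* not found: degenerate case in this sketch */
}
```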
The burden of entropy decoding with context modeling
grows with bit rate as more bins need to be processed. There-
fore, the bin strings of large syntax elements are divided into
a prefix and a suffix. All prefix bins are coded in regular mode
(i.e., using context modeling), whereas all suffix bins are coded
in a bypass mode. The cost of decoding a bin in bypass mode
is lower than in regular mode. Furthermore, the ratio of bins
to bits is fixed at 1:1 for bypass mode, whereas it is generally
higher for the regular mode. In H.264/AVC, motion vector
differences and transform coefficient levels are binarized using