modes (compared to 8 such modes in H.264/MPEG-4
AVC), plus planar (surface fitting) and DC (flat) prediction
modes. The selected intra prediction modes are encoded by
deriving most probable modes (e.g., prediction directions)
based on those of previously decoded neighboring PBs.
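As a concrete illustration, the following C sketch mirrors the three-candidate most-probable-mode (MPM) list construction for luma intra modes (numbering: 0 = planar, 1 = DC, 2-34 = angular, 26 = vertical). The function name and the convention that an unavailable or non-intra neighbor contributes DC are illustrative; the candidate arithmetic follows the standardized derivation.

    /* Three-candidate MPM list from the left and above neighbor modes. */
    void derive_mpm_list(int left, int above, int mpm[3])
    {
        if (left == above) {
            if (left < 2) {           /* both neighbors planar or DC */
                mpm[0] = 0;           /* planar   */
                mpm[1] = 1;           /* DC       */
                mpm[2] = 26;          /* vertical */
            } else {                  /* angular: same mode, then +/-1 wrapped */
                mpm[0] = left;
                mpm[1] = 2 + ((left + 29) % 32);
                mpm[2] = 2 + ((left - 1) % 32);
            }
        } else {
            mpm[0] = left;
            mpm[1] = above;
            if (left != 0 && above != 0)      mpm[2] = 0;   /* planar   */
            else if (left != 1 && above != 1) mpm[2] = 1;   /* DC       */
            else                              mpm[2] = 26;  /* vertical */
        }
    }

The encoder then signals either an index into this list or, via an escape, the selected mode among the remaining ones.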
Quantization control: As in H.264/MPEG-4 AVC,
uniform reconstruction quantization (URQ) is used in
HEVC, with quantization scaling matrices supported for
the various transform block sizes.
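The decoder-side scaling can be sketched as follows (a simplification of the exact specification arithmetic): the step size doubles with every increase of 6 in the quantization parameter, the levelScale[] constants carry the fractional part of that geometric progression, m is the scaling-matrix entry (16 for the default flat matrix), and bd_shift is assumed here to fold together the bit-depth- and transform-size-dependent normalization.

    static const int levelScale[6] = { 40, 45, 51, 57, 64, 72 };

    /* Reconstruct one transform coefficient from its quantized level. */
    int dequantize_coeff(int level, int qp, int m, int bd_shift)
    {
        long long d = ((long long)level * m * levelScale[qp % 6]) << (qp / 6);
        d = (d + (1LL << (bd_shift - 1))) >> bd_shift;  /* round to nearest */
        if (d < -32768) d = -32768;                     /* clip to 16 bits  */
        if (d >  32767) d =  32767;
        return (int)d;
    }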
Entropy coding: Context adaptive binary arithmetic
coding (CABAC) is used for entropy coding. This is
similar to the CABAC scheme in H.264/MPEG-4 AVC, but has
been modified in several respects to improve its throughput
(especially on parallel-processing architectures) and its
compression performance, and to reduce its context memory
requirements.
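At the level of a single context model, the update structure is the one inherited from H.264/MPEG-4 AVC, as sketched below (a 6-bit probability state plus the value of the most probable symbol per context; the standardized 64-entry LPS transition table is assumed rather than reproduced). HEVC's throughput gains come mainly from higher-level changes such as grouping of bypass bins and reduced context counts, which are not visible in this fragment.

    #include <stdint.h>

    typedef struct { uint8_t state; uint8_t mps; } CabacContext;

    extern const uint8_t transIdxLps[64];  /* standardized transition table */

    /* Update a context model after decoding one bin. */
    void update_context(CabacContext *ctx, int bin)
    {
        if (bin == ctx->mps) {                   /* MPS: raise confidence   */
            if (ctx->state < 62) ctx->state++;
        } else {                                 /* LPS: fall back by table */
            if (ctx->state == 0) ctx->mps ^= 1;  /* flip MPS at equiprob.   */
            ctx->state = transIdxLps[ctx->state];
        }
    }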
In-loop deblocking filtering (DF): A deblocking filter
similar to the one used in H.264/MPEG-4 AVC is operated
within the inter-picture prediction loop. However, the
design is simplified in regard to its decision-making and
filtering processes, and is made friendlier to parallel
processing.
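The simplified decision-making can be illustrated by the boundary-strength (Bs) derivation for an edge between two blocks, sketched below for the uni-predictive case with illustrative field names: only three strengths remain, where Bs = 2 enables luma and chroma filtering and Bs = 1 enables luma filtering only (motion vectors are in quarter-sample units, so 4 equals one integer sample).

    #include <stdlib.h>

    typedef struct {
        int is_intra;
        int has_coeffs;   /* nonzero transform coefficients       */
        int ref_idx;      /* reference picture (single list here) */
        int mv_x, mv_y;   /* motion vector, quarter-sample units  */
    } BlockInfo;

    int boundary_strength(const BlockInfo *p, const BlockInfo *q)
    {
        if (p->is_intra || q->is_intra)
            return 2;
        if (p->has_coeffs || q->has_coeffs ||
            p->ref_idx != q->ref_idx ||
            abs(p->mv_x - q->mv_x) >= 4 || abs(p->mv_y - q->mv_y) >= 4)
            return 1;
        return 0;
    }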
Sample adaptive offset (SAO): A non-linear amplitude
mapping is introduced in the inter-picture prediction loop
after the deblocking filter. The goal is to better reconstruct
the original signal amplitudes by using a look-up table that
is described by a few additional parameters that can be
determined by histogram analysis at the encoder side.
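Of SAO's two operating modes, band offset shows the look-up-table character most directly. The sketch below assumes 8-bit samples and illustrative names: the five most significant bits of a sample select one of 32 equal-width bands, and signaled offsets are added in the four consecutive bands starting at band_pos.

    #include <stdint.h>

    uint8_t sao_band_offset(uint8_t sample, int band_pos, const int offset[4])
    {
        int band = sample >> 3;      /* 32 bands, each 8 values wide */
        int idx  = band - band_pos;
        if (idx >= 0 && idx < 4) {
            int v = sample + offset[idx];
            if (v < 0)   v = 0;      /* clip to the 8-bit range */
            if (v > 255) v = 255;
            return (uint8_t)v;
        }
        return sample;               /* bands without offsets pass through */
    }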
B. High-level syntax architecture
A number of design aspects new to the HEVC standard
improve flexibility for operation over a variety of applications
and network environments and improve robustness to data
losses. However, the high-level syntax architecture used in the
H.264/MPEG-4 AVC standard has generally been retained,
including the following features:
Parameter set structure: Parameter sets contain
information that can be shared for the decoding of several
regions of the decoded video. The parameter set structure
provides a robust mechanism for conveying data that are
essential to the decoding process. The concepts of
sequence and picture parameter sets from H.264/MPEG-4
AVC are augmented by a new video parameter set (VPS)
structure.
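A minimal sketch of the resulting referencing chain follows; the structs and fields are illustrative rather than actual syntax elements. Each slice header identifies a PPS, each PPS an SPS, and each SPS a VPS, so infrequently changing data can be transmitted once (and, if desired, out of band).

    typedef struct { int vps_id; /* ... */ } VPS;
    typedef struct { int sps_id; int vps_id; int pic_width, pic_height; } SPS;
    typedef struct { int pps_id; int sps_id; int init_qp; } PPS;
    typedef struct { int pps_id; /* slice_type, ... */ } SliceHeader;

    /* Resolve the active SPS for a slice, assuming tables indexed by id
     * that were filled in as parameter-set NAL units arrived. */
    const SPS *active_sps(const SliceHeader *sh,
                          const PPS pps[], const SPS sps[])
    {
        return &sps[pps[sh->pps_id].sps_id];
    }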
NAL unit syntax structure: Each syntax structure is
placed into a logical data packet called a network
abstraction layer (NAL) unit. Depending on the content of
a two-byte NAL unit header, it is possible to readily
identify the purpose of the associated payload data.
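The header layout is forbidden_zero_bit (1 bit), nal_unit_type (6 bits), nuh_layer_id (6 bits), and nuh_temporal_id_plus1 (3 bits), which the following sketch parses (struct and function names are illustrative):

    #include <stdint.h>

    typedef struct {
        unsigned type;         /* VPS, SPS, PPS, slice, SEI, ...   */
        unsigned layer_id;     /* reserved for layered extensions  */
        unsigned temporal_id;  /* temporal sublayer of the payload */
    } NalHeader;

    int parse_nal_header(const uint8_t b[2], NalHeader *h)
    {
        if (b[0] & 0x80) return -1;        /* forbidden_zero_bit must be 0 */
        if ((b[1] & 0x07) == 0) return -1; /* temporal_id_plus1 must be >0 */
        h->type        = (b[0] >> 1) & 0x3F;
        h->layer_id    = ((b[0] & 1) << 5) | (b[1] >> 3);
        h->temporal_id = (b[1] & 0x07) - 1;
        return 0;
    }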
Slices: A slice is a data structure that can be decoded
independently from other slices of the same picture, in
terms of entropy coding, signal prediction, and residual
signal reconstruction. (This describes ordinary slices; an
alternative form known as dependent slices is discussed
below.) A slice can either be an entire picture or a region of
a picture. One of the main purposes of slices is
re-synchronization in the event of data losses. In the case
of packetized transmission, the maximum number of
payload bits within a slice is typically restricted, and the
number of CTUs in the slice is often varied to minimize the
packetization overhead while keeping the size of each
packet within this bound.
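A sketch of such a packetization policy follows; ctu_bits[] is a hypothetical per-CTU size estimate, not part of the standard. The encoder grows the current slice CTU by CTU and opens a new slice whenever adding the next CTU would exceed the payload budget.

    /* Returns the number of slices; first_ctu_of_slice[] receives the
     * starting CTU index of each slice. */
    int build_slices(const int ctu_bits[], int num_ctus, int max_slice_bits,
                     int first_ctu_of_slice[])
    {
        int num_slices = 0, bits = 0;
        for (int i = 0; i < num_ctus; i++) {
            if (i == 0 || bits + ctu_bits[i] > max_slice_bits) {
                first_ctu_of_slice[num_slices++] = i; /* open a new slice */
                bits = 0;
            }
            bits += ctu_bits[i];
        }
        return num_slices;
    }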
SEI and VUI metadata: The syntax includes support for
various types of metadata known as supplemental
enhancement information (SEI) and video usability
information (VUI). Such data provides information about
the timing of the video pictures, the proper interpretation of
the color space used in the video signal, 3D stereoscopic
frame packing information, other “display hint”
information, etc.
C. Parallel decoding syntax and modified slice structuring
Finally, four new features are introduced in the HEVC
standard to enhance parallel processing capability or modify
the structuring of slice data for packetization purposes. Each of
them may have benefits in particular application contexts, and
it is generally up to the implementer of an encoder or decoder to
determine whether and how to take advantage of these features.
Tiles: The option to partition a picture into rectangular
regions called tiles has been specified. The main purpose
of tiles is to increase the capability for parallel processing
rather than to provide error resilience. Tiles are
independently decodable regions of a picture that are
encoded with some shared header information. Therefore,
they could additionally be used for the purpose of random
access to local regions of video pictures. A typical tile
configuration of a picture consists of segmenting the
picture into rectangular regions with approximately equal
numbers of CTUs in each tile. Tiles provide parallelism at
a coarser (picture/sub-picture) level of granularity,
and no sophisticated synchronization of threads is
necessary for their use.
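With uniform spacing, the boundary computation reduces to integer division; the sketch below derives tile column boundaries in CTU units (rows are analogous), making the tile widths differ by at most one CTU, in line with the typical configuration described above.

    /* col_bd[] must hold num_cols + 1 entries; col_bd[0] = 0 and
     * col_bd[num_cols] = pic_width_in_ctus. */
    void tile_col_boundaries(int pic_width_in_ctus, int num_cols, int col_bd[])
    {
        for (int i = 0; i <= num_cols; i++)
            col_bd[i] = i * pic_width_in_ctus / num_cols;
    }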
Wavefront parallel processing: When wavefront parallel
processing (WPP) is enabled, a slice is divided into rows of
CTUs. The first row is processed in an ordinary way; the
second row can begin to be processed after only a few
decisions have been made in the first row; the third row can
begin to be processed after only a few decisions have been
made in the second row; etc. The context models of the
entropy coder in each row are inferred from those in the
preceding row with a small fixed processing lag. WPP
provides a form of processing parallelism at a rather fine
level of granularity, i.e., within a slice. WPP may often
provide better compression performance than tiles (and
avoid some visual artifacts that may be induced by tiles).
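The dependency can be expressed as a simple readiness test, sketched below with an illustrative row-major completion map: CTU (row, col) may start once its left neighbor is finished and the row above has been processed through column col + 1, the point from which each row's entropy-coding context is inherited.

    /* done[] marks finished CTUs; width is the picture width in CTUs. */
    int wpp_can_start(int row, int col, int width, const int *done)
    {
        int up = (col + 1 < width) ? col + 1 : width - 1;
        int left_ok  = (col == 0) || done[row * width + col - 1];
        int above_ok = (row == 0) || done[(row - 1) * width + up];
        return left_ok && above_ok;
    }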