1652 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 22, NO. 12, DECEMBER 2012
H.264/MPEG-4 AVC). Similar to H.264/MPEG-4 AVC,
multiple reference pictures are used. For each PB, either
one or two motion vectors can be transmitted, resulting
in unipredictive or bipredictive coding, respectively.
As in H.264/MPEG-4 AVC, a scaling and offset
operation may be applied to the prediction signal(s) in
a manner known as weighted prediction.
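The scale-and-offset operation can be illustrated with a short sketch. This is not the normative process; the function name, the `log2_denom` weight denominator, and the exact rounding and clipping details are simplifying assumptions for illustration only:

```python
def weighted_pred(pred, w, offset, log2_denom, bit_depth=8):
    """Illustrative scale-and-offset (weighted prediction) of one sample.

    The prediction sample is scaled by a weight w (with a log2_denom
    fractional precision), rounded, offset, and clipped to the sample
    range. The standard's normative rules differ in detail.
    """
    max_val = (1 << bit_depth) - 1
    rounding = 1 << (log2_denom - 1) if log2_denom > 0 else 0
    val = ((pred * w + rounding) >> log2_denom) + offset
    return min(max(val, 0), max_val)
```

With `log2_denom = 6`, a weight of 64 corresponds to unit gain, so only the offset changes the sample; results are clipped to the valid amplitude range.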
7) Intrapicture prediction: The decoded boundary samples
of adjacent blocks are used as reference data for spa-
tial prediction in regions where interpicture prediction
is not performed. Intrapicture prediction supports 33
directional modes (compared to eight such modes in
H.264/MPEG-4 AVC), plus planar (surface fitting) and
DC (flat) prediction modes. The selected intrapicture
prediction modes are encoded by deriving most probable
modes (e.g., prediction directions) based on those of
previously decoded neighboring PBs.
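The DC (flat) mode mentioned above admits a compact illustration: the block is filled with the rounded mean of the decoded boundary samples of the neighboring blocks. This is a simplified sketch (the standard additionally applies boundary filtering to small luma blocks, which is omitted here):

```python
def dc_prediction(left, top):
    """DC ("flat") intra prediction sketch: fill an NxN block with the
    rounded mean of the reconstructed boundary samples taken from the
    left and above neighbors."""
    n = len(left) + len(top)
    dc = (sum(left) + sum(top) + n // 2) // n  # rounded average
    size = len(top)
    return [[dc] * size for _ in range(size)]
```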
8) Quantization control: As in H.264/MPEG-4 AVC, uni-
form reconstruction quantization (URQ) is used in
HEVC, with quantization scaling matrices supported for
the various transform block sizes.
9) Entropy coding: Context adaptive binary arithmetic cod-
ing (CABAC) is used for entropy coding. This is sim-
ilar to the CABAC scheme in H.264/MPEG-4 AVC,
but has been enhanced to increase its throughput speed
(especially for parallel-processing architectures) and
its compression performance, and to
reduce its context memory requirements.
10) In-loop deblocking filtering: A deblocking filter similar
to the one used in H.264/MPEG-4 AVC is operated
within the interpicture prediction loop. However, the
design is simplified in regard to its decision-making and
filtering processes, and it is more amenable to parallel
processing.
11) Sample adaptive offset (SAO): A nonlinear amplitude
mapping is introduced within the interpicture prediction
loop after the deblocking filter. Its goal is to better
reconstruct the original signal amplitudes by using a
look-up table that is described by a few additional
parameters that can be determined by histogram analysis
at the encoder side.
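One flavor of this mapping, the band offset, can be sketched as follows. The sample amplitude range is split into 32 equal bands, and samples falling into a run of four consecutive bands receive transmitted offsets. The function name and the choice of four offsets starting at `band_start` follow the band-offset mode of SAO, but this is an illustrative sketch rather than the normative process:

```python
def sao_band_offset(samples, band_start, offsets, bit_depth=8):
    """Band-offset sketch of SAO: the amplitude range is divided into
    32 equal bands; samples whose band index falls in the signaled run
    of consecutive bands get the corresponding offset added, with the
    result clipped to the valid sample range."""
    shift = bit_depth - 5          # 32 bands => band index = sample >> shift
    max_val = (1 << bit_depth) - 1
    out = []
    for s in samples:
        band = s >> shift
        if band_start <= band < band_start + len(offsets):
            s = min(max(s + offsets[band - band_start], 0), max_val)
        out.append(s)
    return out
```

An encoder can pick `band_start` and the offsets by analyzing a histogram of the reconstruction error per band, as the text above indicates.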
B. High-Level Syntax Architecture
A number of design aspects new to the HEVC standard
improve flexibility for operation over a variety of applications
and network environments and improve robustness to data
losses. However, the high-level syntax architecture used in
the H.264/MPEG-4 AVC standard has generally been retained,
including the following features.
1) Parameter set structure: Parameter sets contain informa-
tion that can be shared for the decoding of several re-
gions of the decoded video. The parameter set structure
provides a robust mechanism for conveying data that are
essential to the decoding process. The concepts of se-
quence and picture parameter sets from H.264/MPEG-4
AVC are augmented by a new video parameter set (VPS)
structure.
2) NAL unit syntax structure: Each syntax structure is
placed into a logical data packet called a network
abstraction layer (NAL) unit. Using the content of a two-
byte NAL unit header, it is possible to readily identify
the purpose of the associated payload data.
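The two-byte header carries four fixed-length fields: a forbidden zero bit (1 bit), the NAL unit type (6 bits), a layer identifier (6 bits), and a temporal identifier plus one (3 bits). A minimal parsing sketch (field names follow the standard's syntax element names; the helper function itself is illustrative):

```python
def parse_nal_header(b0, b1):
    """Parse the two-byte HEVC NAL unit header into its four fields:
    forbidden_zero_bit (1 bit), nal_unit_type (6 bits),
    nuh_layer_id (6 bits), nuh_temporal_id_plus1 (3 bits)."""
    return {
        "forbidden_zero_bit": (b0 >> 7) & 0x1,
        "nal_unit_type": (b0 >> 1) & 0x3F,
        "nuh_layer_id": ((b0 & 0x1) << 5) | ((b1 >> 3) & 0x1F),
        "nuh_temporal_id_plus1": b1 & 0x7,
    }
```

For example, the byte pair 0x40 0x01 decodes to NAL unit type 32 (a video parameter set) in the base layer at the lowest temporal sublayer.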
3) Slices: A slice is a data structure that can be decoded
independently from other slices of the same picture, in
terms of entropy coding, signal prediction, and residual
signal reconstruction. A slice can either be an entire
picture or a region of a picture. One of the main
purposes of slices is resynchronization in the event of
data losses. In the case of packetized transmission, the
maximum number of payload bits within a slice is
typically restricted, and the number of CTUs in the slice
is often varied to minimize the packetization overhead
while keeping the size of each packet within this bound.
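A simple way to realize this is a greedy packing loop: accumulate CTUs into the current slice until adding the next CTU would exceed the payload bound, then start a new slice. The sketch below is one possible encoder-side strategy, not something mandated by the standard, and it assumes every CTU fits within the bound on its own:

```python
def pack_ctus_into_slices(ctu_sizes_bits, max_payload_bits):
    """Greedy packetization sketch: group consecutive CTU indices into
    slices so that each slice's total coded size stays within the
    payload bound. Returns a list of slices, each a list of CTU indices."""
    slices, current, used = [], [], 0
    for n, bits in enumerate(ctu_sizes_bits):
        if current and used + bits > max_payload_bits:
            slices.append(current)   # close the current slice
            current, used = [], 0
        current.append(n)
        used += bits
    if current:
        slices.append(current)
    return slices
```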
4) Supplemental enhancement information (SEI) and video
usability information (VUI) metadata: The syntax in-
cludes support for various types of metadata known as
SEI and VUI. Such data provide information about the
timing of the video pictures, the proper interpretation of
the color space used in the video signal, 3-D stereoscopic
frame packing information, other display hint informa-
tion, and so on.
C. Parallel Decoding Syntax and Modified Slice Structuring
Finally, four new features are introduced in the HEVC stan-
dard to enhance the parallel processing capability or modify
the structuring of slice data for packetization purposes. Each
of them may have benefits in particular application contexts,
and it is generally up to the implementer of an encoder or
decoder to determine whether and how to take advantage of
these features.
1) Tiles: The option to partition a picture into rectangular
regions called tiles has been specified. The main pur-
pose of tiles is to increase the capability for parallel
processing rather than provide error resilience. Tiles are
independently decodable regions of a picture that are
encoded with some shared header information. Tiles can
additionally be used for the purpose of spatial random
access to local regions of video pictures. A typical
tile configuration of a picture consists of segmenting
the picture into rectangular regions with approximately
equal numbers of CTUs in each tile. Tiles provide
parallelism at a coarser level of granularity (picture/
subpicture), and no sophisticated synchronization of
threads is necessary for their use.
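The uniform layout described above, with approximately equal numbers of CTUs per tile, can be sketched with a one-dimensional boundary computation applied independently to the CTU columns and rows (the function name is illustrative; the rounding convention is one plausible choice, not necessarily the normative one):

```python
def tile_boundaries(num_ctus, num_tiles):
    """Split num_ctus CTU columns (or rows) into num_tiles spans of
    approximately equal size; returns the num_tiles + 1 boundary
    positions, from 0 up to num_ctus."""
    return [(i * num_ctus) // num_tiles for i in range(num_tiles + 1)]
```

Applying this separately to the picture width and height in CTUs yields the rectangular tile grid; each tile is then the CTUs between consecutive column and row boundaries.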
2) Wavefront parallel processing: When wavefront parallel
processing (WPP) is enabled, a slice is divided into
rows of CTUs. The first row is processed in an ordinary
way, the second row can begin to be processed after
only two CTUs have been processed in the first row,
the third row can begin to be processed after only
two CTUs have been processed in the second row,
and so on. The context models of the entropy coder
in each row are inferred from those in the preceding
row with a two-CTU processing lag. WPP provides a
form of processing parallelism at a rather fine level of