WIEGAND et al.: OVERVIEW OF THE H.264/AVC VIDEO CODING STANDARD 4
Robustness to data errors/losses and flexibility for operation
over a variety of network environments are enabled by a
number of design aspects new to the H.264/AVC standard,
including the following highlighted features.
• Parameter set structure: The parameter set
design provides for robust and efficient
conveyance of header information. As the loss of a
few key bits of information (such as sequence
header or picture header information) could have a
severe negative impact on the decoding process
when using prior standards, this key information
was separated for handling in a more flexible and
specialized manner in the H.264/AVC design.
• NAL unit syntax structure: Each syntax structure
in H.264/AVC is placed into a logical data packet
called a NAL unit. Rather than forcing a specific
bitstream interface to the system as in prior video
coding standards, the NAL unit syntax structure
allows greater customization of the method of
carrying the video content in a manner appropriate
for each specific network.
• Flexible slice size: Unlike the rigid slice structure
found in MPEG-2 (which reduces coding
efficiency by increasing the quantity of header data
and decreasing the effectiveness of prediction),
slice sizes in H.264/AVC are highly flexible, as
was the case earlier in MPEG-1.
• Flexible macroblock ordering (FMO): A new
ability to partition the picture into regions called
slice groups has been developed, with each slice
becoming an independently-decodable subset of a
slice group. When used effectively, flexible
macroblock ordering can significantly enhance
robustness to data losses by managing the spatial
relationship between the regions that are coded in
each slice. (FMO can also be used for a variety of
other purposes.)
• Arbitrary slice ordering (ASO): Since each slice
of a coded picture can be (approximately) decoded
independently of the other slices of the picture, the
H.264/AVC design enables sending and receiving
the slices of the picture in any order relative to
each other. This capability, first found in an
optional part of H.263+, can improve end-to-end
delay in real-time applications, particularly when
used on networks having out-of-order delivery
behavior (e.g., internet protocol networks).
• Redundant pictures: In order to enhance
robustness to data loss, the H.264/AVC design
contains a new ability to allow an encoder to send
redundant representations of regions of pictures,
enabling a (typically somewhat degraded)
representation of regions of pictures for which the
primary representation has been lost during data
transmission.
• Data partitioning: Since some coded information
for representation of each region (e.g., motion
vectors and other prediction information) is more
important or more valuable than other information
for purposes of representing the video content,
H.264/AVC allows the syntax of each slice to be
separated into up to three different partitions for
transmission, depending on a categorization of
syntax elements. This part of the design builds
further on a path taken in MPEG-4 Visual and in
an optional part of H.263++. Here the design is
simplified by having a single syntax with
partitioning of that same syntax controlled by a
specified categorization of syntax elements.
• SP/SI synchronization/switching pictures: The
H.264/AVC design includes a new feature
consisting of picture types that allow exact
synchronization of the decoding process of some
decoders with an ongoing video stream produced
by other decoders without penalizing all decoders
with the loss of efficiency resulting from sending
an I picture. This can enable switching a decoder
between representations of the video content that
use different data rates, recovery from data losses
or errors, and trick modes such as fast-forward and
fast-reverse.
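As an illustration of the flexible macroblock ordering feature listed above, the following is a minimal sketch, not the standard's own slice group syntax. It assumes a hypothetical arrangement of two slice groups laid out as a checkerboard over the macroblocks of a picture; the function names and the 4x3 picture size are illustrative only. It shows that if every slice of one slice group is lost, each lost macroblock still has all of its in-picture horizontal and vertical neighbors available from the other group, which is the kind of spatial relationship a decoder can exploit when concealing losses.

```python
def checkerboard_slice_groups(width_mbs, height_mbs):
    """Assign each macroblock to slice group 0 or 1 in a checkerboard pattern.

    Hypothetical helper: returns a list of rows, indexed as group_map[y][x].
    """
    return [[(x + y) % 2 for x in range(width_mbs)] for y in range(height_mbs)]


def surviving_neighbors(group_map, x, y, lost_group):
    """Count the 4-connected in-picture neighbors of macroblock (x, y)
    that do not belong to the lost slice group."""
    height, width = len(group_map), len(group_map[0])
    count = 0
    for dx, dy in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        nx, ny = x + dx, y + dy
        if 0 <= nx < width and 0 <= ny < height:
            if group_map[ny][nx] != lost_group:
                count += 1
    return count


# Assumed picture size: 4 macroblocks wide, 3 macroblocks high.
group_map = checkerboard_slice_groups(width_mbs=4, height_mbs=3)
for row in group_map:
    print(row)

# Macroblock (1, 1) belongs to group 0; count its neighbors that survive
# when all of group 0 is lost.
print(surviving_neighbors(group_map, 1, 1, lost_group=0))
```

With a raster-order slice covering the same number of macroblocks, a lost slice would instead remove a contiguous run of macroblocks, leaving interior macroblocks with few or no surviving horizontal neighbors.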
In the following two sections, a more detailed description of
the key features is given.
III. NETWORK ABSTRACTION LAYER
The network abstraction layer (NAL) is designed to
provide "network friendliness", enabling simple and
effective customization of the use of the VCL for a broad
variety of systems.
The NAL facilitates the ability to map H.264/AVC VCL
data to transport layers such as
• RTP/IP for any kind of real-time wireline and wireless
Internet services (conversational and streaming)
• File formats, e.g. ISO MP4 for storage and MMS
• H.32X for wireline and wireless conversational
services
• MPEG-2 systems for broadcasting services, etc.
The full degree of customization of the video content to fit
the needs of each particular application is outside the scope
of the H.264/AVC standardization effort, but the design of
the NAL anticipates a variety of such mappings. Some key
concepts of the NAL are NAL units, byte stream and
packet format uses of NAL units, parameter sets, and access
units. A short description of these concepts is given below,
whereas a more detailed description including error
resilience aspects is provided in [6] and [7].
A. NAL units
The coded video data is organized into NAL units, each of
which is effectively a packet that contains an integer
number of bytes. The first byte of each NAL unit is a
header byte that contains an indication of the type of data in