ISO/IEC 14496-10:2005(E)
© ISO/IEC 2005 – All rights reserved
ITU-T Rec. H.264 | ISO/IEC 14496-10 version 4 (the current specification) refers to the integrated text containing the
first technical corrigendum (2004), the first amendment (the "Fidelity range extensions"), and an additional technical
corrigendum (2005). In the ITU-T, the next published version after version 2 was version 4 (due to the completion of the
drafting work for version 4 prior to the approval opportunity for a final version 3 text).
0.5 Profiles and levels
This subclause does not form an integral part of this Recommendation | International Standard.
This Recommendation | International Standard is designed to be generic in the sense that it serves a wide range of
applications, bit rates, resolutions, qualities, and services. Applications should cover, among other things, digital storage
media, television broadcasting and real-time communications. In the course of creating this International Standard,
various requirements from typical applications have been considered, necessary algorithmic elements have been
developed, and these have been integrated into a single syntax. Hence, this International Standard will facilitate video
data interchange among different applications.
Considering the practicality of implementing the full syntax of this International Standard, however, a limited number of
subsets of the syntax are also stipulated by means of "profiles" and "levels". These and other related terms are formally
defined in clause 3.
A "profile" is a subset of the entire bitstream syntax that is specified by this Recommendation | International Standard.
Within the bounds imposed by the syntax of a given profile it is still possible to require a very large variation in the
performance of encoders and decoders depending upon the values taken by syntax elements in the bitstream such as the
specified size of the decoded pictures. In many applications, it is currently neither practical nor economic to implement a
decoder capable of dealing with all hypothetical uses of the syntax within a particular profile.
In order to deal with this problem, "levels" are specified within each profile. A level is a specified set of constraints
imposed on values of the syntax elements in the bitstream. These constraints may be simple limits on values.
Alternatively, they may take the form of constraints on arithmetic combinations of values (e.g. picture width multiplied by
picture height multiplied by number of pictures decoded per second).
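As an informal illustration of the two kinds of constraint described above, the following Python sketch checks a simple limit (picture size) and a limit on an arithmetic combination (picture width multiplied by picture height multiplied by pictures per second). The names LevelLimits, conforms and EXAMPLE_LEVEL, and the numeric limits themselves, are assumptions made for this example; they are not taken from the level definitions of this Recommendation | International Standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LevelLimits:
    """Hypothetical level; the limits are placeholders, not normative values."""
    max_frame_size: int    # simple limit: width * height, in samples
    max_sample_rate: int   # combined limit: width * height * pictures per second


def conforms(width: int, height: int, pictures_per_second: float,
             limits: LevelLimits) -> bool:
    """Return True if the picture format respects both limits of the level."""
    frame_size = width * height
    sample_rate = frame_size * pictures_per_second
    return (frame_size <= limits.max_frame_size
            and sample_rate <= limits.max_sample_rate)


# Illustrative level sized for 720x576 pictures at 25 pictures per second.
EXAMPLE_LEVEL = LevelLimits(max_frame_size=720 * 576,
                            max_sample_rate=720 * 576 * 25)

print(conforms(720, 576, 25, EXAMPLE_LEVEL))   # within both limits
print(conforms(720, 576, 50, EXAMPLE_LEVEL))   # picture size fits, rate does not
print(conforms(1280, 720, 10, EXAMPLE_LEVEL))  # picture size alone is too large
```

A decoder built for such a level could use a check of this kind to decide in advance whether it is able to decode a given bitstream.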
Coded video content conforming to this Recommendation | International Standard uses a common syntax. In order to
achieve a subset of the complete syntax, flags, parameters, and other syntax elements are included in the bitstream that
signal the presence or absence of syntactic elements that occur later in the bitstream.
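The signalling mechanism described above can be sketched informally as follows. The header layout, the field names (picture_width_units, extension_present_flag, extension_value) and their bit widths are hypothetical and chosen only for illustration; they do not correspond to any syntax structure specified in this Recommendation | International Standard.

```python
class BitReader:
    """Reads bits most-significant-bit first from a byte string."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0  # current position, counted in bits

    def read_bits(self, n: int) -> int:
        value = 0
        for _ in range(n):
            byte = self.data[self.pos // 8]
            bit = (byte >> (7 - self.pos % 8)) & 1
            value = (value << 1) | bit
            self.pos += 1
        return value


def parse_toy_header(reader: BitReader) -> dict:
    """Parse a hypothetical header in which a flag gates a later element."""
    header = {"picture_width_units": reader.read_bits(4)}
    extension_present_flag = reader.read_bits(1)
    if extension_present_flag:
        # This element is present in the bitstream only when the flag is 1.
        header["extension_value"] = reader.read_bits(3)
    return header


print(parse_toy_header(BitReader(bytes([0b10101101]))))  # flag set
print(parse_toy_header(BitReader(bytes([0b10100000]))))  # flag clear
```

A decoder that supports only the subset without the extension can reject, or skip over, bitstreams in which the flag is set, while every bitstream still shares the common syntax.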
0.6 Overview of the design characteristics
This subclause does not form an integral part of this Recommendation | International Standard.
The coded representation specified in the syntax is designed to enable a high compression capability for a desired image
quality. With the exception of the transform bypass mode of operation for lossless coding in the High 4:4:4 profile and
the I_PCM mode of operation in all profiles, the algorithm is typically not lossless, as the exact source sample values are
typically not preserved through the encoding and decoding processes. A number of techniques may be used to achieve
highly efficient compression. Encoding algorithms (not specified in this Recommendation | International Standard) may
select between inter and intra coding for block-shaped regions of each picture. Inter coding uses motion vectors for
block-based inter prediction to exploit temporal statistical dependencies between different pictures. Intra coding uses
various spatial prediction modes to exploit spatial statistical dependencies in the source signal for a single picture.
Motion vectors and intra prediction modes may be specified for a variety of block sizes in the picture. The prediction
residual is then further compressed using a transform to remove spatial correlation inside the transform block before it is
quantised, producing an irreversible process that typically discards less important visual information while forming a
close approximation to the source samples. Finally, the motion vectors or intra prediction modes are combined with the
quantised transform coefficient information and encoded using either variable length codes or arithmetic coding.
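The irreversibility introduced by quantisation can be illustrated with a minimal Python sketch. It assumes a plain uniform scalar quantiser with a single step size applied to already-transformed integer coefficients; the transform, scaling and quantiser design actually used with this Recommendation | International Standard differ, and the names quantise and dequantise, as well as the sample values, are hypothetical.

```python
def quantise(coefficients, step):
    """Map each integer coefficient to the nearest multiple of step (as a level)."""
    levels = []
    for c in coefficients:
        magnitude = (abs(c) + step // 2) // step  # round magnitude to nearest level
        levels.append(magnitude if c >= 0 else -magnitude)
    return levels


def dequantise(levels, step):
    """Reconstruct approximate coefficients from the quantised levels."""
    return [level * step for level in levels]


coefficients = [103, -47, 12, 3, -2, 0]   # illustrative transform coefficients
step = 10                                 # illustrative quantiser step size

levels = quantise(coefficients, step)
reconstructed = dequantise(levels, step)
errors = [c - r for c, r in zip(coefficients, reconstructed)]

print(levels)         # small coefficients collapse to zero
print(reconstructed)  # close to, but not equal to, the input
print(errors)         # information discarded by quantisation
```

The reconstructed values approximate the input but cannot recover it exactly, which is why the overall process is typically not lossless. A larger step size gives fewer distinct levels to encode at the cost of a larger reconstruction error.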
0.6.1 Predictive coding
This subclause does not form an integral part of this Recommendation | International Standard.
Because of the conflicting requirements of random access and highly efficient compression, two main coding types are
specified. Intra coding is done without reference to other pictures. Intra coding may provide access points to the coded
sequence where decoding can begin and continue correctly, but typically also shows only moderate compression
efficiency. Inter coding (predictive or bi-predictive) is more efficient, using inter prediction of each block of sample
values from some previously decoded picture selected by the encoder. In contrast to some other video coding standards,
pictures coded using bi-predictive inter prediction may also be used as references for inter coding of other pictures.
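Inter prediction of a block from a previously decoded picture can be sketched informally as follows. This is a simplified illustration that assumes a single reference picture, a square block and an integer-sample motion vector; the function names and sample values are hypothetical, and the prediction process actually specified is considerably more elaborate.

```python
def predict_block(reference, top, left, size, motion_vector):
    """Copy a size x size block from the reference picture, displaced by (dy, dx)."""
    dy, dx = motion_vector
    return [
        [reference[top + dy + y][left + dx + x] for x in range(size)]
        for y in range(size)
    ]


def residual(current_block, predicted_block):
    """Difference between the block being coded and its prediction."""
    return [
        [c - p for c, p in zip(cur_row, pred_row)]
        for cur_row, pred_row in zip(current_block, predicted_block)
    ]


def reconstruct(predicted_block, residual_block):
    """Decoder side: prediction plus residual gives back the block."""
    return [
        [p + r for p, r in zip(pred_row, res_row)]
        for pred_row, res_row in zip(predicted_block, residual_block)
    ]


# Previously decoded (reference) picture, 4x4 samples.
reference = [
    [10, 20, 30, 40],
    [11, 21, 31, 41],
    [12, 22, 32, 42],
    [13, 23, 33, 43],
]
# 2x2 block at row 1, column 1 of the current picture: the content has moved
# one sample to the right since the reference picture, and one sample changed.
current_block = [[11, 22], [12, 22]]

predicted = predict_block(reference, top=1, left=1, size=2, motion_vector=(0, -1))
difference = residual(current_block, predicted)

print(predicted)    # block fetched from the reference picture
print(difference)   # mostly zeros, so cheap to encode
print(reconstruct(predicted, difference) == current_block)
```

Only the motion vector and the (mostly zero) residual need to be conveyed for such a block, which is the source of the efficiency gain over intra coding. The sketch omits quantisation of the residual, so reconstruction here is exact.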
NEN-ISO/IEC 14496-10:2006-09
This document has been supplied under license by NEN to:
Irdeto Access . 2006/10/30