ITU-T Rec. H.264 (03/2005) – Prepublished version 16
This Recommendation | International Standard specifies a syntax and decoding process for video that originated in either
progressive-scan or interlaced-scan form, which may be mixed together in the same sequence. The two fields of an
interlaced frame are separated in capture time, whereas the two fields of a progressive frame share the same capture
time. Each field may be coded separately, or the two fields may be coded together as a frame. Progressive frames are
typically coded as frames. For interlaced video, the encoder can choose between frame coding and field coding. Frame coding or
field coding can be adaptively selected on a picture-by-picture basis and also on a more localized basis within a coded
frame. Frame coding is typically preferred when the video scene contains significant detail with limited motion. Field
coding typically works better when there is fast picture-to-picture motion.
0.6.3 Picture partitioning into macroblocks and smaller partitions
This subclause does not form an integral part of this Recommendation | International Standard.
As in previous video coding Recommendations and International Standards, a macroblock, consisting of a 16x16 block
of luma samples and two corresponding blocks of chroma samples, is used as the basic processing unit of the video
decoding process.
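As an illustration of this basic processing unit, the sketch below declares a macroblock for the common 4:2:0 chroma format, in which each 16x16 luma block is accompanied by one 8x8 Cb block and one 8x8 Cr block. The type and field names are illustrative and are not taken from this Recommendation | International Standard; other chroma formats use larger chroma blocks.

```c
#include <stdint.h>

/* Illustrative sketch only: a macroblock in the 4:2:0 chroma format.
 * The chroma block dimensions depend on the chroma sampling format;
 * 8x8 applies to 4:2:0 sampling. */
typedef struct {
    uint8_t luma[16][16]; /* Y (luma) samples */
    uint8_t cb[8][8];     /* Cb (blue-difference chroma) samples */
    uint8_t cr[8][8];     /* Cr (red-difference chroma) samples */
} Macroblock420;
```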
A macroblock can be further partitioned for inter prediction. The selection of the size of inter prediction partitions is a
result of a trade-off between the coding gain provided by using motion compensation with smaller blocks and the
quantity of data needed to represent the motion compensation data. In this Recommendation | International Standard
the inter prediction process can form segmentations for motion representation as small as 4x4 luma samples in size, using
motion vector accuracy of one-quarter of the luma sample grid spacing displacement. The process for inter prediction of
a sample block can also involve the selection of the picture to be used as the reference picture from a number of stored
previously-decoded pictures. Motion vectors are encoded differentially with respect to predicted values formed from
nearby encoded motion vectors.
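The quarter-sample accuracy and differential coding described above can be sketched as follows. Motion vectors are held in quarter-luma-sample units, so a stored value of 4 corresponds to a displacement of one full luma sample, and the transmitted value is the difference between the motion vector and a prediction formed from neighbouring vectors. In the most common case the predictor is the component-wise median of the left, above, and above-right neighbours; this sketch omits the neighbour-availability rules and special cases, and its names are illustrative rather than normative.

```c
/* Motion vectors in quarter-luma-sample units (x = 4 means a
 * horizontal displacement of one full luma sample). */
typedef struct { int x, y; } MotionVec;

/* Median of three integers. */
static int median3(int a, int b, int c) {
    if (a > b) { int t = a; a = b; b = t; } /* now a <= b */
    if (b > c) { b = c; }                   /* b = min(max(a0,b0), c) */
    return a > b ? a : b;
}

/* Component-wise median prediction from three neighbouring vectors
 * (availability rules and special cases omitted). */
MotionVec mv_predict(MotionVec left, MotionVec above, MotionVec above_right) {
    MotionVec p;
    p.x = median3(left.x, above.x, above_right.x);
    p.y = median3(left.y, above.y, above_right.y);
    return p;
}

/* The difference actually encoded in the bitstream. */
MotionVec mv_diff(MotionVec mv, MotionVec pred) {
    MotionVec d = { mv.x - pred.x, mv.y - pred.y };
    return d;
}
```

A decoder inverts the last step by adding the decoded difference back to the same prediction.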
Typically, the encoder calculates appropriate motion vectors and other data elements represented in the video data
stream. The motion estimation process in the encoder and the selection of whether to use inter prediction for the
representation of each region of the video content are not specified in this Recommendation | International Standard.
0.6.4 Spatial redundancy reduction
This subclause does not form an integral part of this Recommendation | International Standard.
Both source pictures and prediction residuals have high spatial redundancy. This
Recommendation | International Standard is based on the use of a block-based transform method for spatial redundancy
removal. After inter prediction from previously-decoded samples in other pictures or spatial-based prediction from
previously-decoded samples within the current picture, the resulting prediction residual is split into 4x4 blocks. These
are converted into the transform domain, where they are quantised. After quantisation, many of the transform coefficients
are zero or have low amplitude and can thus be represented with a small amount of encoded data. The processes of
transformation and quantisation in the encoder are not specified in this Recommendation | International Standard.
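The 4x4 block-based transform mentioned above can be illustrated with the integer transform matrix on which H.264's core transform is based. The sketch below computes the unscaled forward transform Y = C·X·Cᵀ; note that the Recommendation normatively specifies only the inverse transform and its scaling, so this forward form is an encoder-side sketch, not a normative process.

```c
/* Integer transform matrix underlying H.264's core 4x4 transform. */
static const int C[4][4] = {
    { 1,  1,  1,  1 },
    { 2,  1, -1, -2 },
    { 1, -1, -1,  1 },
    { 1, -2,  2, -1 },
};

/* Unscaled forward transform of a 4x4 residual block: y = C * x * C^T.
 * Scaling and quantisation (not shown) would follow in an encoder. */
void transform4x4(int x[4][4], int y[4][4]) {
    int t[4][4];
    for (int i = 0; i < 4; i++)          /* t = C * x */
        for (int j = 0; j < 4; j++) {
            t[i][j] = 0;
            for (int k = 0; k < 4; k++)
                t[i][j] += C[i][k] * x[k][j];
        }
    for (int i = 0; i < 4; i++)          /* y = t * C^T */
        for (int j = 0; j < 4; j++) {
            y[i][j] = 0;
            for (int k = 0; k < 4; k++)
                y[i][j] += t[i][k] * C[j][k];
        }
}
```

For a flat residual block all AC coefficients come out zero, which is why quantised transform coefficients are so often zero or small for well-predicted regions.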
0.7 How to read this specification
This subclause does not form an integral part of this Recommendation | International Standard.
It is suggested that the reader starts with clause 1 (Scope) and moves on to clause 3 (Definitions). Clause 6 should be
read for the geometrical relationship of the source, input, and output of the decoder. Clause 7 (Syntax and semantics)
specifies the order to parse syntax elements from the bitstream. See subclauses 7.1-7.3 for syntactical order and see
subclause 7.4 for semantics; i.e., the scope, restrictions, and conditions that are imposed on the syntax elements. The
actual parsing for most syntax elements is specified in clause 9 (Parsing process). Finally, clause 8 (Decoding process)
specifies how the syntax elements are mapped into decoded samples. Throughout reading this specification, the reader
should refer to clauses 2 (Normative references), 4 (Abbreviations), and 5 (Conventions) as needed. Annexes A through
E also form an integral part of this Recommendation | International Standard.
Annex A specifies seven profiles (Baseline, Main, Extended, High, High 10, High 4:2:2 and High 4:4:4), each being
tailored to certain application domains, and defines the so-called levels of the profiles. Annex B specifies syntax and
semantics of a byte stream format for delivery of coded video as an ordered stream of bytes. Annex C specifies the
hypothetical reference decoder and its use to check bitstream and decoder conformance. Annex D specifies syntax and
semantics for supplemental enhancement information message payloads. Finally, Annex E specifies syntax and
semantics of the video usability information parameters of the sequence parameter set.
Throughout this specification, statements appearing with the preamble "NOTE -" are informative and are not an integral
part of this Recommendation | International Standard.