High Efficiency Video Coding Standard: An In-Depth Look at ITU-T H.265

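"The H.265 specification is a high efficiency video coding standard from the International Telecommunication Union (ITU-T). It aims to improve compression efficiency for high-resolution video and to reduce bandwidth requirements while preserving picture quality."

H.265, also known as High Efficiency Video Coding (HEVC), is the successor to H.264/AVC. ITU-T published it in 2013. It targets video transmission, storage, and playback, and its advantages are most pronounced for 4K and 8K ultra-high-definition video.

The core goal of H.265 is higher coding efficiency, which it achieves mainly through the following:

1. **Flexible block partitioning**: H.265 replaces the fixed 16x16 macroblock of H.264 with coding tree units of up to 64x64 luma samples. Each coding tree unit is split recursively by a quadtree into coding units as small as 8x8, with transform blocks down to 4x4, so block sizes can follow the picture content and reduce redundancy.
2. **Multiple reference pictures**: Like H.264, H.265 can predict from several reference pictures to exploit temporal correlation. It signals them through reference picture sets (RPS).
3. **Improved entropy coding**: H.265 uses a refined form of context-adaptive binary arithmetic coding (CABAC) with revised context modelling, which compresses the coded data more efficiently.
4. **More precise motion compensation**: Improved motion vector prediction and block-based motion compensation allow the motion of sample blocks to be predicted and coded more accurately.
5. **Residual block partitioning**: The prediction residual is divided into transform blocks by a flexible quadtree structure, so the transform size can be chosen according to the block content.
6. **In-loop filtering**: A deblocking filter, followed by a sample adaptive offset (SAO) filter, reduces the visual artefacts that can appear at high compression ratios.
7. **Enhanced prediction modes**: New modes such as merge mode let a block inherit motion information from neighbouring blocks. Tools for multiview and depth coding belong to later extensions rather than to the first edition (04/2013) excerpted below.

With these improvements, H.265 is commonly reported to need about half the data of H.264 for video of the same quality. This is valuable wherever bandwidth is limited, for example in mobile communications, online streaming services, and wireless video transmission, and it has helped ultra-high-definition content become widespread.

H.265 is an important milestone in modern video coding: it lowers the cost of storing and transmitting video and provides a technical basis for the wide use of high-definition video.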

Rec. ITU-T H.265 (04/2013)

This is the first version of this Specification. Additional versions are anticipated.

0.5 Profiles, tiers and levels

This Recommendation | International Standard is designed to be generic in the sense that it serves a wide range of

applications, bit rates, resolutions, qualities, and services. Applications should cover, among other things, digital storage

media, television broadcasting and real-time communications. In the course of creating this Specification, various

requirements from typical applications have been considered, necessary algorithmic elements have been developed, and

these have been integrated into a single syntax. Hence, this Specification will facilitate video data interchange among

different applications.

Considering the practicality of implementing the full syntax of this Specification, however, a limited number of subsets

of the syntax are also stipulated by means of "profiles", "tiers", and "levels". These and other related terms are formally

defined in clause 3.

A "profile" is a subset of the entire bitstream syntax that is specified in this Recommendation | International Standard.

Within the bounds imposed by the syntax of a given profile it is still possible to require a very large variation in the

performance of encoders and decoders depending upon the values taken by syntax elements in the bitstream such as the

specified size of the decoded pictures. In many applications, it is currently neither practical nor economic to implement

a decoder capable of dealing with all hypothetical uses of the syntax within a particular profile.

In order to deal with this problem, "tiers" and "levels" are specified within each profile. A level of a tier is a specified

set of constraints imposed on values of the syntax elements in the bitstream. These constraints may be simple limits on

values. Alternatively they may take the form of constraints on arithmetic combinations of values (e.g., picture width

multiplied by picture height multiplied by number of pictures decoded per second). A level specified for a lower tier is

more constrained than a level specified for a higher tier.
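
As a rough illustration of a level constraint on an arithmetic combination of values, the sketch below checks a picture size and a luma sample rate against per-level limits. The names `LevelLimits`, `LEVEL_LIMITS`, and `fits_level` are hypothetical, and the numeric limits are recalled from Annex A rather than taken from this excerpt, so they should be verified there before use. The real conformance rules contain further constraints that are omitted here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LevelLimits:
    max_luma_ps: int  # maximum luma picture size, in samples
    max_luma_sr: int  # maximum luma sample rate, in samples per second


# Assumed values, recalled from Annex A; verify against the specification.
LEVEL_LIMITS = {
    "4.1": LevelLimits(max_luma_ps=2_228_224, max_luma_sr=133_693_440),
    "5.1": LevelLimits(max_luma_ps=8_912_896, max_luma_sr=534_773_760),
}


def fits_level(width: int, height: int, pictures_per_second: int, level: str) -> bool:
    """Check the two simplest limits: picture size and luma sample rate."""
    limits = LEVEL_LIMITS[level]
    picture_size = width * height                     # simple limit on a value
    sample_rate = picture_size * pictures_per_second  # arithmetic combination
    return picture_size <= limits.max_luma_ps and sample_rate <= limits.max_luma_sr


if __name__ == "__main__":
    print(fits_level(1920, 1080, 60, "4.1"))
    print(fits_level(3840, 2160, 60, "4.1"))
    print(fits_level(3840, 2160, 60, "5.1"))
```

Under these assumed limits, 1920x1080 at 60 pictures per second fits the smaller level, while 3840x2160 at the same rate needs the larger one.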

Coded video content conforming to this Recommendation | International Standard uses a common syntax. In order to

achieve a subset of the complete syntax, flags, parameters, and other syntax elements are included in the bitstream that

signal the presence or absence of syntactic elements that occur later in the bitstream.

0.6 Overview of the design characteristics

The coded representation specified in the syntax is designed to enable a high compression capability for a desired image

or video quality. The algorithm is typically not lossless, as the exact source sample values are typically not preserved

through the encoding and decoding processes. A number of techniques may be used to achieve highly efficient

compression. Encoding algorithms (not specified in this Recommendation | International Standard) may select between

inter and intra coding for block-shaped regions of each picture. Inter coding uses motion vectors for block-based inter

prediction to exploit temporal statistical dependencies between different pictures. Intra coding uses various spatial

prediction modes to exploit spatial statistical dependencies in the source signal for a single picture. Motion vectors and

intra prediction modes may be specified for a variety of block sizes in the picture. The prediction residual may then be

further compressed using a transform to remove spatial correlation inside the transform block before it is quantized,

producing a possibly irreversible process that typically discards less important visual information while forming a close

approximation to the source samples. Finally, the motion vectors or intra prediction modes may also be further

compressed using a variety of prediction mechanisms, and, after prediction, are combined with the quantized transform

coefficient information and encoded using arithmetic coding.
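
The irreversible step described above can be sketched with a toy model that predicts a block, quantizes the residual, and reconstructs it. This is a simplification under stated assumptions: the transform and the arithmetic coding stages are left out, and the function names and the uniform quantizer with step size `step` are hypothetical, not the process defined in clause 8.

```python
def encode_block(source, prediction, step):
    """Quantize the prediction residual of one block to integer levels."""
    return [round((s - p) / step) for s, p in zip(source, prediction)]


def decode_block(levels, prediction, step):
    """Rebuild an approximation of the source from the levels and the prediction."""
    return [p + q * step for q, p in zip(levels, prediction)]


if __name__ == "__main__":
    source = [100, 104, 109, 115]
    prediction = [98, 98, 98, 98]  # e.g. from intra or inter prediction
    step = 4

    levels = encode_block(source, prediction, step)
    reconstruction = decode_block(levels, prediction, step)
    errors = [abs(s - r) for s, r in zip(source, reconstruction)]

    print(levels, reconstruction, errors)
```

The reconstruction differs from the source, but each sample error stays within half a quantization step, which is the sense in which the result is a close approximation rather than an exact copy.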

0.7 How to read this Specification

It is suggested that the reader starts with clause 1 (Scope) and moves on to clause 3 (Definitions). Clause 6 should be

read for the geometrical relationship of the source, input, and output of the decoder. Clause 7 (Syntax and semantics)

specifies the order to parse syntax elements from the bitstream. See clauses 7.1–7.3 for syntactical order and see

clause 7.4 for semantics; e.g., the scope, restrictions, and conditions that are imposed on the syntax elements. The actual

parsing for most syntax elements is specified in clause 9 (Parsing process). Clause 10 (Sub-bitstream extraction process)

specifies the sub-bitstream extraction process. Finally, clause 8 (Decoding process) specifies how the syntax elements

are mapped into decoded samples. Throughout reading this Specification, the reader should refer to clauses 2

(Normative references), 4 (Abbreviations), and 5 (Conventions) as needed. Annexes A through E also form an integral

part of this Recommendation | International Standard.

Annex A specifies profiles each being tailored to certain application domains, and defines the so-called tiers and levels

of the profiles. Annex B specifies syntax and semantics of a byte stream format for delivery of coded video as an

ordered stream of bytes. Annex C specifies the hypothetical reference decoder, bitstream conformance, decoder

conformance, and the use of the hypothetical reference decoder to check bitstream and decoder conformance. Annex D

specifies syntax and semantics for supplemental enhancement information message payloads. Annex E specifies syntax

and semantics of the video usability information parameters of the sequence parameter set.

Throughout this Specification, statements appearing with the preamble "NOTE –" are informative and are not an

integral part of this Recommendation | International Standard.

1 Scope

This Recommendation | International Standard specifies high efficiency video coding.

2 Normative references

2.1 General

The following Recommendations and International Standards contain provisions which, through reference in this text,

constitute provisions of this Recommendation | International Standard. At the time of publication, the editions indicated

were valid. All Recommendations and Standards are subject to revision, and parties to agreements based on this

Recommendation | International Standard are encouraged to investigate the possibility of applying the most recent

edition of the Recommendations and Standards listed below. Members of IEC and ISO maintain registers of currently

valid International Standards. The Telecommunication Standardization Bureau of the ITU maintains a list of currently

valid ITU-T Recommendations.

2.2 Identical Recommendations | International Standards

– None

2.3 Paired Recommendations | International Standards equivalent in technical content

– None

2.4 Additional references

– Recommendation ITU-T T.35 (in force), Procedure for the allocation of ITU-T defined codes for

non-standard facilities.

– ISO/IEC 11578: in force, Information technology — Open Systems Interconnection — Remote Procedure

Call (RPC).

– ISO 11664-1: in force, Colorimetry — Part 1: CIE standard colorimetric observers.

– ISO 12232: in force, Photography – Digital still cameras – Determination of exposure index, ISO speed

ratings, standard output sensitivity, and recommended exposure index.

– IETF RFC 1321 (in force), The MD5 Message-Digest Algorithm.

3 Definitions

For the purposes of this Recommendation | International Standard, the following definitions apply:

3.1 access unit: A set of NAL units that are associated with each other according to a specified classification rule,

are consecutive in decoding order, and contain exactly one coded picture.

NOTE – In addition to containing the VCL NAL units of the coded picture, an access unit may also contain non-VCL

NAL units. The decoding of an access unit always results in a decoded picture.

3.2 AC transform coefficient: Any transform coefficient for which the frequency index in at least one of the two

dimensions is non-zero.

3.3 associated non-VCL NAL unit: A non-VCL NAL unit (when present) for a VCL NAL unit where the VCL

NAL unit is the associated VCL NAL unit of the non-VCL NAL unit.

3.4 associated IRAP picture: The previous IRAP picture in decoding order (when present).

3.5 associated VCL NAL unit: The preceding VCL NAL unit in decoding order for a non-VCL NAL unit with

nal_unit_type equal to EOS_NUT, EOB_NUT, FD_NUT, or SUFFIX_SEI_NUT, or in the ranges of

RSV_NVCL45..RSV_NVCL47 or UNSPEC56..UNSPEC63; or otherwise the next VCL NAL unit in decoding

order.
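
The rule in 3.5 can be sketched as a search that looks backward for some non-VCL types and forward for all others. The numeric `nal_unit_type` values below are assumptions taken from the NAL unit type table in clause 7, which is not part of this excerpt, and the helper names are hypothetical.

```python
# Assumed numeric values (clause 7 table, not shown in this excerpt).
EOS_NUT, EOB_NUT, FD_NUT, SUFFIX_SEI_NUT = 36, 37, 38, 40


def is_vcl(nal_unit_type: int) -> bool:
    """VCL NAL unit types are assumed to occupy the range 0..31."""
    return 0 <= nal_unit_type <= 31


def refers_backward(nal_unit_type: int) -> bool:
    """True for non-VCL types whose associated VCL NAL unit precedes them."""
    return (
        nal_unit_type in (EOS_NUT, EOB_NUT, FD_NUT, SUFFIX_SEI_NUT)
        or 45 <= nal_unit_type <= 47   # RSV_NVCL45..RSV_NVCL47
        or 56 <= nal_unit_type <= 63   # UNSPEC56..UNSPEC63
    )


def associated_vcl_index(types, i):
    """Index of the associated VCL NAL unit for the non-VCL unit at index i."""
    if is_vcl(types[i]):
        raise ValueError("NAL unit at index i is itself a VCL NAL unit")
    if refers_backward(types[i]):
        candidates = range(i - 1, -1, -1)
    else:
        candidates = range(i + 1, len(types))
    for j in candidates:
        if is_vcl(types[j]):
            return j
    return None  # no associated VCL NAL unit present


if __name__ == "__main__":
    # SPS, PPS, prefix SEI, IDR slice, suffix SEI, trailing slice, end of sequence
    stream = [33, 34, 39, 19, 40, 1, 36]
    print([associated_vcl_index(stream, i) for i in (0, 2, 4, 6)])
```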

3.6 bin: One bit of a bin string.

3.7 binarization: A set of bin strings for all possible values of a syntax element.

3.8 binarization process: A unique mapping process of all possible values of a syntax element onto a set of bin

strings.

3.9 bin string: An intermediate binary representation of values of syntax elements from the binarization of the

syntax element.
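
To make the terms in 3.6 to 3.9 concrete, the sketch below builds a binarization as a table of bin strings, using a truncated unary scheme and a fixed-length scheme. These two schemes are chosen for illustration only. The binarizations that actually apply to each syntax element are specified in clause 9, and the function names here are hypothetical.

```python
def truncated_unary_bins(value: int, c_max: int) -> str:
    """Bin string: `value` ones, closed by a zero unless value equals c_max."""
    if not 0 <= value <= c_max:
        raise ValueError("value out of range")
    return "1" * value + ("0" if value < c_max else "")


def fixed_length_bins(value: int, num_bits: int) -> str:
    """Bin string: the value as an unsigned binary number of num_bits bins."""
    if not 0 <= value < (1 << num_bits):
        raise ValueError("value out of range")
    return format(value, "0{}b".format(num_bits))


def binarization(c_max: int) -> dict:
    """The set of bin strings for all possible values 0..c_max."""
    return {v: truncated_unary_bins(v, c_max) for v in range(c_max + 1)}


if __name__ == "__main__":
    print(binarization(3))
    print(fixed_length_bins(5, 4))
```

Each character of a returned string corresponds to one bin, and the mapping is unique: no two values share a bin string.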

3.10 bi-predictive (B) slice: A slice that may be decoded using intra prediction or inter prediction using at most

two motion vectors and reference indices to predict the sample values of each block.

3.11 bitstream: A sequence of bits, in the form of a NAL unit stream or a byte stream, that forms the

representation of coded pictures and associated data forming one or more CVSs.

3.12 block: An MxN (M-column by N-row) array of samples, or an MxN array of transform coefficients.

3.13 broken link: A location in a bitstream at which it is indicated that some subsequent pictures in decoding

order may contain serious visual artefacts due to unspecified operations performed in the generation of the

bitstream.

3.14 broken link access (BLA) access unit: An access unit in which the coded picture is a BLA picture.

3.15 broken link access (BLA) picture: An IRAP picture for which each VCL NAL unit has nal_unit_type equal

to BLA_W_LP, BLA_W_RADL, or BLA_N_LP.

NOTE – A BLA picture contains only I slices, and may be the first picture in the bitstream in decoding order, or

may appear later in the bitstream. Each BLA picture begins a new CVS, and has the same effect on the decoding

process as an IDR picture. However, a BLA picture contains syntax elements that specify a non-empty RPS. When

each VCL NAL unit of a BLA picture has nal_unit_type equal to BLA_W_LP, the picture may have associated RASL

pictures, which are not output by the decoder and may not be decodable, as they may contain references to pictures

that are not present in the bitstream; it may also have associated RADL pictures, which are specified to be decoded.

When each VCL NAL unit of a BLA picture has nal_unit_type equal to BLA_W_RADL, the picture does not have

associated RASL pictures but may have associated RADL pictures. When each VCL NAL unit of a BLA picture has

nal_unit_type equal to BLA_N_LP, the picture does not have any associated leading pictures.

3.16 buffering period: The set of access units starting with an access unit that contains a buffering period SEI

message and containing all subsequent access units in decoding order up to but not including the next access

unit (when present) that contains a buffering period SEI message.

3.17 byte: A sequence of 8 bits, within which, when written or read as a sequence of bit values, the left-most and

right-most bits represent the most and least significant bits, respectively.

3.18 byte-aligned: A position in a bitstream is byte-aligned when the position is an integer multiple of 8 bits from

the position of the first bit in the bitstream, and a bit or byte or syntax element is said to be byte-aligned when

the position at which it appears in a bitstream is byte-aligned.

3.19 byte stream: An encapsulation of a NAL unit stream containing start code prefixes and NAL units as specified

in Annex B.

3.20 can: A term used to refer to behaviour that is allowed, but not necessarily required.

3.21 chroma: An adjective, represented by the symbols Cb and Cr, specifying that a sample array or single sample

is representing one of the two colour difference signals related to the primary colours.

NOTE – The term chroma is used rather than the term chrominance in order to avoid the implication of the use of

linear light transfer characteristics that is often associated with the term chrominance.

3.22 clean random access (CRA) access unit: An access unit in which the coded picture is a CRA picture.

3.23 clean random access (CRA) picture: An IRAP picture for which each VCL NAL unit has nal_unit_type

equal to CRA_NUT.

NOTE – A CRA picture contains only I slices, and may be the first picture in the bitstream in decoding order, or

may appear later in the bitstream. A CRA picture may have associated RADL or RASL pictures. When a CRA

picture has NoRaslOutputFlag equal to 1, the associated RASL pictures are not output by the decoder, because they

may not be decodable, as they may contain references to pictures that are not present in the bitstream.

3.24 coded picture: A coded representation of a picture containing all coding tree units of the picture.

3.25 coded picture buffer (CPB): A first-in first-out buffer containing decoding units in decoding order specified

in the hypothetical reference decoder in Annex C.

3.26 coded representation: A data element as represented in its coded form.

3.27 coded slice segment NAL unit: A NAL unit that has nal_unit_type in the range of TRAIL_N to RASL_R,

inclusive, or in the range of BLA_W_LP to RSV_IRAP_VCL23, inclusive, which indicates that the NAL unit

contains a coded slice segment.

3.28 coded video sequence (CVS): A sequence of access units that consists, in decoding order, of an IRAP access

unit with NoRaslOutputFlag equal to 1, followed by zero or more access units that are not IRAP access units

with NoRaslOutputFlag equal to 1, including all subsequent access units up to but not including any

subsequent access unit that is an IRAP access unit with NoRaslOutputFlag equal to 1.

NOTE – An IRAP access unit may be an IDR access unit, a BLA access unit, or a CRA access unit. The value of

NoRaslOutputFlag is equal to 1 for each IDR access unit, each BLA access unit, and each CRA access unit that is

the first access unit in the bitstream in decoding order, is the first access unit that follows an end of sequence NAL

unit in decoding order, or has HandleCraAsBlaFlag equal to 1.
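
The definition in 3.28 amounts to cutting the access unit sequence at every IRAP access unit whose NoRaslOutputFlag is 1. A minimal sketch follows, assuming each access unit has already been reduced to two booleans. The `AccessUnit` type and `split_into_cvss` are hypothetical names, and deriving NoRaslOutputFlag itself is outside the scope of the sketch.

```python
from typing import List, NamedTuple


class AccessUnit(NamedTuple):
    name: str
    is_irap: bool
    no_rasl_output_flag: bool = False


def split_into_cvss(access_units: List[AccessUnit]) -> List[List[AccessUnit]]:
    """Group access units, given in decoding order, into coded video sequences."""
    cvss: List[List[AccessUnit]] = []
    for au in access_units:
        if au.is_irap and au.no_rasl_output_flag:
            cvss.append([au])    # this access unit starts a new CVS
        elif cvss:
            cvss[-1].append(au)  # belongs to the CVS currently open
        else:
            raise ValueError(
                "first access unit must be an IRAP access unit "
                "with NoRaslOutputFlag equal to 1"
            )
    return cvss


if __name__ == "__main__":
    units = [
        AccessUnit("IDR0", True, True),
        AccessUnit("P1", False),
        AccessUnit("CRA2", True, False),  # IRAP, but does not start a CVS
        AccessUnit("P3", False),
        AccessUnit("IDR4", True, True),
        AccessUnit("P5", False),
    ]
    for cvs in split_into_cvss(units):
        print([au.name for au in cvs])
```

The CRA access unit in the middle stays inside the first CVS because its NoRaslOutputFlag is 0.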

3.29 coding block: An NxN block of samples for some value of N such that the division of a coding tree block into

coding blocks is a partitioning.

3.30 coding tree block: An NxN block of samples for some value of N such that the division of a component into

coding tree blocks is a partitioning.

3.31 coding tree unit: A coding tree block of luma samples, two corresponding coding tree blocks of chroma

samples of a picture that has three sample arrays, or a coding tree block of samples of a monochrome picture

or a picture that is coded using three separate colour planes and syntax structures used to code the samples.

3.32 coding unit: A coding block of luma samples, two corresponding coding blocks of chroma samples of a

picture that has three sample arrays, or a coding block of samples of a monochrome picture or a picture that is

coded using three separate colour planes and syntax structures used to code the samples.
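
Definitions 3.29 to 3.32 require that the coding blocks of a coding tree block form a partitioning, meaning they cover it exactly once. The sketch below illustrates this with a recursive quadtree split. The `should_split` callback stands in for the encoder's decision, which this Specification does not prescribe, and the block sizes in the example are assumptions chosen for illustration.

```python
def partition(x, y, size, min_size, should_split):
    """Yield (x, y, size) coding blocks that tile the square block at (x, y)."""
    if size > min_size and should_split(x, y, size):
        half = size // 2
        for dy in (0, half):
            for dx in (0, half):
                yield from partition(x + dx, y + dy, half, min_size, should_split)
    else:
        yield (x, y, size)


if __name__ == "__main__":
    # Hypothetical rule: keep splitting only the block at the top-left corner.
    blocks = list(partition(0, 0, 64, 8, lambda x, y, size: x == 0 and y == 0))
    print(blocks)
    # The block areas add up to the area of the 64x64 coding tree block.
    print(sum(size * size for _, _, size in blocks) == 64 * 64)
```

Because every split replaces one square by four non-overlapping quarters, the resulting blocks always tile the original block regardless of the split decisions.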

3.33 component: An array or single sample from one of the three arrays (luma and two chroma) that compose a

picture in 4:2:0, 4:2:2, or 4:4:4 colour format or the array or a single sample of the array that compose a

picture in monochrome format.
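
The colour formats named in 3.33 differ in how the two chroma arrays are subsampled relative to the luma array. The sketch below computes the array dimensions for each format. The subsampling factors are the conventional ones for these formats and are not stated in this excerpt, and `component_sizes` is a hypothetical helper.

```python
# Horizontal and vertical chroma subsampling factors per colour format.
CHROMA_SUBSAMPLING = {
    "4:2:0": (2, 2),
    "4:2:2": (2, 1),
    "4:4:4": (1, 1),
}


def component_sizes(luma_width: int, luma_height: int, colour_format: str) -> dict:
    """Return the (width, height) of each sample array of a picture."""
    if colour_format == "monochrome":
        return {"luma": (luma_width, luma_height)}  # a single array, no chroma
    sub_w, sub_h = CHROMA_SUBSAMPLING[colour_format]
    chroma = (luma_width // sub_w, luma_height // sub_h)
    return {"luma": (luma_width, luma_height), "Cb": chroma, "Cr": chroma}


if __name__ == "__main__":
    for fmt in ("monochrome", "4:2:0", "4:2:2", "4:4:4"):
        print(fmt, component_sizes(1920, 1080, fmt))
```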

3.34 context variable: A variable specified for the adaptive binary arithmetic decoding process of a bin by an

equation containing recently decoded bins.

3.35 cropped decoded picture: The result of cropping a decoded picture based on the conformance cropping

window specified in the SPS that is referred to by the corresponding coded picture.

3.36 decoded picture: A decoded picture is derived by decoding a coded picture.

3.37 decoded picture buffer (DPB): A buffer holding decoded pictures for reference, output reordering, or output

delay specified for the hypothetical reference decoder in Annex C.

3.38 decoder: An embodiment of a decoding process.

3.39 decoder under test (DUT): A decoder that is tested for conformance to this Specification by operating the

hypothetical stream scheduler to deliver a conforming bitstream to the decoder and to the hypothetical

reference decoder and comparing the values and timing or order of the output of the two decoders.

3.40 decoding order: The order in which syntax elements are processed by the decoding process.

3.41 decoding process: The process specified in this Specification that reads a bitstream and derives decoded

pictures from it.

3.42 decoding unit: An access unit if SubPicHrdFlag is equal to 0 or a subset of an access unit otherwise,

consisting of one or more VCL NAL units in an access unit and the associated non-VCL NAL units.

3.43 dependent slice segment: A slice segment for which the values of some syntax elements of the slice segment

header are inferred from the values for the preceding independent slice segment in decoding order.

3.44 display process: A process not specified in this Specification having, as its input, the cropped decoded

pictures that are the output of the decoding process.

3.45 elementary stream: A sequence of one or more bitstreams.

NOTE – An elementary stream that consists of two or more bitstreams would typically have been formed by splicing

together two or more bitstreams (or parts thereof).

3.46 emulation prevention byte: A byte equal to 0x03 that is present within a NAL unit when the syntax elements

of the bitstream form certain patterns of byte values in a manner that ensures that no sequence of consecutive

byte-aligned bytes in the NAL unit can contain a start code prefix.
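
A minimal sketch of how emulation prevention bytes might be inserted and removed is shown below. It assumes the usual rule that a byte 0x03 is inserted whenever two consecutive zero bytes are followed by a byte in the range 0x00 to 0x03. The normative description is in clause 7, the function names are hypothetical, and special handling of a payload that ends in a zero byte is omitted.

```python
def add_emulation_prevention(raw: bytes) -> bytes:
    """Insert 0x03 after two zero bytes when the next byte is 0x00..0x03."""
    out = bytearray()
    zeros = 0  # number of consecutive zero bytes just written
    for b in raw:
        if zeros >= 2 and b <= 0x03:
            out.append(0x03)  # emulation prevention byte
            zeros = 0
        out.append(b)
        zeros = zeros + 1 if b == 0 else 0
    return bytes(out)


def remove_emulation_prevention(escaped: bytes) -> bytes:
    """Drop each 0x03 that directly follows two zero bytes."""
    out = bytearray()
    zeros = 0
    for b in escaped:
        if zeros >= 2 and b == 0x03:
            zeros = 0
            continue  # discard the emulation prevention byte
        out.append(b)
        zeros = zeros + 1 if b == 0 else 0
    return bytes(out)


if __name__ == "__main__":
    raw = bytes([0, 0, 1, 0, 0, 0, 0, 3, 5])
    escaped = add_emulation_prevention(raw)
    print(escaped.hex(" "))
    print(remove_emulation_prevention(escaped) == raw)
```

After escaping, the payload no longer contains the byte pattern of a start code prefix, and removing the inserted bytes restores the original data.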

3.47 encoder: An embodiment of an encoding process.

3.48 encoding process: A process not specified in this Specification that produces a bitstream conforming to this

Specification.

3.49 field: An assembly of alternate rows of samples of a frame.

3.50 filler data NAL units: NAL units with nal_unit_type equal to FD_NUT.

3.51 flag: A variable that can take one of the two possible values 0 and 1.

3.52 frame: The composition of a top field and a bottom field, where sample rows 0, 2, 4, ... originate from the top

field and sample rows 1, 3, 5, ... originate from the bottom field.
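
The row interleaving in 3.52 can be sketched with plain lists of sample rows. The helper names `weave_fields` and `split_frame` are hypothetical, and each row is treated as an opaque value.

```python
def weave_fields(top_field, bottom_field):
    """Compose a frame: even rows from the top field, odd rows from the bottom."""
    if len(top_field) != len(bottom_field):
        raise ValueError("fields must have the same number of rows")
    frame = []
    for top_row, bottom_row in zip(top_field, bottom_field):
        frame.append(top_row)     # frame rows 0, 2, 4, ...
        frame.append(bottom_row)  # frame rows 1, 3, 5, ...
    return frame


def split_frame(frame):
    """Recover the top and bottom fields from a frame."""
    return frame[0::2], frame[1::2]


if __name__ == "__main__":
    top = ["t0", "t1", "t2"]
    bottom = ["b0", "b1", "b2"]
    frame = weave_fields(top, bottom)
    print(frame)
    print(split_frame(frame) == (top, bottom))
```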

3.53 frequency index: A one-dimensional or two-dimensional index associated with a transform coefficient prior

to an inverse transform part of the decoding process.

3.54 hypothetical reference decoder (HRD): A hypothetical decoder model that specifies constraints on the

variability of conforming NAL unit streams or conforming byte streams that an encoding process may

produce.

3.55 hypothetical stream scheduler (HSS): A hypothetical delivery mechanism used for checking the

conformance of a bitstream or a decoder with regards to the timing and data flow of the input of a bitstream

into the hypothetical reference decoder.

3.56 independent slice segment: A slice segment for which the values of the syntax elements of the slice segment

header are not inferred from the values for a preceding slice segment.

3.57 informative: A term used to refer to content provided in this Specification that does not establish any

mandatory requirements for conformance to this Specification and thus is not considered an integral part of

this Specification.

3.58 instantaneous decoding refresh (IDR) access unit: An access unit in which the coded picture is an IDR picture.

3.59 instantaneous decoding refresh (IDR) picture: An IRAP picture for which each VCL NAL unit has

nal_unit_type equal to IDR_W_RADL or IDR_N_LP.

NOTE – An IDR picture contains only I slices, and may be the first picture in the bitstream in decoding order, or

may appear later in the bitstream. Each IDR picture is the first picture of a CVS in decoding order. When each VCL

NAL unit of an IDR picture has nal_unit_type equal to IDR_W_RADL, the picture may have associated RADL

pictures. When each VCL NAL unit of an IDR picture has nal_unit_type equal to IDR_N_LP, the picture does not

have any associated leading pictures. An IDR picture does not have associated RASL pictures.

3.60 inter coding: Coding of a coding block, slice, or picture that uses inter prediction.

3.61 inter prediction: A prediction derived in a manner that is dependent on data elements (e.g., sample values or

motion vectors) of pictures other than the current picture.

3.62 intra coding: Coding of a coding block, slice, or picture that uses intra prediction.

3.63 intra prediction: A prediction derived from only data elements (e.g., sample values) of the same decoded

slice.

3.64 intra random access point (IRAP) access unit: An access unit in which the coded picture is an IRAP

picture.

3.65 intra random access point (IRAP) picture: A coded picture for which each VCL NAL unit has nal_unit_type

in the range of BLA_W_LP to RSV_IRAP_VCL23, inclusive.

NOTE – An IRAP picture contains only I slices, and may be a BLA picture, a CRA picture or an IDR picture. The

first picture in the bitstream in decoding order must be an IRAP picture. Provided the necessary parameter sets are

available when they need to be activated, the IRAP picture and all subsequent non-RASL pictures in decoding order

can be correctly decoded without performing the decoding process of any pictures that precede the IRAP picture in

decoding order. There may be pictures in a bitstream that contain only I slices that are not IRAP pictures.
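
The picture categories in 3.15, 3.23, 3.59, and 3.65 are all defined by ranges of nal_unit_type, so a decoder front end might classify them as in the sketch below. The numeric values and the position of nal_unit_type in the NAL unit header are assumptions taken from clause 7, which is not part of this excerpt, and the function names are hypothetical.

```python
# Assumed numeric values (clause 7 table, not shown in this excerpt).
BLA_W_LP, BLA_W_RADL, BLA_N_LP = 16, 17, 18
IDR_W_RADL, IDR_N_LP = 19, 20
CRA_NUT = 21
RSV_IRAP_VCL23 = 23


def nal_unit_type_from_header(first_byte: int) -> int:
    """Extract the 6-bit nal_unit_type that follows the leading forbidden bit."""
    return (first_byte >> 1) & 0x3F


def classify_irap(nal_unit_type: int):
    """Return the IRAP category of a VCL NAL unit type, or None if not IRAP."""
    if nal_unit_type in (BLA_W_LP, BLA_W_RADL, BLA_N_LP):
        return "BLA"
    if nal_unit_type in (IDR_W_RADL, IDR_N_LP):
        return "IDR"
    if nal_unit_type == CRA_NUT:
        return "CRA"
    if BLA_W_LP <= nal_unit_type <= RSV_IRAP_VCL23:
        return "reserved IRAP"
    return None


if __name__ == "__main__":
    for first_byte in (0x26, 0x2A, 0x02, 0x40):
        nut = nal_unit_type_from_header(first_byte)
        print(hex(first_byte), nut, classify_irap(nut))
```

A picture is an IRAP picture only when every one of its VCL NAL units falls in the IRAP range, so a caller would apply `classify_irap` to each VCL NAL unit of the picture.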

3.66 intra (I) slice: A slice that is decoded using intra prediction only.

3.67 inverse transform: A part of the decoding process by which a set of transform coefficients are converted into

spatial-domain values.

3.68 layer: A set of VCL NAL units that all have a particular value of nuh_layer_id and the associated non-VCL

NAL units, or one of a set of syntactical structures having a hierarchical relationship.

NOTE – Depending on the context, either the first layer concept or the second layer concept applies. The first layer

concept is also referred to as a scalable layer, wherein a layer may be a spatial scalable layer, a quality scalable

layer, a view, etc. A temporal true subset of a scalable layer is not referred to as a layer but referred to as a sub-layer
