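ITU-T H.266: Versatile Video Coding, the next-generation video coding standard
H.266 is a video coding standard published by the ITU Telecommunication Standardization Sector (ITU-T) under the full name Versatile Video Coding. It aims to improve compression efficiency and reduce bandwidth requirements while maintaining video quality. It was published in August 2020 as Recommendation H.266 and belongs to the ITU-T H series, which covers audiovisual and multimedia systems, including infrastructure, transmission, synchronization, system aspects, and communication procedures.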
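H.266, also known as VVC (Versatile Video Coding), is the successor to H.264/AVC and H.265/HEVC. Its target applications include efficient coding of high-resolution and high-dynamic-range (HDR) content, such as 4K and 8K ultra-high-definition video and virtual reality (VR) and augmented reality (AR) material. As these technologies have matured, the need for more efficient compression has become increasingly pressing.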
The H.266 standard introduces a number of coding techniques to achieve a higher compression ratio, including:
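1. **More flexible block partitioning**: H.266 divides each coding tree block into coding blocks of varying sizes, including rectangular blocks produced by binary splits, so the encoder can match block sizes to complex regions of the picture more precisely.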
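2. **Richer prediction modes**: More intra and inter prediction modes are available, including angular intra prediction and prediction from multiple reference pictures, as well as intra block copy (IBC) and palette coding, to reduce redundant information.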
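3. **Improved motion compensation**: More advanced motion-compensated prediction is used, such as affine motion models, to describe the motion of samples more accurately.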
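4. **More efficient entropy coding**: The entropy coding stage is updated, with an improved context-adaptive binary arithmetic coding (CABAC) design, to further compress the coded data.
5. **Smarter transform and quantization**: New transform structures and quantization techniques are adopted, such as variable-size discrete cosine transforms (DCT) and adaptive quantization, to improve coding efficiency across different kinds of content.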
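6. **Layers and subpictures**: For 3D and VR content, H.266 supports coding pictures in multiple layers within an access unit and dividing pictures into subpictures, which can reduce the bandwidth needed to store and transmit such content.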
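7. **Room for adaptive rate control**: The standard specifies only the syntax, semantics, and decoding process, so an encoder is free to adjust coding parameters dynamically according to content complexity and network conditions to balance video quality against bandwidth use.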
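Deploying H.266 should help reduce bandwidth requirements for video streaming, online education, telemedicine, cloud gaming, and similar applications, improving service quality and user experience. The higher compression efficiency comes at the cost of greater computational complexity in encoding and decoding, which calls for more capable hardware, especially for real-time video processing.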
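As the most recent milestone in video coding, H.266/VVC advances video technology and provides a solid technical foundation for future multimedia applications. As 5G networks become widespread, H.266 is expected to better meet users' demand for high-quality, low-latency video.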
Rec. ITU-T H.266 (08/2020)
Versions of this Recommendation | International Standard
This is the first version of this Recommendation | International Standard.
Overview of the design characteristics
The coded representation specified in the syntax is designed to enable a high compression capability for a desired image
or video quality. The algorithm is typically not mathematically lossless, as the exact source sample values are typically not
preserved through the encoding and decoding processes, although some modes are included that provide lossless coding
capability. A number of techniques are specified to enable highly efficient compression. Encoding algorithms (not
specified within the scope of this Recommendation | International Standard) may select between inter, intra, intra block
copy (IBC), and palette coding for block-shaped regions of each picture. Inter coding uses motion vectors for block-based
inter-picture prediction to exploit temporal statistical dependencies between different pictures, intra coding uses various
spatial prediction modes to exploit spatial statistical dependencies in the source signal within the same picture, and intra
block copy coding uses block displacement vectors to reference previously decoded regions of the same picture to exploit
statistical similarities among different areas of the same picture. Motion vectors, intra prediction modes, and IBC block
vectors are specified for a variety of block sizes in the picture. The prediction residual can then be further compressed
using a spatial transform to remove spatial correlation inside a block before it is quantized, producing a possibly irreversible
process that typically discards less important visual information while forming a close approximation to the source
samples. Finally, the motion vectors, intra prediction modes, and block vectors can also be further compressed using a
variety of prediction mechanisms, and, after prediction, are combined with the quantized transform coefficient information
and encoded using arithmetic coding.
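
The predict, quantize, and reconstruct chain described above can be illustrated with a toy sketch. It is not the decoding process of this Recommendation | International Standard: the spatial transform and arithmetic coding are omitted, and the function names `encode_block` and `decode_block`, the uniform quantization step, and the sample values are all assumptions made for illustration.

```python
def encode_block(source, prediction, step):
    """Quantize the prediction residual with a uniform step (toy model)."""
    return [round((s - p) / step) for s, p in zip(source, prediction)]


def decode_block(levels, prediction, step):
    """Rebuild samples by adding the dequantized residual to the prediction."""
    return [p + level * step for level, p in zip(levels, prediction)]


source = [100, 104, 109, 115]
prediction = [100, 100, 100, 100]

# A step of 1 keeps every residual value exactly, so the round trip is lossless.
lossless = decode_block(encode_block(source, prediction, 1), prediction, 1)

# A larger step discards detail: the result only approximates the source.
lossy = decode_block(encode_block(source, prediction, 4), prediction, 4)
```

With a step of 4, the reconstruction differs from the source by at most half a step per sample, which is the sense in which quantization makes the process irreversible while still approximating the source samples closely.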
How to read this document
It is suggested that the reader starts with clause 1 and moves on to clause 3. Clause 6 should be read for the geometrical
relationship of the source, input, and output of the decoder. Clause 7 specifies the order to parse syntax elements from the
bitstream. See clauses 7.1 to 7.3 for syntactical order and clause 7.4 for semantics; e.g., the scope, restrictions, and
conditions that are imposed on the syntax elements. The actual parsing for most syntax elements is specified in clause 9.
Finally, clause 8 specifies how the syntax elements are mapped into decoded samples. Throughout reading this document,
the reader should refer to clauses 2, 4, and 5 as needed. Annexes A through D also form an integral part of this
Recommendation | International Standard.
Annex A specifies profiles, each being tailored to certain application domains, and defines the so-called tiers and levels of
the profiles. Annex B specifies syntax and semantics of a byte stream format for delivery of coded video as an ordered
stream of bytes. Annex C specifies the hypothetical reference decoder, bitstream conformance, decoder conformance, and
the use of the hypothetical reference decoder to check bitstream and decoder conformance. Annex D specifies syntax and
semantics for supplemental enhancement information (SEI) message payloads that affect the conformance specifications
in Annex C. Rec. ITU-T H.274 | ISO/IEC 23002-7 specifies the syntax and semantics of the video usability information
(VUI) parameters as well as SEI messages that do not affect the conformance specifications in Annex C. These VUI
parameters and SEI messages may be used together with this Recommendation | International Standard.
1 Scope
This Recommendation | International Standard specifies a video coding technology known as Versatile Video Coding
(VVC), comprising a video coding technology with a compression capability that is substantially beyond that of the prior
generations of such standards and with sufficient versatility for effective use in a broad range of applications.
Only the syntax format, semantics, and associated decoding process requirements are specified, while other matters such
as pre-processing, the encoding process, system signalling and multiplexing, data loss recovery, post-processing, and video
display are considered to be outside the scope of this Recommendation | International Standard. Additionally, the internal
processing steps performed within a decoder are also considered to be outside the scope of this Recommendation |
International Standard; only the externally observable output behaviour is required to conform to the specifications of this
Recommendation | International Standard.
This Recommendation | International Standard is designed to be generic in the sense that it serves a wide range of
applications, bit rates, resolutions, qualities and services. Applications include, but are not limited to, video coding for
digital storage media, television broadcasting and real-time communication. In the course of creating this
Recommendation | International Standard, various requirements from typical applications have been considered, necessary
algorithmic elements have been developed, and these have been integrated into a single syntax. Hence, this
Recommendation | International Standard is designed to facilitate video data interchange among different applications.
2 Normative references
The following Recommendations and International Standards contain provisions which, through reference in this text,
constitute provisions of this Recommendation | International Standard. At the time of publication, the editions indicated
were valid. All Recommendations and Standards are subject to revision, and parties to agreements based on this
Recommendation | International Standard are encouraged to investigate the possibility of applying the most recent edition
of the Recommendations and Standards listed below. Members of IEC and ISO maintain registers of currently valid
International Standards. The Telecommunication Standardization Bureau of the ITU maintains a list of currently valid
ITU-T Recommendations.
2.1 Identical Recommendations | International Standards
– None
2.2 Paired Recommendations | International Standards equivalent in technical content
– Rec. ITU-T H.274 | ISO/IEC 23002-7 (in force) Versatile supplemental enhancement information messages
for coded video bitstreams
2.3 Additional references
– Rec. ITU-T T.35:2000, Procedure for the allocation of ITU-T defined codes for non standard facilities.
3 Definitions
For the purposes of this Recommendation | International Standard, the following definitions apply.
3.1 AC transform coefficient: Any transform coefficient for which the frequency index in at least one of the two
dimensions is non-zero.
3.2 access unit (AU): A set of PUs that belong to different layers and contain coded pictures associated with the
same time for output from the DPB.
3.3 adaptation parameter set (APS): A syntax structure containing syntax elements that apply to zero or more
slices as determined by zero or more syntax elements found in slice headers.
3.4 adaptive colour transform (ACT): A cross-component transform applied to the decoded residual of a coding
unit in the 4:4:4 colour format prior to reconstruction and loop filtering.
3.5 adaptive loop filter (ALF): A filtering process that is applied as part of the decoding process and is controlled
by parameters conveyed in an APS.
3.6 ALF APS: An APS that controls the ALF process.
3.7 associated GDR picture: The previous GDR picture (when present) in decoding order, for a particular picture
with nuh_layer_id equal to a particular value layerId, that has nuh_layer_id equal to layerId and between which
and the particular picture in decoding order there is no IRAP picture with nuh_layer_id equal to layerId.
3.8 associated GDR subpicture: The previous GDR subpicture (when present) in decoding order, for a particular
subpicture with nuh_layer_id equal to a particular value layerId and subpicture index equal to a particular value
subpicIdx, that has nuh_layer_id equal to layerId and subpicture index equal to subpicIdx and between which
and the particular subpicture in decoding order there is no IRAP subpicture with nuh_layer_id equal to layerId
and subpicture index equal to subpicIdx.
3.9 associated IRAP picture: The previous IRAP picture (when present) in decoding order, for a particular picture
with nuh_layer_id equal to a particular value layerId, that has nuh_layer_id equal to layerId and between which
and the particular picture in decoding order there is no GDR picture with nuh_layer_id equal to layerId.
3.10 associated IRAP subpicture: The previous IRAP subpicture (when present) in decoding order, for a particular
subpicture with nuh_layer_id equal to a particular value layerId and subpicture index equal to a particular value
subpicIdx, that has nuh_layer_id equal to layerId and subpicture index equal to subpicIdx and between which
and the particular subpicture in decoding order there is no GDR subpicture with nuh_layer_id equal to layerId
and subpicture index equal to subpicIdx.
3.11 associated non-VCL NAL unit: A non-VCL NAL unit (when present) for a VCL NAL unit where the VCL NAL
unit is the associated VCL NAL unit of the non-VCL NAL unit.
3.12 associated VCL NAL unit: The preceding VCL NAL unit in decoding order for a non-VCL NAL unit with
nal_unit_type equal to EOS_NUT, EOB_NUT, SUFFIX_APS_NUT, SUFFIX_SEI_NUT, FD_NUT,
RSV_NVCL_27, UNSPEC_30, or UNSPEC_31; or otherwise the next VCL NAL unit in decoding order.
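
As an illustration of definition 3.12, the sketch below looks up the associated VCL NAL unit for a non-VCL NAL unit in a list given in decoding order. The representation of a NAL unit as a `(type_name, is_vcl)` tuple and the function name `associated_vcl_index` are assumptions. The type names `SPS_NUT`, `PPS_NUT`, `PREFIX_SEI_NUT`, and `TRAIL_NUT` used in the example are taken from the full specification rather than from the text above.

```python
# Non-VCL types that are associated with the PRECEDING VCL NAL unit (3.12).
BACKWARD_TYPES = {
    "EOS_NUT", "EOB_NUT", "SUFFIX_APS_NUT", "SUFFIX_SEI_NUT",
    "FD_NUT", "RSV_NVCL_27", "UNSPEC_30", "UNSPEC_31",
}


def associated_vcl_index(nal_units, i):
    """Return the index of the VCL NAL unit associated with nal_units[i].

    nal_units is a list of (type_name, is_vcl) tuples in decoding order.
    Returns None when no such VCL NAL unit is present.
    """
    type_name, is_vcl = nal_units[i]
    if is_vcl:
        raise ValueError("definition 3.12 applies to non-VCL NAL units only")
    if type_name in BACKWARD_TYPES:
        candidates = range(i - 1, -1, -1)          # search backwards
    else:
        candidates = range(i + 1, len(nal_units))  # search forwards
    for j in candidates:
        if nal_units[j][1]:
            return j
    return None


units = [
    ("SPS_NUT", False), ("PPS_NUT", False), ("IDR_W_RADL", True),
    ("SUFFIX_SEI_NUT", False), ("PREFIX_SEI_NUT", False),
    ("TRAIL_NUT", True), ("EOS_NUT", False),
]
```

Here the parameter sets at indices 0 and 1 are associated with the following slice at index 2, while the suffix SEI at index 3 is associated with that same slice because it looks backwards.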
3.13 bin: One bit of a bin string.
3.14 bin string: An intermediate binary representation of values of syntax elements from the binarization of the syntax
element.
3.15 binarization: A set of bin strings for all possible values of a syntax element.
3.16 binarization process: A unique mapping process of all possible values of a syntax element onto a set of bin
strings.
3.17 binary split: A split of a rectangular MxN block of samples into two blocks where a vertical split results in a
first (M / 2)xN block and a second (M / 2)xN block, and a horizontal split results in a first Mx(N / 2) block and
a second Mx(N / 2) block.
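
A minimal sketch of the block sizes produced by a binary split as defined in 3.17. The function name `binary_split` and the `(width, height)` tuple representation are assumptions, and even dimensions are assumed so that the halves are exact.

```python
def binary_split(width, height, vertical):
    """Return the (width, height) of the two blocks produced by a binary split.

    A vertical split halves the width (M); a horizontal split halves the
    height (N). Both dimensions are assumed to be even.
    """
    if vertical:
        half = (width // 2, height)
    else:
        half = (width, height // 2)
    return [half, half]


vertical_halves = binary_split(32, 16, vertical=True)
horizontal_halves = binary_split(32, 16, vertical=False)
```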
3.18 bi-predictive (B) slice: A slice that is decoded using intra prediction or using inter prediction with at most two
motion vectors and reference indices to predict the sample values of each block.
3.19 bitstream: A sequence of bits, in the form of a NAL unit stream or a byte stream, that forms the representation
of a sequence of AUs forming one or more coded video sequences (CVSs).
3.20 block: An MxN (M-column by N-row) array of samples, or an MxN array of transform coefficients.
3.21 block vector: A two-dimensional vector that provides an offset from the coordinates of the current coding block
to the coordinates of the reference block in the same decoded slice.
3.22 byte: A sequence of 8 bits, within which, when written or read as a sequence of bit values, the left-most and
right-most bits represent the most and least significant bits, respectively.
3.23 byte stream: An encapsulation of a NAL unit stream into a series of bytes containing start code prefixes and
NAL units.
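
The following sketch shows one way to recover NAL units from a byte stream. It assumes the three-byte start code prefix 0x000001 and the optional zero-valued padding bytes specified in Annex B, neither of which is stated in the definition above, and it is not a conformance-grade parser. The function name `split_byte_stream` is hypothetical.

```python
START_CODE_PREFIX = b"\x00\x00\x01"  # assumed value, as specified in Annex B


def split_byte_stream(stream):
    """Split a byte stream into NAL units at each start code prefix."""
    units = []
    start = stream.find(START_CODE_PREFIX)
    while start != -1:
        payload_start = start + len(START_CODE_PREFIX)
        next_start = stream.find(START_CODE_PREFIX, payload_start)
        end = len(stream) if next_start == -1 else next_start
        # Zero bytes before the next prefix are padding, not NAL unit data.
        units.append(stream[payload_start:end].rstrip(b"\x00"))
        start = next_start
    return units


stream = b"\x00\x00\x00\x01\xaa\xbb\x00\x00\x01\xcc\x00\x00\x00\x01\xdd"
nal_units = split_byte_stream(stream)
```

Stripping trailing zero bytes relies on a NAL unit never ending in a zero byte, which is a further assumption of this sketch.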
3.24 byte-aligned: A position in a bitstream is byte-aligned when the position is an integer multiple of 8 bits from
the position of the first bit in the bitstream, and a bit or byte or syntax element is said to be byte-aligned when
the position at which it appears in a bitstream is byte-aligned.
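
Definitions 3.22 and 3.24 can be sketched together as a small bit reader that consumes bits starting from the most significant bit of each byte and reports whether the current position is byte-aligned. The class name `BitReader` and its methods are assumptions made for illustration.

```python
class BitReader:
    """Read bits from a bytes object, most significant bit of each byte first."""

    def __init__(self, data):
        self.data = data
        self.pos = 0  # position in bits from the first bit of the bitstream

    def read_bit(self):
        byte = self.data[self.pos // 8]
        bit = (byte >> (7 - self.pos % 8)) & 1
        self.pos += 1
        return bit

    def read_bits(self, n):
        value = 0
        for _ in range(n):
            value = (value << 1) | self.read_bit()
        return value

    def byte_aligned(self):
        # Byte-aligned: the position is an integer multiple of 8 bits.
        return self.pos % 8 == 0


reader = BitReader(bytes([0b10110000, 0xFF]))
aligned_at_start = reader.byte_aligned()
first_three = reader.read_bits(3)       # bits 1, 0, 1
aligned_mid_byte = reader.byte_aligned()
next_five = reader.read_bits(5)         # bits 1, 0, 0, 0, 0
aligned_after_byte = reader.byte_aligned()
```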
3.25 chroma: A sample array or single sample representing one of the two colour difference signals related to the
primary colours, represented by the symbols Cb and Cr.
NOTE – The term chroma is used rather than the term chrominance in order to avoid the implication of the use of linear
light transfer characteristics that is often associated with the term chrominance.
3.26 clean random access (CRA) picture: An IRAP picture for which each VCL NAL unit has nal_unit_type equal
to CRA_NUT.
NOTE – A CRA picture does not use inter prediction in its decoding process, and could be the first picture in the
bitstream in decoding order, or could appear later in the bitstream. A CRA picture could have associated RADL or
RASL pictures. When a CRA picture has NoOutputBeforeRecoveryFlag equal to 1, the associated RASL pictures are
not output by the decoder, because they might not be decodable, as they could contain references to pictures that are
not present in the bitstream.
3.27 clean random access (CRA) PU: A PU in which the coded picture is a CRA picture.
3.28 clean random access (CRA) subpicture: An IRAP subpicture for which each VCL NAL unit has nal_unit_type
equal to CRA_NUT.
3.29 coded layer video sequence (CLVS): A sequence of PUs with the same value of nuh_layer_id that consists, in
decoding order, of a CLVSS PU, followed by zero or more PUs that are not CLVSS PUs, including all subsequent
PUs up to but not including any subsequent PU that is a CLVSS PU.
NOTE – A CLVSS PU could be an IDR PU, a CRA PU, or a GDR PU. The value of NoOutputBeforeRecoveryFlag is
equal to 1 for each IDR PU, and each CRA PU that has HandleCraAsClvsStartFlag equal to 1, and each CRA or GDR
PU that is the first PU in the layer of the bitstream in decoding order or the first PU in the layer of the bitstream that
follows an EOS NAL unit in the layer in decoding order.
3.30 coded layer video sequence start (CLVSS) PU: A PU in which the coded picture is a CLVSS picture.
3.31 coded layer video sequence start (CLVSS) picture: A coded picture that is an IRAP picture with
NoOutputBeforeRecoveryFlag equal to 1 or a GDR picture with NoOutputBeforeRecoveryFlag equal to 1.
3.32 coded picture: A coded representation of a picture comprising VCL NAL units with a particular value of
nuh_layer_id within an AU and containing all CTUs of the picture.
3.33 coded picture buffer (CPB): A first-in first-out buffer containing DUs in decoding order specified in the
hypothetical reference decoder in Annex C.
3.34 coded representation: A data element as represented in its coded form.
3.35 coded video sequence (CVS): A sequence of AUs that consists, in decoding order, of a CVSS AU, followed by
zero or more AUs that are not CVSS AUs, including all subsequent AUs up to but not including any subsequent
AU that is a CVSS AU.
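
Definition 3.35 amounts to partitioning the sequence of AUs at each CVSS AU. A minimal sketch, assuming each AU is represented by a hypothetical `(label, is_cvss)` tuple and that the input starts with a CVSS AU:

```python
def split_into_cvs(access_units):
    """Group AUs, given in decoding order, into coded video sequences.

    access_units is a list of (label, is_cvss) tuples. Each CVSS AU opens
    a new CVS; every other AU joins the CVS that is currently open.
    """
    sequences = []
    for label, is_cvss in access_units:
        if is_cvss:
            sequences.append([label])
        elif sequences:
            sequences[-1].append(label)
        else:
            raise ValueError("this sketch assumes the first AU is a CVSS AU")
    return sequences


aus = [("AU0", True), ("AU1", False), ("AU2", False),
       ("AU3", True), ("AU4", False)]
cvs_list = split_into_cvs(aus)
```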
3.36 coded video sequence start (CVSS) AU: An IRAP AU or GDR AU for which the coded picture in each PU is a
CLVSS picture.
3.37 coding block: An MxN block of samples for some values of M and N such that the division of a CTB into coding
blocks is a partitioning.
3.38 coding tree block (CTB): An N×N block of samples for some value of N such that the division of a component
into CTBs is a partitioning.
3.39 coding tree unit (CTU): A CTB of luma samples, two corresponding CTBs of chroma samples of a picture that
has three sample arrays, or a CTB of samples of a monochrome picture, and syntax structures used to code the
samples.
3.40 coding unit (CU): A coding block of luma samples, two corresponding coding blocks of chroma samples of a
picture that has three sample arrays in the single tree mode, or a coding block of luma samples of a picture that
has three sample arrays in the dual tree mode, or two coding blocks of chroma samples of a picture that has three
sample arrays in the dual tree mode, or a coding block of samples of a monochrome picture, and syntax structures
used to code the samples.
3.41 component: An array or single sample from one of the three arrays (luma and two chroma) that compose a
picture in 4:2:0, 4:2:2, or 4:4:4 colour format or the array or a single sample of the array that compose a picture
in monochrome format.
3.42 context variable: A variable specified for the adaptive binary arithmetic decoding process of a bin by an
equation containing recently decoded bins.
3.43 deblocking filter: A filtering process that is applied as part of the decoding process in order to minimize the
appearance of visual artefacts at the boundaries between blocks.
3.44 decoded picture: A picture produced by applying the decoding process to a coded picture.
3.45 decoded picture buffer (DPB): A buffer holding decoded pictures for reference, output reordering, or output
delay specified for the hypothetical reference decoder.
3.46 decoder: An embodiment of a decoding process.
3.47 decoding order: The order in which syntax elements are processed by the decoding process.
3.48 decoding process: The process specified in this Specification that reads a bitstream and derives decoded pictures
from it.
3.49 decoding unit (DU): An AU if DecodingUnitHrdFlag is equal to 0 or a subset of an AU otherwise, consisting of
one or more VCL NAL units in an AU and the associated non-VCL NAL units.
3.50 emulation prevention byte: A byte equal to 0x03 that is present within a NAL unit when the syntax elements of
the bitstream form certain patterns of byte values in a manner that ensures that no sequence of consecutive byte-
aligned bytes in the NAL unit can contain a start code prefix.
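
The mechanism in 3.50 can be sketched as follows. The exact rule used here (insert 0x03 after two consecutive zero bytes whenever the next byte is 0x00 to 0x03) is an assumption based on the NAL unit semantics in clause 7, which is not reproduced above. Special handling of a payload that ends in a zero byte is omitted, and both function names are hypothetical.

```python
def add_emulation_prevention(payload):
    """Insert 0x03 so the result never contains 0x000000 to 0x000002."""
    out = bytearray()
    zeros = 0  # consecutive zero bytes written so far
    for b in payload:
        if zeros >= 2 and b <= 3:
            out.append(3)  # emulation prevention byte
            zeros = 0
        out.append(b)
        zeros = zeros + 1 if b == 0 else 0
    return bytes(out)


def remove_emulation_prevention(nal_bytes):
    """Drop each 0x03 that directly follows two consecutive zero bytes."""
    out = bytearray()
    zeros = 0
    for b in nal_bytes:
        if zeros >= 2 and b == 3:
            zeros = 0
            continue  # skip the emulation prevention byte
        out.append(b)
        zeros = zeros + 1 if b == 0 else 0
    return bytes(out)


payload = b"\x00\x00\x01\x00\x00\x00\x02\x00\x00\x03"
escaped = add_emulation_prevention(payload)
restored = remove_emulation_prevention(escaped)
```

After escaping, the payload no longer contains the byte pattern 0x000001, so a parser scanning for start code prefixes cannot mistake payload data for the start of a new NAL unit.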
3.51 encoder: An embodiment of an encoding process.
3.52 encoding process: A process not specified in this Specification that produces a bitstream conforming to this
Specification.
3.53 filler data NAL units: NAL units with nal_unit_type equal to FD_NUT.
3.54 flag: A variable or single-bit syntax element that can take one of the two possible values: 0 and 1.
3.55 frequency index: A one-dimensional or two-dimensional index associated with a transform coefficient prior to
the application of a transform in the decoding process.
3.56 gradual decoding refresh (GDR) AU: An AU in which there is a PU for each layer present in the CVS and the
coded picture in each present PU is a GDR picture.
3.57 gradual decoding refresh (GDR) PU: A PU in which the coded picture is a GDR picture.
3.58 gradual decoding refresh (GDR) picture: A picture for which each VCL NAL unit has nal_unit_type equal to
GDR_NUT.
NOTE – The value of pps_mixed_nalu_types_in_pic_flag for a GDR picture is equal to 0. When
pps_mixed_nalu_types_in_pic_flag is equal to 0 for a picture, and any slice of the picture has nal_unit_type equal to
GDR_NUT, all other slices of the picture have the same value of nal_unit_type, and the picture is known to be a GDR
picture after receiving the first slice.
3.59 gradual decoding refresh (GDR) subpicture: A subpicture for which each VCL NAL unit has nal_unit_type
equal to GDR_NUT.
3.60 hypothetical reference decoder (HRD): A hypothetical decoder model that specifies constraints on the
variability of conforming NAL unit streams or conforming byte streams that an encoding process may produce.
3.61 hypothetical stream scheduler (HSS): A hypothetical delivery mechanism used for checking the conformance
of a bitstream or a decoder with regards to the timing and data flow of the input of a bitstream into the
hypothetical reference decoder.
3.62 instantaneous decoding refresh (IDR) picture: An IRAP picture for which each VCL NAL unit has
nal_unit_type equal to IDR_W_RADL or IDR_N_LP.
NOTE – An IDR picture does not use inter prediction in its decoding process, and could be the first picture in the
bitstream in decoding order, or could appear later in the bitstream. Each IDR picture is the first picture of a CVS in
decoding order. When each VCL NAL unit of an IDR picture has nal_unit_type equal to IDR_W_RADL, the picture
could have associated RADL pictures. When each VCL NAL unit of an IDR picture has nal_unit_type equal to
IDR_N_LP, the picture does not have any associated leading pictures. An IDR picture does not have associated RASL pictures.
3.63 instantaneous decoding refresh (IDR) PU: A PU in which the coded picture is an IDR picture.
3.64 instantaneous decoding refresh (IDR) subpicture: An IRAP subpicture for which each VCL NAL unit has
nal_unit_type equal to IDR_W_RADL or IDR_N_LP.
3.65 inter coding: Coding of a coding block, slice, or picture that uses inter prediction.
3.66 inter prediction: A prediction derived from blocks of sample values of one or more reference pictures as
determined by motion vectors.
3.67 inter-layer reference picture (ILRP): A picture in the same AU with the current picture, with nuh_layer_id
less than the nuh_layer_id of the current picture, and is marked as "used for long-term reference".
3.68 intra block copy (IBC) prediction: A prediction derived from blocks of sample values of the same decoded
slice as determined by block vectors.
3.69 intra coding: Coding of a coding block, slice, or picture that uses intra prediction.
3.70 intra prediction: A prediction derived from neighbouring sample values of the same decoded slice.
3.71 intra random access point (IRAP) AU: An AU in which there is a PU for each layer present in the CVS and
the coded picture in each PU is an IRAP picture.
3.72 intra random access point (IRAP) picture: A coded picture for which all VCL NAL units have the same value
of nal_unit_type in the range of IDR_W_RADL to CRA_NUT, inclusive.
NOTE 1 – An IRAP picture could be a CRA picture or an IDR picture. An IRAP picture does not use inter prediction
from reference pictures in the same layer in its decoding process. The first picture in the bitstream in decoding order is
an IRAP or GDR picture. For a single-layer bitstream, provided the necessary parameter sets are available when they
need to be referenced, the IRAP picture and all subsequent non-RASL pictures in the CLVS in decoding order are
correctly decodable without performing the decoding process of any pictures that precede the IRAP picture in decoding
order.
NOTE 2 – The value of pps_mixed_nalu_types_in_pic_flag for an IRAP picture is equal to 0. When
pps_mixed_nalu_types_in_pic_flag is equal to 0 for a picture, and any slice of the picture has nal_unit_type in the range
of IDR_W_RADL to CRA_NUT, inclusive, all other slices of the picture have the same value of nal_unit_type, and
the picture is known to be an IRAP picture after receiving the first slice.
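
The picture categories in 3.26, 3.58, 3.62, and 3.72 can be sketched as a classification over the nal_unit_type values of a picture's VCL NAL units. The numeric codes below are assumed from the NAL unit type table of the full specification, which is not reproduced above, and should be checked against it. The function names `classify_picture` and `is_irap` are hypothetical.

```python
# Assumed numeric nal_unit_type codes (verify against the full specification).
IDR_W_RADL, IDR_N_LP, CRA_NUT, GDR_NUT = 7, 8, 9, 10


def classify_picture(slice_nal_unit_types):
    """Classify a picture from the nal_unit_type of each of its VCL NAL units."""
    types = set(slice_nal_unit_types)
    if not types:
        raise ValueError("a coded picture has at least one VCL NAL unit")
    if len(types) != 1:
        return "mixed"  # neither an IRAP nor a GDR picture by these definitions
    (nal_type,) = types
    if nal_type in (IDR_W_RADL, IDR_N_LP):
        return "IDR"
    if nal_type == CRA_NUT:
        return "CRA"
    if nal_type == GDR_NUT:
        return "GDR"
    return "other"


def is_irap(slice_nal_unit_types):
    """An IRAP picture is an IDR picture or a CRA picture (3.72, NOTE 1)."""
    return classify_picture(slice_nal_unit_types) in ("IDR", "CRA")
```

Because all VCL NAL units of an IRAP or GDR picture share one nal_unit_type, a decoder can identify such a picture from its first slice, as the notes above point out.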
3.73 intra random access point (IRAP) PU: A PU in which the coded picture is an IRAP picture.
3.74 intra random access point (IRAP) subpicture: A subpicture for which all VCL NAL units have the same value
of nal_unit_type in the range of IDR_W_RADL to CRA_NUT, inclusive.
3.75 intra (I) slice: A slice that is decoded using intra prediction only.