H.266 VVC Standard Document: The Next-Generation Video Coding Standard Explained
Updated 2024-07-09
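The VVC/H.266 version 1 document is a video coding standard published by the Telecommunication Standardization Sector of the International Telecommunication Union (ITU-T) as Recommendation H.266, with technically aligned text issued as an ISO/IEC International Standard (hence the wording "Recommendation | International Standard" throughout). It belongs to the "coding of moving video" part of the H series on audiovisual and multimedia systems. H.266, also known as Versatile Video Coding (VVC), is the successor to H.265 (HEVC, High Efficiency Video Coding), which in turn followed H.264. It targets a given picture quality at a substantially lower bit rate, to meet growing demand for video delivery such as HD and UHD content as well as virtual reality (VR) and augmented reality (AR) applications.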
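VVC introduces improvements across the coding tool chain, including more efficient intra prediction, refined inter prediction with multiple reference pictures, and improved entropy coding, which together reduce the bit rate needed for a given picture quality. The version 1 text specifies the bitstream syntax, its semantics and the decoding process, together with profiles, tiers and levels and the hypothetical reference decoder used for conformance checking. How an encoder chooses its parameters is outside the scope of the standard.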
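The front matter of the document lists the subject areas of the ITU-T H-series Recommendations. These are not chapters of H.266 itself; they show where H.266 sits within the series:
1. Infrastructure of audiovisual services: the general framework that the other areas build on.
2. Transmission multiplexing and synchronization: techniques for carrying video streams reliably.
3. Systems aspects: general system characteristics. The H series also covers characteristics of visual telephone systems (H.100-H.199).
4. Communication procedures: control and interaction during a communication session.
5. Coding of moving video: the area to which H.266 (VVC) belongs.
6. Related systems aspects: system-level topics related to video coding.
7. Systems and terminal equipment: equipment requirements for audiovisual services.
8. Quality of service: how QoS (Quality of Service) is used to improve the user experience.
9. Telepresence and immersive environments: characteristics and requirements for real-time interaction and immersive experiences.
10. Supplementary services for multimedia: additional functions and services for multimedia applications.
11. Mobility and collaboration: procedures and technical support for mobile communication and remote collaboration.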
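For video technology developers, standards writers, hardware manufacturers and content creators, this document is a key reference for understanding and implementing efficient video coding and delivery, and it provides the basis for later editions of the standard.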
2 Rec. ITU-T H.266 (08/2020)
Versions of this Recommendation | International Standard
This is the first version of this Recommendation | International Standard.
Overview of the design characteristics
The coded representation specified in the syntax is designed to enable a high compression capability for a desired image
or video quality. The algorithm is typically not mathematically lossless, as the exact source sample values are typically not
preserved through the encoding and decoding processes, although some modes are included that provide lossless coding
capability. A number of techniques are specified to enable highly efficient compression. Encoding algorithms (not
specified within the scope of this Recommendation | International Standard) may select between inter, intra, intra block
copy (IBC), and palette coding for block-shaped regions of each picture. Inter coding uses motion vectors for block-based
inter-picture prediction to exploit temporal statistical dependencies between different pictures, intra coding uses various
spatial prediction modes to exploit spatial statistical dependencies in the source signal within the same picture, and intra
block copy coding uses block displacement vectors to reference previously decoded regions of the same picture to exploit
statistical similarities among different areas of the same picture. Motion vectors, intra prediction modes, and IBC block
vectors are specified for a variety of block sizes in the picture. The prediction residual can then be further compressed
using a spatial transform to remove spatial correlation inside a block before it is quantized, producing a possibly irreversible
process that typically discards less important visual information while forming a close approximation to the source
samples. Finally, the motion vectors, intra prediction modes, and block vectors can also be further compressed using a
variety of prediction mechanisms, and, after prediction, are combined with the quantized transform coefficient information
and encoded using arithmetic coding.
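The "possibly irreversible" step in the overview above can be illustrated with a small Python sketch: predict a row of samples, quantize the prediction residual with a uniform step, and reconstruct. The sample values, the step size and the function names are illustrative assumptions; the transform stage and the actual VVC quantizer are deliberately left out.

```python
def quantize(residual, step):
    """Uniform quantizer (assumed for illustration): round each value to the nearest multiple of step."""
    levels = []
    for r in residual:
        magnitude = (abs(r) + step // 2) // step  # integer rounding of |r| / step
        levels.append(magnitude if r >= 0 else -magnitude)
    return levels


def dequantize(levels, step):
    """Scale the integer levels back to residual values."""
    return [level * step for level in levels]


source = [100, 104, 109, 111]        # hypothetical source samples
prediction = [101, 101, 101, 101]    # hypothetical prediction of those samples
step = 4                             # hypothetical quantization step

residual = [s - p for s, p in zip(source, prediction)]
levels = quantize(residual, step)
reconstructed = [p + r for p, r in zip(prediction, dequantize(levels, step))]

print(levels)          # [0, 1, 2, 3]
print(reconstructed)   # [101, 105, 109, 113]
```

The reconstruction is close to the source but not identical, which is the sense in which the coding is typically not mathematically lossless.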
How to read this document
It is suggested that the reader starts with clause 1 and moves on to clause 3. Clause 6 should be read for the geometrical
relationship of the source, input, and output of the decoder. Clause 7 specifies the order to parse syntax elements from the
bitstream. See clauses 7.1 to 7.3 for syntactical order and clause 7.4 for semantics; e.g., the scope, restrictions, and
conditions that are imposed on the syntax elements. The actual parsing for most syntax elements is specified in clause 9.
Finally, clause 8 specifies how the syntax elements are mapped into decoded samples. Throughout reading this document,
the reader should refer to clauses 2, 4, and 5 as needed. Annexes A through D also form an integral part of this
Recommendation | International Standard.
Annex A specifies profiles, each being tailored to certain application domains, and defines the so-called tiers and levels of
the profiles. Annex B specifies syntax and semantics of a byte stream format for delivery of coded video as an ordered
stream of bytes. Annex C specifies the hypothetical reference decoder, bitstream conformance, decoder conformance, and
the use of the hypothetical reference decoder to check bitstream and decoder conformance. Annex D specifies syntax and
semantics for supplemental enhancement information (SEI) message payloads that affect the conformance specifications
in Annex C. Rec. ITU-T H.274 | ISO/IEC 23002-7 specifies the syntax and semantics of the video usability information
(VUI) parameters as well as SEI messages that do not affect the conformance specifications in Annex C. These VUI
parameters and SEI messages may be used together with this Recommendation | International Standard.
1 Scope
This Recommendation | International Standard specifies a video coding technology known as Versatile Video Coding
(VVC), comprising a video coding technology with a compression capability that is substantially beyond that of the prior
generations of such standards and with sufficient versatility for effective use in a broad range of applications.
Only the syntax format, semantics, and associated decoding process requirements are specified, while other matters such
as pre-processing, the encoding process, system signalling and multiplexing, data loss recovery, post-processing, and video
display are considered to be outside the scope of this Recommendation | International Standard. Additionally, the internal
processing steps performed within a decoder are also considered to be outside the scope of this Recommendation |
International Standard; only the externally observable output behaviour is required to conform to the specifications of this
Recommendation | International Standard.
This Recommendation | International Standard is designed to be generic in the sense that it serves a wide range of
applications, bit rates, resolutions, qualities and services. Applications include, but are not limited to, video coding for
digital storage media, television broadcasting and real-time communication. In the course of creating this
Recommendation | International Standard, various requirements from typical applications have been considered, necessary
algorithmic elements have been developed, and these have been integrated into a single syntax. Hence, this
Recommendation | International Standard is designed to facilitate video data interchange among different applications.
2 Normative references
The following Recommendations and International Standards contain provisions which, through reference in this text,
constitute provisions of this Recommendation | International Standard. At the time of publication, the editions indicated
were valid. All Recommendations and Standards are subject to revision, and parties to agreements based on this
Recommendation | International Standard are encouraged to investigate the possibility of applying the most recent edition
of the Recommendations and Standards listed below. Members of IEC and ISO maintain registers of currently valid
International Standards. The Telecommunication Standardization Bureau of the ITU maintains a list of currently valid
ITU-T Recommendations.
2.1 Identical Recommendations | International Standards
– None
2.2 Paired Recommendations | International Standards equivalent in technical content
– Rec. ITU-T H.274 | ISO/IEC 23002-7 (in force) Versatile supplemental enhancement information messages
for coded video bitstreams
2.3 Additional references
– Rec. ITU-T T.35:2000, Procedure for the allocation of ITU-T defined codes for non standard facilities.
3 Definitions
For the purposes of this Recommendation | International Standard, the following definitions apply.
3.1 AC transform coefficient: Any transform coefficient for which the frequency index in at least one of the two
dimensions is non-zero.
3.2 access unit (AU): A set of PUs that belong to different layers and contain coded pictures associated with the
same time for output from the DPB.
3.3 adaptation parameter set (APS): A syntax structure containing syntax elements that apply to zero or more
slices as determined by zero or more syntax elements found in slice headers.
3.4 adaptive colour transform (ACT): A cross-component transform applied to the decoded residual of a coding
unit in the 4:4:4 colour format prior to reconstruction and loop filtering.
3.5 adaptive loop filter (ALF): A filtering process that is applied as part of the decoding process and is controlled
by parameters conveyed in an APS.
3.6 ALF APS: An APS that controls the ALF process.
3.7 associated GDR picture: The previous GDR picture (when present) in decoding order, for a particular picture
with nuh_layer_id equal to a particular value layerId, that has nuh_layer_id equal to layerId and between which
and the particular picture in decoding order there is no IRAP picture with nuh_layer_id equal to layerId.
3.8 associated GDR subpicture: The previous GDR subpicture (when present) in decoding order, for a particular
subpicture with nuh_layer_id equal to a particular value layerId and subpicture index equal to a particular value
subpicIdx, that has nuh_layer_id equal to layerId and subpicture index equal to subpicIdx and between which
and the particular subpicture in decoding order there is no IRAP subpicture with nuh_layer_id equal to layerId
and subpicture index equal to subpicIdx.
3.9 associated IRAP picture: The previous IRAP picture (when present) in decoding order, for a particular picture
with nuh_layer_id equal to a particular value layerId, that has nuh_layer_id equal to layerId and between which
and the particular picture in decoding order there is no GDR picture with nuh_layer_id equal to layerId.
3.10 associated IRAP subpicture: The previous IRAP subpicture (when present) in decoding order, for a particular
subpicture with nuh_layer_id equal to a particular value layerId and subpicture index equal to a particular value
subpicIdx, that has nuh_layer_id equal to layerId and subpicture index equal to subpicIdx and between which
and the particular subpicture in decoding order there is no GDR subpicture with nuh_layer_id equal to layerId
and subpicture index equal to subpicIdx.
3.11 associated non-VCL NAL unit: A non-VCL NAL unit (when present) for a VCL NAL unit where the VCL NAL
unit is the associated VCL NAL unit of the non-VCL NAL unit.
3.12 associated VCL NAL unit: The preceding VCL NAL unit in decoding order for a non-VCL NAL unit with
nal_unit_type equal to EOS_NUT, EOB_NUT, SUFFIX_APS_NUT, SUFFIX_SEI_NUT, FD_NUT,
RSV_NVCL_27, UNSPEC_30, or UNSPEC_31; or otherwise the next VCL NAL unit in decoding order.
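Definition 3.12 amounts to a simple rule that can be sketched in Python: a non-VCL NAL unit of one of the listed types points back to the preceding VCL NAL unit, and any other non-VCL NAL unit points forward to the next one. The function name and the tuple representation of a NAL unit are assumptions made for this sketch; the example stream uses the NAL unit type names PREFIX_SEI_NUT, PPS_NUT and TRAIL_NUT, which do not appear in this excerpt.

```python
# Types associated with the PRECEDING VCL NAL unit, as listed in definition 3.12.
ASSOCIATED_WITH_PRECEDING = {
    "EOS_NUT", "EOB_NUT", "SUFFIX_APS_NUT", "SUFFIX_SEI_NUT",
    "FD_NUT", "RSV_NVCL_27", "UNSPEC_30", "UNSPEC_31",
}


def associate_non_vcl(nal_units):
    """Map the index of each non-VCL NAL unit to the index of its associated VCL NAL unit.

    nal_units is a list of (nal_unit_type_name, is_vcl) tuples in decoding order.
    A value of None means no associated VCL NAL unit exists in the list.
    """
    result = {}
    last_vcl = None   # index of the most recent VCL NAL unit
    pending = []      # non-VCL units waiting for the next VCL NAL unit
    for idx, (nal_type, is_vcl) in enumerate(nal_units):
        if is_vcl:
            for p in pending:
                result[p] = idx
            pending.clear()
            last_vcl = idx
        elif nal_type in ASSOCIATED_WITH_PRECEDING:
            result[idx] = last_vcl
        else:
            pending.append(idx)
    for p in pending:
        result[p] = None
    return result


stream = [
    ("PREFIX_SEI_NUT", False),
    ("IDR_N_LP", True),
    ("SUFFIX_SEI_NUT", False),
    ("PPS_NUT", False),
    ("TRAIL_NUT", True),
    ("EOS_NUT", False),
]
assoc = associate_non_vcl(stream)
print(assoc)  # {0: 1, 2: 1, 3: 4, 5: 4}
```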
3.13 bin: One bit of a bin string.
3.14 bin string: An intermediate binary representation of values of syntax elements from the binarization of the syntax
element.
3.15 binarization: A set of bin strings for all possible values of a syntax element.
3.16 binarization process: A unique mapping process of all possible values of a syntax element onto a set of bin
strings.
3.17 binary split: A split of a rectangular MxN block of samples into two blocks where a vertical split results in a
first (M / 2)xN block and a second (M / 2)xN block, and a horizontal split results in a first Mx(N / 2) block and
a second Mx(N / 2) block.
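The block sizes produced by a binary split can be written out as a short Python sketch. The function name and the (width, height) tuple convention are assumptions for illustration; M is taken as the number of columns and N as the number of rows, as in the definition of a block.

```python
def binary_split(m, n, vertical):
    """Return the (columns, rows) sizes of the two blocks produced by a binary split of an MxN block."""
    if vertical:
        # A vertical split halves the width: two (M / 2)xN blocks.
        return [(m // 2, n), (m // 2, n)]
    # A horizontal split halves the height: two Mx(N / 2) blocks.
    return [(m, n // 2), (m, n // 2)]


print(binary_split(32, 16, vertical=True))    # [(16, 16), (16, 16)]
print(binary_split(32, 16, vertical=False))   # [(32, 8), (32, 8)]
```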
3.18 bi-predictive (B) slice: A slice that is decoded using intra prediction or using inter prediction with at most two
motion vectors and reference indices to predict the sample values of each block.
3.19 bitstream: A sequence of bits, in the form of a NAL unit stream or a byte stream, that forms the representation
of a sequence of AUs forming one or more coded video sequences (CVSs).
3.20 block: An MxN (M-column by N-row) array of samples, or an MxN array of transform coefficients.
3.21 block vector: A two-dimensional vector that provides an offset from the coordinates of the current coding block
to the coordinates of the reference block in the same decoded slice.
3.22 byte: A sequence of 8 bits, within which, when written or read as a sequence of bit values, the left-most and
right-most bits represent the most and least significant bits, respectively.
3.23 byte stream: An encapsulation of a NAL unit stream into a series of bytes containing start code prefixes and
NAL units.
3.24 byte-aligned: A position in a bitstream is byte-aligned when the position is an integer multiple of 8 bits from
the position of the first bit in the bitstream, and a bit or byte or syntax element is said to be byte-aligned when
the position at which it appears in a bitstream is byte-aligned.
3.25 chroma: A sample array or single sample representing one of the two colour difference signals related to the
primary colours, represented by the symbols Cb and Cr.
NOTE – The term chroma is used rather than the term chrominance in order to avoid the implication of the use of linear
light transfer characteristics that is often associated with the term chrominance.
3.26 clean random access (CRA) picture: An IRAP picture for which each VCL NAL unit has nal_unit_type equal
to CRA_NUT.
NOTE – A CRA picture does not use inter prediction in its decoding process, and could be the first picture in the
bitstream in decoding order, or could appear later in the bitstream. A CRA picture could have associated RADL or
RASL pictures. When a CRA picture has NoOutputBeforeRecoveryFlag equal to 1, the associated RASL pictures are
not output by the decoder, because they might not be decodable, as they could contain references to pictures that are
not present in the bitstream.
3.27 clean random access (CRA) PU: A PU in which the coded picture is a CRA picture.
3.28 clean random access (CRA) subpicture: An IRAP subpicture for which each VCL NAL unit has nal_unit_type
equal to CRA_NUT.
3.29 coded layer video sequence (CLVS): A sequence of PUs with the same value of nuh_layer_id that consists, in
decoding order, of a CLVSS PU, followed by zero or more PUs that are not CLVSS PUs, including all subsequent
PUs up to but not including any subsequent PU that is a CLVSS PU.
NOTE – A CLVSS PU could be an IDR PU, a CRA PU, or a GDR PU. The value of NoOutputBeforeRecoveryFlag is
equal to 1 for each IDR PU, and each CRA PU that has HandleCraAsClvsStartFlag equal to 1, and each CRA or GDR
PU that is the first PU in the layer of the bitstream in decoding order or the first PU in the layer of the bitstream that
follows an EOS NAL unit in the layer in decoding order.
3.30 coded layer video sequence start (CLVSS) PU: A PU in which the coded picture is a CLVSS picture.
3.31 coded layer video sequence start (CLVSS) picture: A coded picture that is an IRAP picture with
NoOutputBeforeRecoveryFlag equal to 1 or a GDR picture with NoOutputBeforeRecoveryFlag equal to 1.
3.32 coded picture: A coded representation of a picture comprising VCL NAL units with a particular value of
nuh_layer_id within an AU and containing all CTUs of the picture.
3.33 coded picture buffer (CPB): A first-in first-out buffer containing DUs in decoding order specified in the
hypothetical reference decoder in Annex C.
3.34 coded representation: A data element as represented in its coded form.
3.35 coded video sequence (CVS): A sequence of AUs that consists, in decoding order, of a CVSS AU, followed by
zero or more AUs that are not CVSS AUs, including all subsequent AUs up to but not including any subsequent
AU that is a CVSS AU.
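Read operationally, definition 3.35 says that every CVSS AU opens a new CVS and all following non-CVSS AUs belong to it. A minimal Python sketch of that grouping follows; the function name, the AU labels and the (label, is_cvss) representation are hypothetical, and deciding whether an AU is a CVSS AU is assumed to have been done elsewhere.

```python
def split_into_cvs(access_units):
    """Group access units, given in decoding order, into coded video sequences.

    access_units is a list of (label, is_cvss) tuples.
    """
    sequences = []
    for label, is_cvss in access_units:
        if is_cvss:
            sequences.append([label])        # a CVSS AU starts a new CVS
        elif sequences:
            sequences[-1].append(label)      # other AUs extend the current CVS
        else:
            raise ValueError("the first AU in this sketch must be a CVSS AU")
    return sequences


aus = [("AU0", True), ("AU1", False), ("AU2", False), ("AU3", True), ("AU4", False)]
print(split_into_cvs(aus))  # [['AU0', 'AU1', 'AU2'], ['AU3', 'AU4']]
```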
3.36 coded video sequence start (CVSS) AU: An IRAP AU or GDR AU for which the coded picture in each PU is a
CLVSS picture.
3.37 coding block: An MxN block of samples for some values of M and N such that the division of a CTB into coding
blocks is a partitioning.
3.38 coding tree block (CTB): An N×N block of samples for some value of N such that the division of a component
into CTBs is a partitioning.
3.39 coding tree unit (CTU): A CTB of luma samples, two corresponding CTBs of chroma samples of a picture that
has three sample arrays, or a CTB of samples of a monochrome picture, and syntax structures used to code the
samples.
3.40 coding unit (CU): A coding block of luma samples, two corresponding coding blocks of chroma samples of a
picture that has three sample arrays in the single tree mode, or a coding block of luma samples of a picture that
has three sample arrays in the dual tree mode, or two coding blocks of chroma samples of a picture that has three
sample arrays in the dual tree mode, or a coding block of samples of a monochrome picture, and syntax structures
used to code the samples.
3.41 component: An array or single sample from one of the three arrays (luma and two chroma) that compose a
picture in 4:2:0, 4:2:2, or 4:4:4 colour format or the array or a single sample of the array that compose a picture
in monochrome format.
3.42 context variable: A variable specified for the adaptive binary arithmetic decoding process of a bin by an
equation containing recently decoded bins.
3.43 deblocking filter: A filtering process that is applied as part of the decoding process in order to minimize the
appearance of visual artefacts at the boundaries between blocks.
3.44 decoded picture: A picture produced by applying the decoding process to a coded picture.
3.45 decoded picture buffer (DPB): A buffer holding decoded pictures for reference, output reordering, or output
delay specified for the hypothetical reference decoder.
3.46 decoder: An embodiment of a decoding process.
3.47 decoding order: The order in which syntax elements are processed by the decoding process.
3.48 decoding process: The process specified in this Specification that reads a bitstream and derives decoded pictures
from it.
3.49 decoding unit (DU): An AU if DecodingUnitHrdFlag is equal to 0 or a subset of an AU otherwise, consisting of
one or more VCL NAL units in an AU and the associated non-VCL NAL units.
3.50 emulation prevention byte: A byte equal to 0x03 that is present within a NAL unit when the syntax elements of
the bitstream form certain patterns of byte values in a manner that ensures that no sequence of consecutive byte-
aligned bytes in the NAL unit can contain a start code prefix.
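The effect of emulation prevention bytes can be sketched in Python as follows. The sketch assumes the usual rule from the NAL unit semantics, which are not part of this excerpt: the start code prefix is the byte sequence 0x000001, and a byte 0x03 is inserted whenever two consecutive zero bytes would otherwise be followed by a byte in the range 0x00 to 0x03. The function names are illustrative, and special handling at the end of a NAL unit is omitted.

```python
def add_emulation_prevention(payload: bytes) -> bytes:
    """Insert 0x03 after any two zero bytes that are followed by a byte <= 0x03."""
    out = bytearray()
    zeros = 0  # number of consecutive zero bytes written so far
    for b in payload:
        if zeros >= 2 and b <= 0x03:
            out.append(0x03)
            zeros = 0
        out.append(b)
        zeros = zeros + 1 if b == 0 else 0
    return bytes(out)


def remove_emulation_prevention(data: bytes) -> bytes:
    """Drop each 0x03 that directly follows two zero bytes."""
    out = bytearray()
    zeros = 0
    for b in data:
        if zeros >= 2 and b == 0x03:
            zeros = 0
            continue  # skip the emulation prevention byte
        out.append(b)
        zeros = zeros + 1 if b == 0 else 0
    return bytes(out)


payload = bytes([0x00, 0x00, 0x01, 0xAB, 0x00, 0x00, 0x00, 0x00, 0x03])
protected = add_emulation_prevention(payload)
print(protected.hex())  # 00000301ab00000300000303
```

After protection the byte sequence 0x000001 no longer occurs inside the data, and removing the inserted bytes restores the original payload.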
3.51 encoder: An embodiment of an encoding process.
3.52 encoding process: A process not specified in this Specification that produces a bitstream conforming to this
Specification.
3.53 filler data NAL units: NAL units with nal_unit_type equal to FD_NUT.
3.54 flag: A variable or single-bit syntax element that can take one of the two possible values: 0 and 1.
3.55 frequency index: A one-dimensional or two-dimensional index associated with a transform coefficient prior to
the application of a transform in the decoding process.
3.56 gradual decoding refresh (GDR) AU: An AU in which there is a PU for each layer present in the CVS and the
coded picture in each present PU is a GDR picture.
3.57 gradual decoding refresh (GDR) PU: A PU in which the coded picture is a GDR picture.
3.58 gradual decoding refresh (GDR) picture: A picture for which each VCL NAL unit has nal_unit_type equal to
GDR_NUT.
NOTE – The value of pps_mixed_nalu_types_in_pic_flag for a GDR picture is equal to 0. When
pps_mixed_nalu_types_in_pic_flag is equal to 0 for a picture, and any slice of the picture has nal_unit_type equal to
GDR_NUT, all other slices of the picture have the same value of nal_unit_type, and the picture is known to be a GDR
picture after receiving the first slice.
3.59 gradual decoding refresh (GDR) subpicture: A subpicture for which each VCL NAL unit has nal_unit_type
equal to GDR_NUT.
3.60 hypothetical reference decoder (HRD): A hypothetical decoder model that specifies constraints on the
variability of conforming NAL unit streams or conforming byte streams that an encoding process may produce.
3.61 hypothetical stream scheduler (HSS): A hypothetical delivery mechanism used for checking the conformance
of a bitstream or a decoder with regards to the timing and data flow of the input of a bitstream into the
hypothetical reference decoder.
3.62 instantaneous decoding refresh (IDR) picture: An IRAP picture for which each VCL NAL unit has
nal_unit_type equal to IDR_W_RADL or IDR_N_LP.
NOTE – An IDR picture does not use inter prediction in its decoding process, and could be the first picture in the
bitstream in decoding order, or could appear later in the bitstream. Each IDR picture is the first picture of a CVS in
decoding order. When an IDR picture for which each VCL NAL unit has nal_unit_type equal to IDR_W_RADL, it
could have associated RADL pictures. When an IDR picture for which each VCL NAL unit has nal_unit_type equal to
IDR_N_LP, it does not have any associated leading pictures. An IDR picture does not have associated RASL pictures.
3.63 instantaneous decoding refresh (IDR) PU: A PU in which the coded picture is an IDR picture.
3.64 instantaneous decoding refresh (IDR) subpicture: An IRAP subpicture for which each VCL NAL unit has
nal_unit_type equal to IDR_W_RADL or IDR_N_LP.
3.65 inter coding: Coding of a coding block, slice, or picture that uses inter prediction.
3.66 inter prediction: A prediction derived from blocks of sample values of one or more reference pictures as
determined by motion vectors.
3.67 inter-layer reference picture (ILRP): A picture in the same AU with the current picture, with nuh_layer_id
less than the nuh_layer_id of the current picture, and is marked as "used for long-term reference".
3.68 intra block copy (IBC) prediction: A prediction derived from blocks of sample values of the same decoded
slice as determined by block vectors.
3.69 intra coding: Coding of a coding block, slice, or picture that uses intra prediction.
3.70 intra prediction: A prediction derived from neighbouring sample values of the same decoded slice.
3.71 intra random access point (IRAP) AU: An AU in which there is a PU for each layer present in the CVS and
the coded picture in each PU is an IRAP picture.
3.72 intra random access point (IRAP) picture: A coded picture for which all VCL NAL units have the same value
of nal_unit_type in the range of IDR_W_RADL to CRA_NUT, inclusive.
NOTE 1 – An IRAP picture could be a CRA picture or an IDR picture. An IRAP picture does not use inter prediction
from reference pictures in the same layer in its decoding process. The first picture in the bitstream in decoding order is
an IRAP or GDR picture. For a single-layer bitstream, provided the necessary parameter sets are available when they
need to be referenced, the IRAP picture and all subsequent non-RASL pictures in the CLVS in decoding order are
correctly decodable without performing the decoding process of any pictures that precede the IRAP picture in decoding
order.
NOTE 2 – The value of pps_mixed_nalu_types_in_pic_flag for an IRAP picture is equal to 0. When
pps_mixed_nalu_types_in_pic_flag is equal to 0 for a picture, and any slice of the picture has nal_unit_type in the range
of IDR_W_RADL to CRA_NUT, inclusive, all other slices of the picture have the same value of nal_unit_type, and
the picture is known to be an IRAP picture after receiving the first slice.
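The range check in definition 3.72, together with the GDR case from definition 3.58, can be sketched in Python. The numeric nal_unit_type values below are an assumption taken from the NAL unit type table of the standard, which is not part of this excerpt, and should be checked against that table; the function name and the returned labels are illustrative.

```python
from enum import IntEnum


class NalUnitType(IntEnum):
    # Assumed numeric values; only the types needed for the sketch are listed.
    TRAIL_NUT = 0
    IDR_W_RADL = 7
    IDR_N_LP = 8
    CRA_NUT = 9
    GDR_NUT = 10


def classify_picture(slice_nal_types):
    """Classify a picture from the nal_unit_type values of its VCL NAL units."""
    types = set(slice_nal_types)
    if not types:
        raise ValueError("a coded picture has at least one VCL NAL unit")
    if len(types) > 1:
        return "mixed"   # neither IRAP nor GDR: those require one common type
    nal_type = types.pop()
    if NalUnitType.IDR_W_RADL <= nal_type <= NalUnitType.CRA_NUT:
        return "IRAP"
    if nal_type == NalUnitType.GDR_NUT:
        return "GDR"
    return "other"


print(classify_picture([NalUnitType.CRA_NUT] * 3))                       # IRAP
print(classify_picture([NalUnitType.GDR_NUT]))                           # GDR
print(classify_picture([NalUnitType.TRAIL_NUT, NalUnitType.IDR_N_LP]))   # mixed
```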
3.73 intra random access point (IRAP) PU: A PU in which the coded picture is an IRAP picture.
3.74 intra random access point (IRAP) subpicture: A subpicture for which all VCL NAL units have the same value
of nal_unit_type in the range of IDR_W_RADL to CRA_NUT, inclusive.
3.75 intra (I) slice: A slice that is decoded using intra prediction only.