The HEVC Standard: A New Milestone in H.265 Video Coding
Updated 2024-07-25
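HEVC (High Efficiency Video Coding) is a video coding standard developed jointly by the ITU-T Video Coding Experts Group and the ISO/IEC Moving Picture Experts Group; its ITU-T designation is Recommendation ITU-T H.265. It succeeds H.264 and was designed with high-definition and ultra-high-definition video in mind, aiming at substantially better compression, that is, a lower bit rate at comparable picture quality.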
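Within the ITU-T H-series (audiovisual and multimedia systems), H.265 belongs to the group "Coding of moving video", numbered H.260–H.279. HEVC combines more advanced intra prediction, transform coding, and tools for parallel processing. It is commonly cited as needing roughly half the bit rate of H.264 for comparable subjective quality. This matters for video conferencing, online streaming, and VR/AR, and especially for mobile devices, where it reduces the amount of data to be transmitted and improves the user experience.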
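The H-series as a whole, not HEVC itself, also covers system architecture, multiplexing and synchronization, communication procedures, and terminal equipment. For example, H.220–H.229 deal with transmission multiplexing and synchronization, H.240–H.259 with communication procedures, and H.260–H.279 with the coding of moving video, which is where H.265 sits.
Other groups of the series address quality of service architecture (H.360–H.369), supplementary services for multimedia (H.450–H.499), directory services, and mobility and collaboration procedures. These are separate Recommendations that surround the video codec; they are listed alongside H.265 but are not defined by it.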
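HEVC is a milestone in video coding: it markedly improves compression performance and underpins current communication and entertainment applications. As high-definition and ultra-high-definition content becomes more widespread, the standard will continue to play a key role in the development of video technology.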
Rec. ITU-T H.265 (04/2013)
This is the first version of this Specification. Additional versions are anticipated.
0.5 Profiles, tiers and levels
This Recommendation | International Standard is designed to be generic in the sense that it serves a wide range of
applications, bit rates, resolutions, qualities, and services. Applications should cover, among other things, digital storage
media, television broadcasting and real-time communications. In the course of creating this Specification, various
requirements from typical applications have been considered, necessary algorithmic elements have been developed, and
these have been integrated into a single syntax. Hence, this Specification will facilitate video data interchange among
different applications.
Considering the practicality of implementing the full syntax of this Specification, however, a limited number of subsets
of the syntax are also stipulated by means of "profiles", "tiers", and "levels". These and other related terms are formally
defined in clause 3.
A "profile" is a subset of the entire bitstream syntax that is specified in this Recommendation | International Standard.
Within the bounds imposed by the syntax of a given profile it is still possible to require a very large variation in the
performance of encoders and decoders depending upon the values taken by syntax elements in the bitstream such as the
specified size of the decoded pictures. In many applications, it is currently neither practical nor economic to implement
a decoder capable of dealing with all hypothetical uses of the syntax within a particular profile.
In order to deal with this problem, "tiers" and "levels" are specified within each profile. A level of a tier is a specified
set of constraints imposed on values of the syntax elements in the bitstream. These constraints may be simple limits on
values. Alternatively they may take the form of constraints on arithmetic combinations of values (e.g., picture width
multiplied by picture height multiplied by number of pictures decoded per second). A level specified for a lower tier is
more constrained than a level specified for a higher tier.
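The two kinds of constraint described above can be illustrated with a minimal sketch. The class `LevelLimits`, the function `conforms`, and the numeric limits are hypothetical and chosen only for illustration; the actual limits for each tier and level are given in Annex A, which is not part of this excerpt.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LevelLimits:
    """Hypothetical limits for one level of one tier (illustrative only)."""
    max_luma_picture_size: int  # simple limit: width * height
    max_luma_sample_rate: int   # limit on width * height * pictures per second


def conforms(width: int, height: int, pictures_per_second: int,
             limits: LevelLimits) -> bool:
    """Return True if the stream parameters satisfy both limits."""
    picture_size = width * height                     # simple limit on a value
    sample_rate = picture_size * pictures_per_second  # arithmetic combination
    return (picture_size <= limits.max_luma_picture_size
            and sample_rate <= limits.max_luma_sample_rate)


# Made-up limits, chosen so that the example has one passing and one failing case.
EXAMPLE_LIMITS = LevelLimits(max_luma_picture_size=2_500_000,
                             max_luma_sample_rate=70_000_000)

print(conforms(1920, 1080, 30, EXAMPLE_LIMITS))  # True
print(conforms(1920, 1080, 60, EXAMPLE_LIMITS))  # False: sample rate too high
```

The second call fails only on the combined constraint, which shows why a limit on picture size alone would not bound decoder workload.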
Coded video content conforming to this Recommendation | International Standard uses a common syntax. In order to
achieve a subset of the complete syntax, flags, parameters, and other syntax elements are included in the bitstream that
signal the presence or absence of syntactic elements that occur later in the bitstream.
0.6 Overview of the design characteristics
The coded representation specified in the syntax is designed to enable a high compression capability for a desired image
or video quality. The algorithm is typically not lossless, as the exact source sample values are typically not preserved
through the encoding and decoding processes. A number of techniques may be used to achieve highly efficient
compression. Encoding algorithms (not specified in this Recommendation | International Standard) may select between
inter and intra coding for block-shaped regions of each picture. Inter coding uses motion vectors for block-based inter
prediction to exploit temporal statistical dependencies between different pictures. Intra coding uses various spatial
prediction modes to exploit spatial statistical dependencies in the source signal for a single picture. Motion vectors and
intra prediction modes may be specified for a variety of block sizes in the picture. The prediction residual may then be
further compressed using a transform to remove spatial correlation inside the transform block before it is quantized,
producing a possibly irreversible process that typically discards less important visual information while forming a close
approximation to the source samples. Finally, the motion vectors or intra prediction modes may also be further
compressed using a variety of prediction mechanisms, and, after prediction, are combined with the quantized transform
coefficient information and encoded using arithmetic coding.
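As a rough illustration of why the process is typically not lossless, the following toy sketch predicts a row of samples, quantizes the prediction residual, and reconstructs the samples. It deliberately omits the transform and the arithmetic coding stages, and the function names and the uniform quantizer are assumptions made for this example, not the quantizer of this Specification.

```python
import math


def encode_block(source, prediction, step):
    """Quantize the prediction residual (source minus prediction) to integer levels."""
    return [math.floor((s - p) / step + 0.5) for s, p in zip(source, prediction)]


def decode_block(levels, prediction, step):
    """Reconstruct samples by adding the de-quantized residual to the prediction."""
    return [p + q * step for p, q in zip(prediction, levels)]


source = [100, 104, 109, 115]
prediction = [100, 100, 100, 100]  # e.g. copied from a neighbouring sample
step = 4                           # larger step: fewer bits, larger error

levels = encode_block(source, prediction, step)
reconstructed = decode_block(levels, prediction, step)

print(levels)         # [0, 1, 2, 4]
print(reconstructed)  # [100, 104, 108, 116]
```

The reconstruction is close to the source but not identical; the difference introduced by rounding to a multiple of `step` cannot be recovered by the decoder.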
0.7 How to read this Specification
It is suggested that the reader starts with clause 1 (Scope) and moves on to clause 3 (Definitions). Clause 6 should be
read for the geometrical relationship of the source, input, and output of the decoder. Clause 7 (Syntax and semantics)
specifies the order to parse syntax elements from the bitstream. See clauses 7.1–7.3 for syntactical order and see
clause 7.4 for semantics; e.g., the scope, restrictions, and conditions that are imposed on the syntax elements. The actual
parsing for most syntax elements is specified in clause 9 (Parsing process). Clause 10 (Sub-bitstream extraction process)
specifies the sub-bitstream extraction process. Finally, clause 8 (Decoding process) specifies how the syntax elements
are mapped into decoded samples. Throughout reading this Specification, the reader should refer to clauses 2
(Normative references), 4 (Abbreviations), and 5 (Conventions) as needed. Annexes A through E also form an integral
part of this Recommendation | International Standard.
Annex A specifies profiles each being tailored to certain application domains, and defines the so-called tiers and levels
of the profiles. Annex B specifies syntax and semantics of a byte stream format for delivery of coded video as an
ordered stream of bytes. Annex C specifies the hypothetical reference decoder, bitstream conformance, decoder
conformance, and the use of the hypothetical reference decoder to check bitstream and decoder conformance. Annex D
specifies syntax and semantics for supplemental enhancement information message payloads. Annex E specifies syntax
and semantics of the video usability information parameters of the sequence parameter set.
Throughout this Specification, statements appearing with the preamble "NOTE –" are informative and are not an
integral part of this Recommendation | International Standard.
1 Scope
This Recommendation | International Standard specifies high efficiency video coding.
2 Normative references
2.1 General
The following Recommendations and International Standards contain provisions which, through reference in this text,
constitute provisions of this Recommendation | International Standard. At the time of publication, the editions indicated
were valid. All Recommendations and Standards are subject to revision, and parties to agreements based on this
Recommendation | International Standard are encouraged to investigate the possibility of applying the most recent
edition of the Recommendations and Standards listed below. Members of IEC and ISO maintain registers of currently
valid International Standards. The Telecommunication Standardization Bureau of the ITU maintains a list of currently
valid ITU-T Recommendations.
2.2 Identical Recommendations | International Standards
– None
2.3 Paired Recommendations | International Standards equivalent in technical content
– None
2.4 Additional references
– Recommendation ITU-T T.35 (in force), Procedure for the allocation of ITU-T defined codes for
non-standard facilities.
– ISO/IEC 11578: in force, Information technology — Open Systems Interconnection — Remote Procedure
Call (RPC).
– ISO 11664-1: in force, Colorimetry — Part 1: CIE standard colorimetric observers.
– ISO 12232: in force, Photography – Digital still cameras – Determination of exposure index, ISO speed
ratings, standard output sensitivity, and recommended exposure index.
– IETF RFC 1321 (in force), The MD5 Message-Digest Algorithm.
3 Definitions
For the purposes of this Recommendation | International Standard, the following definitions apply:
3.1 access unit: A set of NAL units that are associated with each other according to a specified classification rule,
are consecutive in decoding order, and contain exactly one coded picture.
NOTE – In addition to containing the VCL NAL units of the coded picture, an access unit may also contain
non-VCL NAL units. The decoding of an access unit always results in a decoded picture.
3.2 AC transform coefficient: Any transform coefficient for which the frequency index in at least one of the two
dimensions is non-zero.
3.3 associated non-VCL NAL unit: A non-VCL NAL unit (when present) for a VCL NAL unit where the VCL
NAL unit is the associated VCL NAL unit of the non-VCL NAL unit.
3.4 associated IRAP picture: The previous IRAP picture in decoding order (when present).
3.5 associated VCL NAL unit: The preceding VCL NAL unit in decoding order for a non-VCL NAL unit with
nal_unit_type equal to EOS_NUT, EOB_NUT, FD_NUT, or SUFFIX_SEI_NUT, or in the ranges of
RSV_NVCL45..RSV_NVCL47 or UNSPEC56..UNSPEC63; or otherwise the next VCL NAL unit in decoding
order.
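The rule in 3.5 can be read as a simple direction test on the type of a non-VCL NAL unit. The sketch below assumes the numeric nal_unit_type codes of Table 7-1 (EOS_NUT = 36, EOB_NUT = 37, FD_NUT = 38, SUFFIX_SEI_NUT = 40, and non-VCL types occupying 32..63); that table is not part of this excerpt, so the values should be checked against it. The function name is hypothetical.

```python
# nal_unit_type codes assumed from Table 7-1 (not reproduced in this excerpt).
EOS_NUT, EOB_NUT, FD_NUT, SUFFIX_SEI_NUT = 36, 37, 38, 40
FIRST_NON_VCL_TYPE = 32


def associates_with_preceding_vcl(nal_unit_type: int) -> bool:
    """True if a non-VCL NAL unit of this type belongs to the preceding VCL
    NAL unit in decoding order, False if it belongs to the next one."""
    if not FIRST_NON_VCL_TYPE <= nal_unit_type <= 63:
        raise ValueError("expected a non-VCL nal_unit_type in 32..63")
    return (nal_unit_type in (EOS_NUT, EOB_NUT, FD_NUT, SUFFIX_SEI_NUT)
            or 45 <= nal_unit_type <= 47    # RSV_NVCL45..RSV_NVCL47
            or 56 <= nal_unit_type <= 63)   # UNSPEC56..UNSPEC63


print(associates_with_preceding_vcl(SUFFIX_SEI_NUT))  # True
print(associates_with_preceding_vcl(39))              # False (prefix SEI, assumed code)
```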
3.6 bin: One bit of a bin string.
3.7 binarization: A set of bin strings for all possible values of a syntax element.
3.8 binarization process: A unique mapping process of all possible values of a syntax element onto a set of bin
strings.
3.9 bin string: An intermediate binary representation of values of syntax elements from the binarization of the
syntax element.
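To make the terms in 3.6 to 3.9 concrete, the sketch below builds a binarization for a syntax element with values 0..c_max using a truncated unary scheme. The scheme is chosen here only as an illustration; which binarization applies to which syntax element is specified in clause 9, outside this excerpt, and the function names are hypothetical.

```python
def truncated_unary(value: int, c_max: int) -> str:
    """Binarization process: map one value to a bin string.

    `value` ones followed by a terminating zero; the zero is dropped for
    the largest value because it is not needed to tell the strings apart.
    """
    if not 0 <= value <= c_max:
        raise ValueError("value out of range")
    bins = "1" * value
    if value < c_max:
        bins += "0"
    return bins


def binarization(c_max: int) -> dict:
    """Binarization: the set of bin strings for all possible values."""
    return {value: truncated_unary(value, c_max) for value in range(c_max + 1)}


table = binarization(3)
print(table)  # {0: '0', 1: '10', 2: '110', 3: '111'}
```

Each character of a bin string is one bin, and no bin string is a prefix of another, so a decoder reading bins one at a time can identify the value without a length field.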
3.10 bi-predictive (B) slice: A slice that may be decoded using intra prediction or inter prediction using at most
two motion vectors and reference indices to predict the sample values of each block.
3.11 bitstream: A sequence of bits, in the form of a NAL unit stream or a byte stream, that forms the
representation of coded pictures and associated data forming one or more CVSs.
3.12 block: An MxN (M-column by N-row) array of samples, or an MxN array of transform coefficients.
3.13 broken link: A location in a bitstream at which it is indicated that some subsequent pictures in decoding
order may contain serious visual artefacts due to unspecified operations performed in the generation of the
bitstream.
3.14 broken link access (BLA) access unit: An access unit in which the coded picture is a BLA picture.
3.15 broken link access (BLA) picture: An IRAP picture for which each VCL NAL unit has nal_unit_type equal
to BLA_W_LP, BLA_W_RADL, or BLA_N_LP.
NOTE – A BLA picture contains only I slices, and may be the first picture in the bitstream in decoding order, or
may appear later in the bitstream. Each BLA picture begins a new CVS, and has the same effect on the decoding
process as an IDR picture. However, a BLA picture contains syntax elements that specify a non-empty RPS. When
each VCL NAL unit of a BLA picture has nal_unit_type equal to BLA_W_LP, the picture may have associated RASL
pictures, which are not output by the decoder and may not be decodable, as they may contain references to pictures
that are not present in the bitstream; it may also have associated RADL pictures, which are specified to be decoded.
When each VCL NAL unit of a BLA picture has nal_unit_type equal to BLA_W_RADL, the picture does not have
associated RASL pictures but may have associated RADL pictures. When each VCL NAL unit of a BLA picture has
nal_unit_type equal to BLA_N_LP, the picture does not have any associated leading pictures.
3.16 buffering period: The set of access units starting with an access unit that contains a buffering period SEI
message and containing all subsequent access units in decoding order up to but not including the next access
unit (when present) that contains a buffering period SEI message.
3.17 byte: A sequence of 8 bits, within which, when written or read as a sequence of bit values, the left-most and
right-most bits represent the most and least significant bits, respectively.
3.18 byte-aligned: A position in a bitstream is byte-aligned when the position is an integer multiple of 8 bits from
the position of the first bit in the bitstream, and a bit or byte or syntax element is said to be byte-aligned when
the position at which it appears in a bitstream is byte-aligned.
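Definitions 3.17 and 3.18 together fix how bits are read from bytes. A minimal sketch of a reader that follows them is shown below; the class `BitReader` and its method names are hypothetical and not part of this Specification.

```python
class BitReader:
    """Reads bits most-significant-bit first, as in definition 3.17."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0  # position in bits from the first bit of the data

    def read_bit(self) -> int:
        byte = self.data[self.pos // 8]
        shift = 7 - (self.pos % 8)  # the left-most bit is the most significant
        self.pos += 1
        return (byte >> shift) & 1

    def read_bits(self, count: int) -> int:
        value = 0
        for _ in range(count):
            value = (value << 1) | self.read_bit()
        return value

    def byte_aligned(self) -> bool:
        """Definition 3.18: the position is an integer multiple of 8 bits."""
        return self.pos % 8 == 0


reader = BitReader(bytes([0b10100000, 0xFF]))
print(reader.read_bits(3))    # 5 (bits 1, 0, 1)
print(reader.byte_aligned())  # False
```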
3.19 byte stream: An encapsulation of a NAL unit stream containing start code prefixes and NAL units as specified
in Annex B.
3.20 can: A term used to refer to behaviour that is allowed, but not necessarily required.
3.21 chroma: An adjective, represented by the symbols Cb and Cr, specifying that a sample array or single sample
is representing one of the two colour difference signals related to the primary colours.
NOTE – The term chroma is used rather than the term chrominance in order to avoid the implication of the use of
linear light transfer characteristics that is often associated with the term chrominance.
3.22 clean random access (CRA) access unit: An access unit in which the coded picture is a CRA picture.
3.23 clean random access (CRA) picture: An IRAP picture for which each VCL NAL unit has nal_unit_type
equal to CRA_NUT.
NOTE – A CRA picture contains only I slices, and may be the first picture in the bitstream in decoding order, or
may appear later in the bitstream. A CRA picture may have associated RADL or RASL pictures. When a CRA
picture has NoRaslOutputFlag equal to 1, the associated RASL pictures are not output by the decoder, because they
may not be decodable, as they may contain references to pictures that are not present in the bitstream.
3.24 coded picture: A coded representation of a picture containing all coding tree units of the picture.
3.25 coded picture buffer (CPB): A first-in first-out buffer containing decoding units in decoding order specified
in the hypothetical reference decoder in Annex C.
3.26 coded representation: A data element as represented in its coded form.
3.27 coded slice segment NAL unit: A NAL unit that has nal_unit_type in the range of TRAIL_N to RASL_R,
inclusive, or in the range of BLA_W_LP to RSV_IRAP_VCL23, inclusive, which indicates that the NAL unit
contains a coded slice segment.
3.28 coded video sequence (CVS): A sequence of access units that consists, in decoding order, of an IRAP access
unit with NoRaslOutputFlag equal to 1, followed by zero or more access units that are not IRAP access units
with NoRaslOutputFlag equal to 1, including all subsequent access units up to but not including any
subsequent access unit that is an IRAP access unit with NoRaslOutputFlag equal to 1.
NOTE – An IRAP access unit may be an IDR access unit, a BLA access unit, or a CRA access unit. The value of
NoRaslOutputFlag is equal to 1 for each IDR access unit, each BLA access unit, and each CRA access unit that is
the first access unit in the bitstream in decoding order, is the first access unit that follows an end of sequence NAL
unit in decoding order, or has HandleCraAsBlaFlag equal to 1.
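Definition 3.28 amounts to a grouping rule over access units in decoding order. The sketch below applies it to a list of simplified access units; the `AccessUnit` class and its fields are hypothetical stand-ins for state that a real decoder derives from the bitstream.

```python
from dataclasses import dataclass


@dataclass
class AccessUnit:
    name: str
    is_irap: bool = False
    no_rasl_output_flag: bool = False  # stand-in for NoRaslOutputFlag equal to 1


def split_into_cvss(access_units):
    """Group access units (in decoding order) into coded video sequences."""
    cvss = []
    for au in access_units:
        if au.is_irap and au.no_rasl_output_flag:
            cvss.append([au])    # this access unit starts a new CVS
        elif cvss:
            cvss[-1].append(au)  # belongs to the current CVS
        else:
            raise ValueError("the first access unit does not start a CVS")
    return cvss


stream = [
    AccessUnit("IDR0", is_irap=True, no_rasl_output_flag=True),
    AccessUnit("P1"),
    AccessUnit("CRA2", is_irap=True),  # IRAP, but flag is 0: no new CVS
    AccessUnit("P3"),
    AccessUnit("IDR4", is_irap=True, no_rasl_output_flag=True),
]
print([[au.name for au in cvs] for cvs in split_into_cvss(stream)])
# [['IDR0', 'P1', 'CRA2', 'P3'], ['IDR4']]
```

The CRA access unit in the middle stays inside the first CVS because it is neither first in the bitstream nor handled as a BLA, which is the case the NOTE singles out.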
3.29 coding block: An NxN block of samples for some value of N such that the division of a coding tree block into
coding blocks is a partitioning.
3.30 coding tree block: An NxN block of samples for some value of N such that the division of a component into
coding tree blocks is a partitioning.
3.31 coding tree unit: A coding tree block of luma samples, two corresponding coding tree blocks of chroma
samples of a picture that has three sample arrays, or a coding tree block of samples of a monochrome picture
or a picture that is coded using three separate colour planes and syntax structures used to code the samples.
3.32 coding unit: A coding block of luma samples, two corresponding coding blocks of chroma samples of a
picture that has three sample arrays, or a coding block of samples of a monochrome picture or a picture that is
coded using three separate colour planes and syntax structures used to code the samples.
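A small calculation shows how a luma sample array is covered by coding tree blocks. The sketch assumes a square coding tree block and rounds the picture dimensions up to whole blocks, so that blocks at the right and bottom edges may extend past the picture boundary; the exact derivation is given elsewhere in this Specification and should be consulted for a real implementation. The function name is hypothetical.

```python
def picture_size_in_ctbs(width: int, height: int, ctb_size: int):
    """Return (columns, rows) of coding tree blocks covering the luma array."""
    columns = (width + ctb_size - 1) // ctb_size  # integer ceiling division
    rows = (height + ctb_size - 1) // ctb_size
    return columns, rows


columns, rows = picture_size_in_ctbs(1920, 1080, 64)
print(columns, rows, columns * rows)  # 30 17 510
```

For a 1920x1080 picture and 64x64 coding tree blocks, the bottom row of blocks is only partly filled, since 1080 is not a multiple of 64.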
3.33 component: An array or single sample from one of the three arrays (luma and two chroma) that compose a
picture in 4:2:0, 4:2:2, or 4:4:4 colour format or the array or a single sample of the array that compose a
picture in monochrome format.
3.34 context variable: A variable specified for the adaptive binary arithmetic decoding process of a bin by an
equation containing recently decoded bins.
3.35 cropped decoded picture: The result of cropping a decoded picture based on the conformance cropping
window specified in the SPS that is referred to by the corresponding coded picture.
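Cropping itself is a plain rectangular selection, as the sketch below shows for one sample array stored as a list of rows. For simplicity the four offsets are assumed to be given directly in samples of that array; how the SPS signals the conformance cropping window, and in which units, is specified outside this excerpt. The function name is hypothetical.

```python
def crop(picture, left: int, right: int, top: int, bottom: int):
    """Return the cropped picture; offsets count samples removed at each edge."""
    height = len(picture)
    width = len(picture[0])
    return [row[left:width - right] for row in picture[top:height - bottom]]


decoded = [
    [10, 11, 12, 13],
    [20, 21, 22, 23],
    [30, 31, 32, 33],
    [40, 41, 42, 43],
]
print(crop(decoded, left=1, right=0, top=0, bottom=2))
# [[11, 12, 13], [21, 22, 23]]
```

A typical use is a coded picture whose height is padded up to a multiple of the block size, with the padding rows removed again before output.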
3.36 decoded picture: A decoded picture is derived by decoding a coded picture.
3.37 decoded picture buffer (DPB): A buffer holding decoded pictures for reference, output reordering, or output
delay specified for the hypothetical reference decoder in Annex C.
3.38 decoder: An embodiment of a decoding process.
3.39 decoder under test (DUT): A decoder that is tested for conformance to this Specification by operating the
hypothetical stream scheduler to deliver a conforming bitstream to the decoder and to the hypothetical
reference decoder and comparing the values and timing or order of the output of the two decoders.
3.40 decoding order: The order in which syntax elements are processed by the decoding process.
3.41 decoding process: The process specified in this Specification that reads a bitstream and derives decoded
pictures from it.
3.42 decoding unit: An access unit if SubPicHrdFlag is equal to 0 or a subset of an access unit otherwise,
consisting of one or more VCL NAL units in an access unit and the associated non-VCL NAL units.
3.43 dependent slice segment: A slice segment for which the values of some syntax elements of the slice segment
header are inferred from the values for the preceding independent slice segment in decoding order.
3.44 display process: A process not specified in this Specification having, as its input, the cropped decoded
pictures that are the output of the decoding process.
3.45 elementary stream: A sequence of one or more bitstreams.
NOTE – An elementary stream that consists of two or more bitstreams would typically have been formed by splicing
together two or more bitstreams (or parts thereof).
3.46 emulation prevention byte: A byte equal to 0x03 that is present within a NAL unit when the syntax elements
of the bitstream form certain patterns of byte values in a manner that ensures that no sequence of consecutive
byte-aligned bytes in the NAL unit can contain a start code prefix.
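The mechanism in 3.46 can be sketched as a pair of functions. The rule used here, inserting 0x03 after two consecutive zero bytes whenever the next payload byte is 0x00 to 0x03, is the commonly described behaviour and is an assumption as far as this excerpt is concerned; the normative rule, including the handling of a payload that ends in a zero byte, is in the NAL unit syntax and semantics. The function names are hypothetical.

```python
def add_emulation_prevention(payload: bytes) -> bytes:
    """Insert 0x03 so that 0x000000..0x000003 never appears in the output."""
    out = bytearray()
    zeros = 0  # consecutive zero bytes written so far
    for byte in payload:
        if zeros >= 2 and byte <= 0x03:
            out.append(0x03)  # emulation prevention byte
            zeros = 0
        out.append(byte)
        zeros = zeros + 1 if byte == 0 else 0
    return bytes(out)


def remove_emulation_prevention(data: bytes) -> bytes:
    """Drop each 0x03 that follows two consecutive zero bytes."""
    out = bytearray()
    zeros = 0
    for byte in data:
        if zeros >= 2 and byte == 0x03:
            zeros = 0
            continue
        out.append(byte)
        zeros = zeros + 1 if byte == 0 else 0
    return bytes(out)


payload = bytes([0x25, 0x00, 0x00, 0x01, 0x7F])
escaped = add_emulation_prevention(payload)
print(escaped.hex())  # 25000003017f
print(remove_emulation_prevention(escaped) == payload)  # True
```

After escaping, the three-byte pattern 0x000001 that a byte stream uses as a start code prefix can no longer occur inside the NAL unit.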
3.47 encoder: An embodiment of an encoding process.
3.48 encoding process: A process not specified in this Specification that produces a bitstream conforming to this
Specification.
3.49 field: An assembly of alternative rows of samples of a frame.
3.50 filler data NAL units: NAL units with nal_unit_type equal to FD_NUT.
3.51 flag: A variable that can take one of the two possible values 0 and 1.
3.52 frame: The composition of a top field and a bottom field, where sample rows 0, 2, 4, ... originate from the top
field and sample rows 1, 3, 5, ... originate from the bottom field.
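The relationship between a frame and its two fields in 3.49 and 3.52 is a row interleaving, sketched below with each row represented as a list of samples. The function names are hypothetical.

```python
def weave_fields(top_field, bottom_field):
    """Build a frame: rows 0, 2, 4, ... from the top field, 1, 3, 5, ... from the bottom."""
    frame = []
    for top_row, bottom_row in zip(top_field, bottom_field):
        frame.append(top_row)
        frame.append(bottom_row)
    return frame


def split_frame(frame):
    """Recover (top field, bottom field) as alternate rows of the frame."""
    return frame[0::2], frame[1::2]


top = [["t0"], ["t1"]]
bottom = [["b0"], ["b1"]]
frame = weave_fields(top, bottom)
print(frame)  # [['t0'], ['b0'], ['t1'], ['b1']]
```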
3.53 frequency index: A one-dimensional or two-dimensional index associated with a transform coefficient prior
to an inverse transform part of the decoding process.
3.54 hypothetical reference decoder (HRD): A hypothetical decoder model that specifies constraints on the
variability of conforming NAL unit streams or conforming byte streams that an encoding process may
produce.
3.55 hypothetical stream scheduler (HSS): A hypothetical delivery mechanism used for checking the
conformance of a bitstream or a decoder with regards to the timing and data flow of the input of a bitstream
into the hypothetical reference decoder.
3.56 independent slice segment: A slice segment for which the values of the syntax elements of the slice segment
header are not inferred from the values for a preceding slice segment.
3.57 informative: A term used to refer to content provided in this Specification that does not establish any
mandatory requirements for conformance to this Specification and thus is not considered an integral part of
this Specification.
3.58 instantaneous decoding refresh (IDR) access unit: An access unit in which the coded picture is an IDR
picture.
3.59 instantaneous decoding refresh (IDR) picture: An IRAP picture for which each VCL NAL unit has
nal_unit_type equal to IDR_W_RADL or IDR_N_LP.
NOTE – An IDR picture contains only I slices, and may be the first picture in the bitstream in decoding order, or
may appear later in the bitstream. Each IDR picture is the first picture of a CVS in decoding order. When each
VCL NAL unit of an IDR picture has nal_unit_type equal to IDR_W_RADL, the picture may have associated RADL
pictures. When each VCL NAL unit of an IDR picture has nal_unit_type equal to IDR_N_LP, the picture does not
have any associated leading pictures. An IDR picture does not have associated RASL pictures.
3.60 inter coding: Coding of a coding block, slice, or picture that uses inter prediction.
3.61 inter prediction: A prediction derived in a manner that is dependent on data elements (e.g., sample values or
motion vectors) of pictures other than the current picture.
3.62 intra coding: Coding of a coding block, slice, or picture that uses intra prediction.
3.63 intra prediction: A prediction derived from only data elements (e.g., sample values) of the same decoded
slice.
3.64 intra random access point (IRAP) access unit: An access unit in which the coded picture is an IRAP
picture.
3.65 intra random access point (IRAP) picture: A coded picture for which each VCL NAL unit has nal_unit_type
in the range of BLA_W_LP to RSV_IRAP_VCL23, inclusive.
NOTE – An IRAP picture contains only I slices, and may be a BLA picture, a CRA picture or an IDR picture. The
first picture in the bitstream in decoding order must be an IRAP picture. Provided the necessary parameter sets are
available when they need to be activated, the IRAP picture and all subsequent non-RASL pictures in decoding order
can be correctly decoded without performing the decoding process of any pictures that precede the IRAP picture in
decoding order. There may be pictures in a bitstream that contain only I slices that are not IRAP pictures.
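The IRAP picture types defined in 3.15, 3.23, 3.59, and 3.65, together with the leading pictures their NOTEs allow, can be summarized in a small lookup. The numeric nal_unit_type codes are assumed from Table 7-1 (BLA_W_LP = 16 through RSV_IRAP_VCL23 = 23), which is not part of this excerpt; the function names are hypothetical.

```python
# nal_unit_type codes assumed from Table 7-1 (not reproduced in this excerpt).
BLA_W_LP, BLA_W_RADL, BLA_N_LP = 16, 17, 18
IDR_W_RADL, IDR_N_LP = 19, 20
CRA_NUT = 21
RSV_IRAP_VCL23 = 23

# Leading picture types that may be associated, as stated in the NOTEs.
ALLOWED_LEADING = {
    BLA_W_LP: {"RASL", "RADL"},
    BLA_W_RADL: {"RADL"},
    BLA_N_LP: set(),
    IDR_W_RADL: {"RADL"},
    IDR_N_LP: set(),
    CRA_NUT: {"RASL", "RADL"},
}


def irap_kind(nal_unit_type: int):
    """Return 'BLA', 'IDR', 'CRA', 'reserved', or None if not an IRAP type."""
    if BLA_W_LP <= nal_unit_type <= BLA_N_LP:
        return "BLA"
    if nal_unit_type in (IDR_W_RADL, IDR_N_LP):
        return "IDR"
    if nal_unit_type == CRA_NUT:
        return "CRA"
    if CRA_NUT < nal_unit_type <= RSV_IRAP_VCL23:
        return "reserved"
    return None


def allowed_leading_pictures(nal_unit_type: int):
    """Leading picture types an IRAP picture of this type may have."""
    return ALLOWED_LEADING.get(nal_unit_type, set())


print(irap_kind(IDR_N_LP), allowed_leading_pictures(IDR_N_LP))  # IDR set()
print(irap_kind(CRA_NUT), sorted(allowed_leading_pictures(CRA_NUT)))  # CRA ['RADL', 'RASL']
```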
3.66 intra (I) slice: A slice that is decoded using intra prediction only.
3.67 inverse transform: A part of the decoding process by which a set of transform coefficients are converted into
spatial-domain values.
3.68 layer: A set of VCL NAL units that all have a particular value of nuh_layer_id and the associated non-VCL
NAL units, or one of a set of syntactical structures having a hierarchical relationship.
NOTE – Depending on the context, either the first layer concept or the second layer concept applies. The first layer
concept is also referred to as a scalable layer, wherein a layer may be a spatial scalable layer, a quality scalable
layer, a view, etc. A temporal true subset of a scalable layer is not referred to as a layer but referred to as a sub-layer