High Efficiency Video Coding Standard: An In-Depth Look at ITU-T H.265
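Updated 2024-07-24
"The H.265 specification is an efficient video coding standard developed by the International Telecommunication Union (ITU-T). It aims to improve compression efficiency for high-resolution video and reduce bandwidth requirements while maintaining high video quality."
H.265, also known as High Efficiency Video Coding (HEVC), is the successor to H.264/AVC. ITU-T published it in 2013. It targets video transmission, storage, and playback, and its advantages are most pronounced for 4K and 8K ultra-high-definition video.
The core goal of H.265 is higher coding efficiency, achieved mainly through the following: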
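1. **Flexible block partitioning**: H.265 replaces the fixed 16x16 macroblock of H.264 with coding tree units of up to 64x64 luma samples that are split recursively into smaller coding units, with transform blocks as small as 4x4. Large blocks suit smooth regions and small blocks suit fine detail, which reduces redundant information.
2. **Multiple reference picture prediction**: Like H.264, H.265 can predict from multiple reference pictures. It signals them through reference picture sets (RPS), which makes the use of temporal correlation more robust.
3. **More advanced entropy coding**: H.265 uses a refined version of context-adaptive binary arithmetic coding (CABAC) as its entropy coder.
4. **Finer motion compensation**: Improved motion vector prediction, combined with block-based motion compensation on variable block sizes, predicts the motion of sample blocks more accurately.
5. **Residual block partitioning**: The prediction residual is coded with a flexible transform block structure whose block sizes adapt to the content.
6. **Entropy coding optimization**: Refined context modeling further improves compression efficiency.
7. **In-loop filtering**: A deblocking filter followed by sample adaptive offset (SAO) filtering reduces the visual artefacts that appear at high compression ratios.
8. **Enhanced prediction structures**: New prediction modes such as merge mode were introduced. Multiview and depth coding are handled by later extensions rather than by the first version excerpted below.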
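Together these improvements let H.265 deliver comparable video quality at roughly half the data volume of H.264, although the actual saving depends on the content and the encoder. This is valuable where bandwidth is limited, as in mobile communications, online streaming services, and wireless video transmission, and it has helped ultra-high-definition content become widespread.
H.265 is an important milestone in video coding technology: it lowers storage and transmission costs and provides strong technical support for the wide use of high-definition video.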
2 Rec. ITU-T H.265 (04/2013)
This is the first version of this Specification. Additional versions are anticipated.
0.5 Profiles, tiers and levels
This Recommendation | International Standard is designed to be generic in the sense that it serves a wide range of
applications, bit rates, resolutions, qualities, and services. Applications should cover, among other things, digital storage
media, television broadcasting and real-time communications. In the course of creating this Specification, various
requirements from typical applications have been considered, necessary algorithmic elements have been developed, and
these have been integrated into a single syntax. Hence, this Specification will facilitate video data interchange among
different applications.
Considering the practicality of implementing the full syntax of this Specification, however, a limited number of subsets
of the syntax are also stipulated by means of "profiles", "tiers", and "levels". These and other related terms are formally
defined in clause 3.
A "profile" is a subset of the entire bitstream syntax that is specified in this Recommendation | International Standard.
Within the bounds imposed by the syntax of a given profile it is still possible to require a very large variation in the
performance of encoders and decoders depending upon the values taken by syntax elements in the bitstream such as the
specified size of the decoded pictures. In many applications, it is currently neither practical nor economic to implement
a decoder capable of dealing with all hypothetical uses of the syntax within a particular profile.
In order to deal with this problem, "tiers" and "levels" are specified within each profile. A level of a tier is a specified
set of constraints imposed on values of the syntax elements in the bitstream. These constraints may be simple limits on
values. Alternatively they may take the form of constraints on arithmetic combinations of values (e.g., picture width
multiplied by picture height multiplied by number of pictures decoded per second). A level specified for a lower tier is
more constrained than a level specified for a higher tier.
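The idea of a level as a set of limits on values and on arithmetic combinations of values can be sketched as follows. This is a minimal illustration only: the `LevelLimits` type, the `conforms` function, and the numeric limits are hypothetical placeholders, not values taken from Annex A.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LevelLimits:
    """Hypothetical limits for one level of one tier (placeholder values)."""
    max_luma_picture_size: int  # limit on width * height, in luma samples
    max_luma_sample_rate: int   # limit on width * height * pictures per second


def conforms(width: int, height: int, pictures_per_second: float,
             limits: LevelLimits) -> bool:
    """Check a simple limit and a limit on an arithmetic combination of values."""
    picture_size = width * height
    sample_rate = picture_size * pictures_per_second
    return (picture_size <= limits.max_luma_picture_size
            and sample_rate <= limits.max_luma_sample_rate)


# Placeholder limits chosen only for illustration.
example_limits = LevelLimits(max_luma_picture_size=2_500_000,
                             max_luma_sample_rate=150_000_000)

# 1920 * 1080 = 2,073,600 samples per picture; times 60 = 124,416,000 per second.
print(conforms(1920, 1080, 60, example_limits))
# 3840 * 2160 = 8,294,400 samples per picture, above the placeholder picture size limit.
print(conforms(3840, 2160, 30, example_limits))
```

A real conformance check would cover many more constraints (bit rate, buffer sizes, slice and tile counts); the sketch shows only the shape of the test.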
Coded video content conforming to this Recommendation | International Standard uses a common syntax. In order to
achieve a subset of the complete syntax, flags, parameters, and other syntax elements are included in the bitstream that
signal the presence or absence of syntactic elements that occur later in the bitstream.
0.6 Overview of the design characteristics
The coded representation specified in the syntax is designed to enable a high compression capability for a desired image
or video quality. The algorithm is typically not lossless, as the exact source sample values are typically not preserved
through the encoding and decoding processes. A number of techniques may be used to achieve highly efficient
compression. Encoding algorithms (not specified in this Recommendation | International Standard) may select between
inter and intra coding for block-shaped regions of each picture. Inter coding uses motion vectors for block-based inter
prediction to exploit temporal statistical dependencies between different pictures. Intra coding uses various spatial
prediction modes to exploit spatial statistical dependencies in the source signal for a single picture. Motion vectors and
intra prediction modes may be specified for a variety of block sizes in the picture. The prediction residual may then be
further compressed using a transform to remove spatial correlation inside the transform block before it is quantized,
producing a possibly irreversible process that typically discards less important visual information while forming a close
approximation to the source samples. Finally, the motion vectors or intra prediction modes may also be further
compressed using a variety of prediction mechanisms, and, after prediction, are combined with the quantized transform
coefficient information and encoded using arithmetic coding.
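Why the process is typically not lossless can be seen in a deliberately reduced sketch that predicts a block, quantizes the prediction residual, and reconstructs it. The transform and the arithmetic coding stages are omitted, and the function names, sample values, and quantization step are assumptions made for illustration; none of this is the normative decoding process.

```python
def encode_block(source, prediction, step):
    """Quantize the prediction residual to integer levels (the lossy step)."""
    return [round((s - p) / step) for s, p in zip(source, prediction)]


def decode_block(levels, prediction, step):
    """Rebuild an approximation of the source from the prediction and the levels."""
    return [p + level * step for p, level in zip(prediction, levels)]


source = [100, 104, 109, 115]   # illustrative source sample values
prediction = [98, 98, 98, 98]   # illustrative prediction (intra or inter)
step = 5                        # illustrative quantization step size

levels = encode_block(source, prediction, step)
reconstructed = decode_block(levels, prediction, step)
errors = [abs(s - r) for s, r in zip(source, reconstructed)]

print(levels)
print(reconstructed)
print(errors)
```

The reconstruction is close to the source but not identical, and the error per sample is bounded by half the quantization step in this toy setting.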
0.7 How to read this Specification
It is suggested that the reader starts with clause 1 (Scope) and moves on to clause 3 (Definitions). Clause 6 should be
read for the geometrical relationship of the source, input, and output of the decoder. Clause 7 (Syntax and semantics)
specifies the order to parse syntax elements from the bitstream. See clauses 7.1–7.3 for syntactical order and see
clause 7.4 for semantics; e.g., the scope, restrictions, and conditions that are imposed on the syntax elements. The actual
parsing for most syntax elements is specified in clause 9 (Parsing process). Clause 10 (Sub-bitstream extraction process)
specifies the sub-bitstream extraction process. Finally, clause 8 (Decoding process) specifies how the syntax elements
are mapped into decoded samples. Throughout reading this Specification, the reader should refer to clauses 2
(Normative references), 4 (Abbreviations), and 5 (Conventions) as needed. Annexes A through E also form an integral
part of this Recommendation | International Standard.
Annex A specifies profiles each being tailored to certain application domains, and defines the so-called tiers and levels
of the profiles. Annex B specifies syntax and semantics of a byte stream format for delivery of coded video as an
ordered stream of bytes. Annex C specifies the hypothetical reference decoder, bitstream conformance, decoder
conformance, and the use of the hypothetical reference decoder to check bitstream and decoder conformance. Annex D
specifies syntax and semantics for supplemental enhancement information message payloads. Annex E specifies syntax
and semantics of the video usability information parameters of the sequence parameter set.
Throughout this Specification, statements appearing with the preamble "NOTE –" are informative and are not an
integral part of this Recommendation | International Standard.
1 Scope
This Recommendation | International Standard specifies high efficiency video coding.
2 Normative references
2.1 General
The following Recommendations and International Standards contain provisions which, through reference in this text,
constitute provisions of this Recommendation | International Standard. At the time of publication, the editions indicated
were valid. All Recommendations and Standards are subject to revision, and parties to agreements based on this
Recommendation | International Standard are encouraged to investigate the possibility of applying the most recent
edition of the Recommendations and Standards listed below. Members of IEC and ISO maintain registers of currently
valid International Standards. The Telecommunication Standardization Bureau of the ITU maintains a list of currently
valid ITU-T Recommendations.
2.2 Identical Recommendations | International Standards
– None
2.3 Paired Recommendations | International Standards equivalent in technical content
– None
2.4 Additional references
– Recommendation ITU-T T.35 (in force), Procedure for the allocation of ITU-T defined codes for
non-standard facilities.
– ISO/IEC 11578: in force, Information technology — Open Systems Interconnection — Remote Procedure
Call (RPC).
– ISO 11664-1: in force, Colorimetry — Part 1: CIE standard colorimetric observers.
– ISO 12232: in force, Photography – Digital still cameras – Determination of exposure index, ISO speed
ratings, standard output sensitivity, and recommended exposure index.
– IETF RFC 1321 (in force), The MD5 Message-Digest Algorithm.
3 Definitions
For the purposes of this Recommendation | International Standard, the following definitions apply:
3.1 access unit: A set of NAL units that are associated with each other according to a specified classification rule,
are consecutive in decoding order, and contain exactly one coded picture.
NOTE – In addition to containing the VCL NAL units of the coded picture, an access unit may also contain
non-VCL NAL units. The decoding of an access unit always results in a decoded picture.
3.2 AC transform coefficient: Any transform coefficient for which the frequency index in at least one of the two
dimensions is non-zero.
3.3 associated non-VCL NAL unit: A non-VCL NAL unit (when present) for a VCL NAL unit where the VCL
NAL unit is the associated VCL NAL unit of the non-VCL NAL unit.
3.4 associated IRAP picture: The previous IRAP picture in decoding order (when present).
3.5 associated VCL NAL unit: The preceding VCL NAL unit in decoding order for a non-VCL NAL unit with
nal_unit_type equal to EOS_NUT, EOB_NUT, FD_NUT, or SUFFIX_SEI_NUT, or in the ranges of
RSV_NVCL45..RSV_NVCL47 or UNSPEC56..UNSPEC63; or otherwise the next VCL NAL unit in decoding
order.
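Definition 3.5 can be read as a small lookup rule. The sketch below is one possible rendering under stated assumptions: the numeric nal_unit_type values (EOS_NUT = 36, EOB_NUT = 37, FD_NUT = 38, SUFFIX_SEI_NUT = 40, VCL types 0 to 31) come from the NAL unit type table of the specification, which is not part of this excerpt, and the helper names are hypothetical.

```python
# Assumed numeric values from the NAL unit type table (not shown in this excerpt).
EOS_NUT, EOB_NUT, FD_NUT, SUFFIX_SEI_NUT = 36, 37, 38, 40


def is_vcl(nal_unit_type: int) -> bool:
    """Assumption: nal_unit_type values 0..31 denote VCL NAL units."""
    return 0 <= nal_unit_type <= 31


def uses_preceding_vcl(nal_unit_type: int) -> bool:
    """True when definition 3.5 points a non-VCL unit at the preceding VCL unit."""
    return (nal_unit_type in (EOS_NUT, EOB_NUT, FD_NUT, SUFFIX_SEI_NUT)
            or 45 <= nal_unit_type <= 47    # RSV_NVCL45..RSV_NVCL47
            or 56 <= nal_unit_type <= 63)   # UNSPEC56..UNSPEC63


def associate(nal_unit_types):
    """Map each non-VCL position to the position of its associated VCL unit."""
    vcl_positions = [i for i, t in enumerate(nal_unit_types) if is_vcl(t)]
    result = {}
    for i, t in enumerate(nal_unit_types):
        if is_vcl(t):
            continue
        if uses_preceding_vcl(t):
            candidates = [p for p in vcl_positions if p < i]
            result[i] = candidates[-1] if candidates else None
        else:
            candidates = [p for p in vcl_positions if p > i]
            result[i] = candidates[0] if candidates else None
    return result


# Illustrative stream in decoding order: VPS, SPS, PPS, prefix SEI, IDR slice,
# suffix SEI, access unit delimiter, trailing slice, end of sequence.
stream = [32, 33, 34, 39, 19, 40, 35, 1, 36]
print(associate(stream))
```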
3.6 bin: One bit of a bin string.
3.7 binarization: A set of bin strings for all possible values of a syntax element.
3.8 binarization process: A unique mapping process of all possible values of a syntax element onto a set of bin
strings.
3.9 bin string: An intermediate binary representation of values of syntax elements from the binarization of the
syntax element.
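As an illustration of definitions 3.6 to 3.9, the sketch below builds a binarization with a truncated unary scheme, in which a value v is written as v bins equal to 1 followed by a terminating 0 that is dropped for the largest value. The choice of scheme and the names `truncated_unary` and `c_max` are assumptions for the example; the binarizations actually used for each syntax element are specified in the parsing process clause, outside this excerpt.

```python
def truncated_unary(value: int, c_max: int) -> str:
    """Binarization process: map one syntax element value onto a bin string."""
    if not 0 <= value <= c_max:
        raise ValueError("value outside the range of the syntax element")
    bin_string = "1" * value
    if value < c_max:
        bin_string += "0"  # the terminating bin is omitted only for c_max
    return bin_string


# The binarization is the set of bin strings for all possible values.
binarization = {value: truncated_unary(value, 3) for value in range(4)}
print(binarization)
```

Each value maps to a distinct bin string and no bin string is a prefix of another, so a decoder reading bins one at a time can tell where a bin string ends.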
3.10 bi-predictive (B) slice: A slice that may be decoded using intra prediction or inter prediction using at most
two motion vectors and reference indices to predict the sample values of each block.
3.11 bitstream: A sequence of bits, in the form of a NAL unit stream or a byte stream, that forms the
representation of coded pictures and associated data forming one or more CVSs.
3.12 block: An MxN (M-column by N-row) array of samples, or an MxN array of transform coefficients.
3.13 broken link: A location in a bitstream at which it is indicated that some subsequent pictures in decoding
order may contain serious visual artefacts due to unspecified operations performed in the generation of the
bitstream.
3.14 broken link access (BLA) access unit: An access unit in which the coded picture is a BLA picture.
3.15 broken link access (BLA) picture: An IRAP picture for which each VCL NAL unit has nal_unit_type equal
to BLA_W_LP, BLA_W_RADL, or BLA_N_LP.
NOTE – A BLA picture contains only I slices, and may be the first picture in the bitstream in decoding order, or
may appear later in the bitstream. Each BLA picture begins a new CVS, and has the same effect on the decoding
process as an IDR picture. However, a BLA picture contains syntax elements that specify a non-empty RPS. When a
BLA picture for which each VCL NAL unit has nal_unit_type equal to BLA_W_LP, it may have associated RASL
pictures, which are not output by the decoder and may not be decodable, as they may contain references to pictures
that are not present in the bitstream. When a BLA picture for which each VCL NAL unit has nal_unit_type equal to
BLA_W_LP, it may also have associated RADL pictures, which are specified to be decoded. When a BLA picture
for which each VCL NAL unit has nal_unit_type equal to BLA_W_RADL, it does not have associated RASL
pictures but may have associated RADL pictures. When a BLA picture for which each VCL NAL unit has
nal_unit_type equal to BLA_N_LP, it does not have any associated leading pictures.
3.16 buffering period: The set of access units starting with an access unit that contains a buffering period SEI
message and containing all subsequent access units in decoding order up to but not including the next access
unit (when present) that contains a buffering period SEI message.
3.17 byte: A sequence of 8 bits, within which, when written or read as a sequence of bit values, the left-most and
right-most bits represent the most and least significant bits, respectively.
3.18 byte-aligned: A position in a bitstream is byte-aligned when the position is an integer multiple of 8 bits from
the position of the first bit in the bitstream, and a bit or byte or syntax element is said to be byte-aligned when
the position at which it appears in a bitstream is byte-aligned.
3.19 byte stream: An encapsulation of a NAL unit stream containing start code prefixes and NAL units as specified
in Annex B.
3.20 can: A term used to refer to behaviour that is allowed, but not necessarily required.
3.21 chroma: An adjective, represented by the symbols Cb and Cr, specifying that a sample array or single sample
is representing one of the two colour difference signals related to the primary colours.
NOTE – The term chroma is used rather than the term chrominance in order to avoid the implication of the use of
linear light transfer characteristics that is often associated with the term chrominance.
3.22 clean random access (CRA) access unit: An access unit in which the coded picture is a CRA picture.
3.23 clean random access (CRA) picture: An IRAP picture for which each VCL NAL unit has nal_unit_type
equal to CRA_NUT.
NOTE – A CRA picture contains only I slices, and may be the first picture in the bitstream in decoding order, or
may appear later in the bitstream. A CRA picture may have associated RADL or RASL pictures. When a CRA
picture has NoRaslOutputFlag equal to 1, the associated RASL pictures are not output by the decoder, because they
may not be decodable, as they may contain references to pictures that are not present in the bitstream.
3.24 coded picture: A coded representation of a picture containing all coding tree units of the picture.
3.25 coded picture buffer (CPB): A first-in first-out buffer containing decoding units in decoding order specified
in the hypothetical reference decoder in Annex C.
3.26 coded representation: A data element as represented in its coded form.
3.27 coded slice segment NAL unit: A NAL unit that has nal_unit_type in the range of TRAIL_N to RASL_R,
inclusive, or in the range of BLA_W_LP to RSV_IRAP_VCL23, inclusive, which indicates that the NAL unit
contains a coded slice segment.
3.28 coded video sequence (CVS): A sequence of access units that consists, in decoding order, of an IRAP access
unit with NoRaslOutputFlag equal to 1, followed by zero or more access units that are not IRAP access units
with NoRaslOutputFlag equal to 1, including all subsequent access units up to but not including any
subsequent access unit that is an IRAP access unit with NoRaslOutputFlag equal to 1.
NOTE – An IRAP access unit may be an IDR access unit, a BLA access unit, or a CRA access unit. The value of
NoRaslOutputFlag is equal to 1 for each IDR access unit, each BLA access unit, and each CRA access unit that is
the first access unit in the bitstream in decoding order, is the first access unit that follows an end of sequence NAL
unit in decoding order, or has HandleCraAsBlaFlag equal to 1.
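The CVS definition amounts to cutting the sequence of access units at every IRAP access unit whose NoRaslOutputFlag is equal to 1. A minimal sketch, assuming a hypothetical `AccessUnit` record in which both properties have already been derived:

```python
from dataclasses import dataclass


@dataclass
class AccessUnit:
    """Hypothetical record; real decoders derive these fields from the bitstream."""
    label: str
    is_irap: bool = False
    no_rasl_output_flag: bool = False


def split_into_cvss(access_units):
    """Group access units, given in decoding order, into coded video sequences."""
    cvss = []
    for au in access_units:
        if au.is_irap and au.no_rasl_output_flag:
            cvss.append([au])        # this access unit starts a new CVS
        elif cvss:
            cvss[-1].append(au)      # this access unit belongs to the current CVS
        else:
            raise ValueError("the first access unit must be an IRAP access unit "
                             "with NoRaslOutputFlag equal to 1")
    return cvss


units = [
    AccessUnit("IDR", is_irap=True, no_rasl_output_flag=True),
    AccessUnit("TRAIL"),
    AccessUnit("TRAIL"),
    AccessUnit("CRA", is_irap=True, no_rasl_output_flag=False),  # does not start a CVS
    AccessUnit("TRAIL"),
    AccessUnit("IDR", is_irap=True, no_rasl_output_flag=True),
    AccessUnit("TRAIL"),
]
print([[au.label for au in cvs] for cvs in split_into_cvss(units)])
```

The CRA access unit in the middle stays inside the first CVS because its NoRaslOutputFlag is 0 in this example.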
3.29 coding block: An NxN block of samples for some value of N such that the division of a coding tree block into
coding blocks is a partitioning.
3.30 coding tree block: An NxN block of samples for some value of N such that the division of a component into
coding tree blocks is a partitioning.
3.31 coding tree unit: A coding tree block of luma samples, two corresponding coding tree blocks of chroma
samples of a picture that has three sample arrays, or a coding tree block of samples of a monochrome picture
or a picture that is coded using three separate colour planes and syntax structures used to code the samples.
3.32 coding unit: A coding block of luma samples, two corresponding coding blocks of chroma samples of a
picture that has three sample arrays, or a coding block of samples of a monochrome picture or a picture that is
coded using three separate colour planes and syntax structures used to code the samples.
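Definitions 3.29 to 3.32 only require that the division of a coding tree block into coding blocks be a partitioning. A recursive quadtree split is one way to produce such a partitioning; the sketch below assumes that approach, and the block sizes (64 and 8) and the split rule are illustrative choices rather than values from this excerpt.

```python
def partition(x, y, size, min_size, should_split):
    """Recursively split a square block into four quadrants; return the leaf blocks."""
    if size > min_size and should_split(x, y, size):
        half = size // 2
        blocks = []
        for dy in (0, half):
            for dx in (0, half):
                blocks.extend(partition(x + dx, y + dy, half, min_size, should_split))
        return blocks
    return [(x, y, size)]


def split_top_left(x, y, size):
    """Illustrative split rule: keep refining the block that touches the top-left corner."""
    return x == 0 and y == 0


coding_blocks = partition(0, 0, 64, 8, split_top_left)
print(len(coding_blocks))
print(sum(size * size for _, _, size in coding_blocks))
```

An encoder would replace `split_top_left` with a cost-based decision. Whatever rule is used, the leaf blocks cover the coding tree block exactly once, which is what makes the division a partitioning.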
3.33 component: An array or single sample from one of the three arrays (luma and two chroma) that compose a
picture in 4:2:0, 4:2:2, or 4:4:4 colour format or the array or a single sample of the array that compose a
picture in monochrome format.
3.34 context variable: A variable specified for the adaptive binary arithmetic decoding process of a bin by an
equation containing recently decoded bins.
3.35 cropped decoded picture: The result of cropping a decoded picture based on the conformance cropping
window specified in the SPS that is referred to by the corresponding coded picture.
3.36 decoded picture: A decoded picture is derived by decoding a coded picture.
3.37 decoded picture buffer (DPB): A buffer holding decoded pictures for reference, output reordering, or output
delay specified for the hypothetical reference decoder in Annex C.
3.38 decoder: An embodiment of a decoding process.
3.39 decoder under test (DUT): A decoder that is tested for conformance to this Specification by operating the
hypothetical stream scheduler to deliver a conforming bitstream to the decoder and to the hypothetical
reference decoder and comparing the values and timing or order of the output of the two decoders.
3.40 decoding order: The order in which syntax elements are processed by the decoding process.
3.41 decoding process: The process specified in this Specification that reads a bitstream and derives decoded
pictures from it.
3.42 decoding unit: An access unit if SubPicHrdFlag is equal to 0 or a subset of an access unit otherwise,
consisting of one or more VCL NAL units in an access unit and the associated non-VCL NAL units.
3.43 dependent slice segment: A slice segment for which the values of some syntax elements of the slice segment
header are inferred from the values for the preceding independent slice segment in decoding order.
3.44 display process: A process not specified in this Specification having, as its input, the cropped decoded
pictures that are the output of the decoding process.
3.45 elementary stream: A sequence of one or more bitstreams.
NOTE – An elementary stream that consists of two or more bitstreams would typically have been formed by splicing
together two or more bitstreams (or parts thereof).
3.46 emulation prevention byte: A byte equal to 0x03 that is present within a NAL unit when the syntax elements
of the bitstream form certain patterns of byte values in a manner that ensures that no sequence of consecutive
byte-aligned bytes in the NAL unit can contain a start code prefix.
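A sketch of how emulation prevention bytes might be inserted and removed follows. It assumes the commonly described rule that a byte equal to 0x03 is inserted whenever two consecutive zero bytes would otherwise be followed by a byte in the range 0x00 to 0x03; that rule is set out in the NAL unit semantics, not in this excerpt, and the function names are hypothetical. Special handling at the end of the payload is left out.

```python
def add_emulation_prevention(payload: bytes) -> bytes:
    """Insert 0x03 so that no start code prefix can appear in the output."""
    out = bytearray()
    zeros = 0  # number of consecutive zero bytes at the end of the output
    for b in payload:
        if zeros >= 2 and b <= 3:
            out.append(3)  # emulation prevention byte
            zeros = 0
        out.append(b)
        zeros = zeros + 1 if b == 0 else 0
    return bytes(out)


def remove_emulation_prevention(data: bytes) -> bytes:
    """Drop each 0x03 that directly follows two consecutive zero bytes."""
    out = bytearray()
    zeros = 0
    for b in data:
        if zeros >= 2 and b == 3:
            zeros = 0
            continue  # skip the emulation prevention byte
        out.append(b)
        zeros = zeros + 1 if b == 0 else 0
    return bytes(out)


payload = bytes([0x00, 0x00, 0x00, 0x00, 0x01, 0x25, 0x00, 0x00, 0x03])
protected = add_emulation_prevention(payload)
print(protected.hex())
print(remove_emulation_prevention(protected) == payload)
```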
3.47 encoder: An embodiment of an encoding process.
3.48 encoding process: A process not specified in this Specification that produces a bitstream conforming to this
Specification.
3.49 field: An assembly of alternative rows of samples of a frame.
3.50 filler data NAL units: NAL units with nal_unit_type equal to FD_NUT.
3.51 flag: A variable that can take one of the two possible values 0 and 1.
3.52 frame: The composition of a top field and a bottom field, where sample rows 0, 2, 4, ... originate from the top
field and sample rows 1, 3, 5, ... originate from the bottom field.
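The row interleaving in definitions 3.49 and 3.52 can be shown with a short sketch. The function names are hypothetical, and an even number of sample rows is assumed.

```python
def split_fields(frame_rows):
    """Return (top field, bottom field) from the rows of a frame."""
    return frame_rows[0::2], frame_rows[1::2]


def weave(top_field, bottom_field):
    """Compose a frame: rows 0, 2, 4, ... from the top field, rows 1, 3, 5, ... from the bottom."""
    frame = []
    for top_row, bottom_row in zip(top_field, bottom_field):
        frame.append(top_row)
        frame.append(bottom_row)
    return frame


frame = ["row0", "row1", "row2", "row3", "row4", "row5"]
top, bottom = split_fields(frame)
print(top)
print(bottom)
print(weave(top, bottom) == frame)
```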
3.53 frequency index: A one-dimensional or two-dimensional index associated with a transform coefficient prior
to an inverse transform part of the decoding process.
3.54 hypothetical reference decoder (HRD): A hypothetical decoder model that specifies constraints on the
variability of conforming NAL unit streams or conforming byte streams that an encoding process may
produce.
3.55 hypothetical stream scheduler (HSS): A hypothetical delivery mechanism used for checking the
conformance of a bitstream or a decoder with regards to the timing and data flow of the input of a bitstream
into the hypothetical reference decoder.
3.56 independent slice segment: A slice segment for which the values of the syntax elements of the slice segment
header are not inferred from the values for a preceding slice segment.
3.57 informative: A term used to refer to content provided in this Specification that does not establish any
mandatory requirements for conformance to this Specification and thus is not considered an integral part of
this Specification.
3.58 instantaneous decoding refresh (IDR) access unit: An access unit in which the coded picture is an IDR
picture.
3.59 instantaneous decoding refresh (IDR) picture: An IRAP picture for which each VCL NAL unit has
nal_unit_type equal to IDR_W_RADL or IDR_N_LP.
NOTE – An IDR picture contains only I slices, and may be the first picture in the bitstream in decoding order, or
may appear later in the bitstream. Each IDR picture is the first picture of a CVS in decoding order. When an IDR
picture for which each VCL NAL unit has nal_unit_type equal to IDR_W_RADL, it may have associated RADL
pictures. When an IDR picture for which each VCL NAL unit has nal_unit_type equal to IDR_N_LP, it does not
have any associated leading pictures. An IDR picture does not have associated RASL pictures.
3.60 inter coding: Coding of a coding block, slice, or picture that uses inter prediction.
3.61 inter prediction: A prediction derived in a manner that is dependent on data elements (e.g., sample values or
motion vectors) of pictures other than the current picture.
3.62 intra coding: Coding of a coding block, slice, or picture that uses intra prediction.
3.63 intra prediction: A prediction derived from only data elements (e.g., sample values) of the same decoded
slice.
3.64 intra random access point (IRAP) access unit: An access unit in which the coded picture is an IRAP
picture.
3.65 intra random access point (IRAP) picture: A coded picture for which each VCL NAL unit has nal_unit_type
in the range of BLA_W_LP to RSV_IRAP_VCL23, inclusive.
NOTE – An IRAP picture contains only I slices, and may be a BLA picture, a CRA picture or an IDR picture. The
first picture in the bitstream in decoding order must be an IRAP picture. Provided the necessary parameter sets are
available when they need to be activated, the IRAP picture and all subsequent non-RASL pictures in decoding order
can be correctly decoded without performing the decoding process of any pictures that precede the IRAP picture in
decoding order. There may be pictures in a bitstream that contain only I slices that are not IRAP pictures.
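The IRAP picture types defined above (BLA, CRA, IDR) and the leading pictures each may have can be collected into a lookup table. The sketch assumes the numeric nal_unit_type values 16 to 23 for BLA_W_LP through RSV_IRAP_VCL23, which come from the NAL unit type table outside this excerpt; the `IrapInfo` type and the `classify_irap` function are hypothetical names.

```python
from typing import NamedTuple, Optional


class IrapInfo(NamedTuple):
    kind: str        # "BLA", "IDR", "CRA", or "reserved"
    name: str        # nal_unit_type name as used in the definitions
    may_have_rasl: Optional[bool]  # None where the definitions above do not say
    may_have_radl: Optional[bool]


# Assumed numeric values for the range BLA_W_LP..RSV_IRAP_VCL23.
IRAP_TYPES = {
    16: IrapInfo("BLA", "BLA_W_LP", True, True),
    17: IrapInfo("BLA", "BLA_W_RADL", False, True),
    18: IrapInfo("BLA", "BLA_N_LP", False, False),
    19: IrapInfo("IDR", "IDR_W_RADL", False, True),
    20: IrapInfo("IDR", "IDR_N_LP", False, False),
    21: IrapInfo("CRA", "CRA_NUT", True, True),
    22: IrapInfo("reserved", "RSV_IRAP_VCL22", None, None),
    23: IrapInfo("reserved", "RSV_IRAP_VCL23", None, None),
}


def classify_irap(nal_unit_type: int) -> Optional[IrapInfo]:
    """Return IRAP information, or None when the type is outside the IRAP range."""
    return IRAP_TYPES.get(nal_unit_type)


for value in (1, 17, 20, 21):
    print(value, classify_irap(value))
```

A picture is an IRAP picture only when every one of its VCL NAL units carries such a type, so a caller would apply the lookup to all VCL NAL units of the picture.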
3.66 intra (I) slice: A slice that is decoded using intra prediction only.
3.67 inverse transform: A part of the decoding process by which a set of transform coefficients are converted into
spatial-domain values.
3.68 layer: A set of VCL NAL units that all have a particular value of nuh_layer_id and the associated non-VCL
NAL units, or one of a set of syntactical structures having a hierarchical relationship.
NOTE – Depending on the context, either the first layer concept or the second layer concept applies. The first layer
concept is also referred to as a scalable layer, wherein a layer may be a spatial scalable layer, a quality scalable
layer, a view, etc. A temporal true subset of a scalable layer is not referred to as a layer but referred to as a sub-layer