ISO/IEC 14496-15: Video Coding Standard and the ISO Base Media File Format

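"ISO/IEC 14496-15 is a standard developed jointly by the International Organization for Standardization (ISO) and the International Electrotechnical Commission (IEC). It defines how video data structured as Network Abstraction Layer (NAL) units is carried in the ISO base media file format. The standard is part of the coding of audio-visual objects series and addresses in particular Advanced Video Coding (AVC), also known as H.264. The third edition of the document was published in 2013 and aims to provide a standardized way to store and transport efficiently coded video streams."

ISO/IEC 14496-15 describes in detail how H.264-coded video data is encapsulated in the ISO base media file format (for example, in MP4 files; the QuickTime MOV format is a close relative). The standard matters for the distribution, playback and storage of multimedia content because it provides consistency and compatibility across platforms and devices. NAL units are a core component of H.264 coding and contain the compressed data of the video frames. Because NAL units can be parsed and processed independently, their structure allows efficient network transmission and storage. The standard specifies how these NAL units are embedded in the file format while remaining intact and decodable.

The ISO base media file format is a container format that can hold audio, video and other metadata. With ISO/IEC 14496-15, the format supports H.264-coded video effectively, which is particularly important for high-definition video and mobile devices, since H.264 offers a good balance between high video quality and low bandwidth requirements. The third edition may include updates and improvements that follow technical developments, such as enhanced error recovery mechanisms, more efficient bitstream handling, or support for new devices and network environments. Although ISO/IEC 14496-15 does not itself deal with the video coding algorithm, it ensures that the coded data is correctly carried and presented in a wide range of application scenarios.

ISO/IEC 14496-15 is essential for understanding how H.264-coded video files are handled in practice. Content creators, developers and system integrators all need to be familiar with it so that their products and services can process and play high-quality video content effectively.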

ISO/IEC 14496-15:2013(E)

ICS 35.040

Price based on 89 pages

4.6 Visual width and height

The width and height fields in a VisualSampleEntry must correctly document the cropped frame dimensions (visual presentation size) of the video stream that is described by that entry. The width and height fields do not reflect any changes in size caused by SEI messages such as pan-scan. The visual handling of SEI messages such as pan-scan is both optional and terminal-dependent. If the width and height of the sequence change, then a new sample entry is needed.

Note that the visual size in the SPS may be either frame or field size; in the sample entry, it is always the frame size.
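The relationship between SPS fields and the cropped frame size recorded in the sample entry can be sketched as follows. This is a minimal sketch, not text from the standard: `SpsFields` is a hypothetical container for values that have already been parsed out of the SPS, the arithmetic follows the SPS semantics of ISO/IEC 14496-10 as commonly implemented, and 4:2:0 chroma sampling is assumed.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class SpsFields:
    """Hypothetical holder for already-parsed H.264 SPS syntax elements."""
    pic_width_in_mbs_minus1: int
    pic_height_in_map_units_minus1: int
    frame_mbs_only_flag: int
    frame_crop_left_offset: int = 0
    frame_crop_right_offset: int = 0
    frame_crop_top_offset: int = 0
    frame_crop_bottom_offset: int = 0


def cropped_frame_size(sps: SpsFields) -> Tuple[int, int]:
    """Return (width, height) of the cropped frame. Assumes 4:2:0 chroma."""
    # Map units are field macroblock rows when frame_mbs_only_flag is 0,
    # so the frame height doubles in that case.
    height_in_mbs = (2 - sps.frame_mbs_only_flag) * (
        sps.pic_height_in_map_units_minus1 + 1
    )
    width = (sps.pic_width_in_mbs_minus1 + 1) * 16
    height = height_in_mbs * 16

    # Crop offsets are expressed in crop units, not luma samples (4:2:0 assumed).
    crop_unit_x = 2
    crop_unit_y = 2 * (2 - sps.frame_mbs_only_flag)
    width -= crop_unit_x * (sps.frame_crop_left_offset + sps.frame_crop_right_offset)
    height -= crop_unit_y * (sps.frame_crop_top_offset + sps.frame_crop_bottom_offset)
    return width, height
```

For a progressive 1080p stream coded as 120 x 68 macroblocks with a bottom crop offset of 4, this yields 1920 x 1080, which is the value the sample entry would be expected to carry.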

The width and height fields in the track header may not be the same as the width and height fields in the one or more VisualSampleEntry in the video track. As specified in the ISO Base Media File Format, if normalized visual presentation is needed, all the sequences are normalized to the track width and height for presentation.

4.7 Decoding time (DTS) and composition time (CTS)

Samples are stored in the file format in decoding order. If picture reordering is not used and decoding and composition times are the same, then presentation is the same as decoding order and only the time-to-sample 'stts' table is used. Note that any kind of picture may be reordered, not only B-pictures.

If decoding time and composition time differ, the composition time-to-sample 'ctts' table is also used in conjunction with the 'stts' table.
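How the two tables combine can be illustrated with a short sketch. It assumes the tables have already been read into lists of `(sample_count, value)` runs, with 'stts' values being sample deltas and 'ctts' values being composition offsets; the function names are illustrative and not part of any library.

```python
from typing import List, Optional, Tuple

Run = Tuple[int, int]  # (sample_count, value)


def expand_runs(entries: List[Run]) -> List[int]:
    """Expand run-length (count, value) entries into one value per sample."""
    values: List[int] = []
    for count, value in entries:
        values.extend([value] * count)
    return values


def sample_times(stts: List[Run], ctts: Optional[List[Run]] = None) -> List[Tuple[int, int]]:
    """Return (decoding_time, composition_time) per sample, in decoding order."""
    deltas = expand_runs(stts)
    # Without a 'ctts' table, composition time equals decoding time.
    offsets = expand_runs(ctts) if ctts else [0] * len(deltas)
    if len(offsets) != len(deltas):
        raise ValueError("'stts' and 'ctts' describe different sample counts")

    times = []
    dts = 0
    for delta, offset in zip(deltas, offsets):
        times.append((dts, dts + offset))
        dts += delta  # the delta gives the distance to the next sample's DTS
    return times


# Four samples stored in decoding order I, P, B, B, each lasting 1000 ticks.
times = sample_times(stts=[(4, 1000)], ctts=[(1, 1000), (1, 3000), (2, 0)])
presentation_order = sorted(range(len(times)), key=lambda i: times[i][1])
```

Here `presentation_order` places the P-picture (stored second) last, which is the reordering the 'ctts' table exists to describe.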

4.8 Sync sample (IDR)

A sample is considered as a sync sample if ALL of the following conditions are met:

• The video data NAL units in the sample indicate that the primary picture contained in the sample is an instantaneous decoding refresh (IDR) picture.

• When the sample entry name is 'avc1' or 'avc2', all SPSs and PPSs needed to decode the video data NAL units in the sample of the IDR picture and the following samples in decode order are contained in the decoder configuration of the video elementary stream or in a separate parameter set elementary stream sample.

• When the sample entry name is 'avc3' or 'avc4', the following applies:

1. If the sample is an IDR access unit, all parameter sets needed for decoding that sample shall be included either in the sample entry or in the sample itself.

2. Otherwise (the sample is not an IDR access unit), all parameter sets needed for decoding the sample shall be included either in the sample entry or in any of the samples since the previous random access point to the sample itself, inclusive.

A parameter set elementary stream sample is a sync sample if and only if all parameter sets required by the associated video elementary stream from the time of the parameter set sample forward are supplied, in the parameter set stream, before they are required by the associated video elementary stream.
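The first condition above (the sample carries an IDR picture) can be checked by walking the NAL units of a sample. The sketch below is a simplification: it assumes each NAL unit is preceded by a big-endian length field whose size (`length_size`, 4 bytes by default here) is known from the decoder configuration, it treats any NAL unit of type 5 as evidence of an IDR picture, and it does not verify the parameter-set conditions. The function names are illustrative.

```python
from typing import Iterator

NAL_TYPE_IDR_SLICE = 5  # coded slice of an IDR picture


def iter_nal_units(sample: bytes, length_size: int = 4) -> Iterator[bytes]:
    """Yield each NAL unit of a length-prefixed sample (length field excluded)."""
    pos = 0
    while pos < len(sample):
        if pos + length_size > len(sample):
            raise ValueError("truncated NAL unit length field")
        nal_len = int.from_bytes(sample[pos:pos + length_size], "big")
        pos += length_size
        if nal_len == 0 or pos + nal_len > len(sample):
            raise ValueError("invalid NAL unit length")
        yield sample[pos:pos + nal_len]
        pos += nal_len


def contains_idr(sample: bytes, length_size: int = 4) -> bool:
    """True if any NAL unit in the sample is a coded slice of an IDR picture."""
    # nal_unit_type is the low five bits of the first NAL unit byte.
    return any(
        (nal[0] & 0x1F) == NAL_TYPE_IDR_SLICE
        for nal in iter_nal_units(sample, length_size)
    )


def pack(nal: bytes, length_size: int = 4) -> bytes:
    """Helper for the example: prefix a NAL unit with its length."""
    return len(nal).to_bytes(length_size, "big") + nal


# An access unit delimiter (type 9) followed by an IDR slice (type 5).
idr_sample = pack(b"\x09\xf0") + pack(b"\x65\x88\x84")
# A single non-IDR slice (type 1).
non_idr_sample = pack(b"\x41\x9a\x00")
```

A file writer would still need to confirm the parameter-set conditions for the sample entry in use before marking the sample as a sync sample.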

4.9 Shadow sync

The use of the shadow sync table to indicate alternate encodings of a sample for random access is supported as defined in the ISO Base Media File Format. A shadow sync shall indicate a sample that is a random access point as specified in the general requirements and for the specific coding format in the track.

While the use of shadow sync is supported for backward compatibility reasons, this use is deprecated and use of the mechanisms defined in 5.4.6 is recommended.

4.10 Sample groups on random access recovery points and random access points

The video coding system can include the concept of a 'gradual decoding refresh' or random access recovery point. This may be signalled in the bit-stream using a mechanism such as the recovery point SEI message. This message is found at the beginning of the random access, and indicates how much data must be decoded subsequent to the access unit at the position of the SEI message before the recovery is complete.

When all access units in output order starting from the access unit at the position of the SEI message can be successfully decoded after random access, i.e. when the recovery_frame_cnt syntax element of the recovery point SEI message is 0, the Random Access Point ('rap ') sample grouping should be used.

This concept of gradual recovery is supported in the file format also by using RollRecoveryEntry Groups [4.5]. In order that the group membership marks the sample containing the SEI message the 'roll-distance' is constrained to being only positive (i.e. a post-roll). In other words, RollRecoveryEntry Groups can be used when the value of the recovery_frame_cnt syntax element of the recovery point SEI message is greater than 0.

Note – The roll-group counts samples in the file format; this may not match the way that the distances are represented in the SEI message.

Within a stream, it is necessary to mark the beginning of the pre-roll, so that a stream decoder may start decoding there. However, in a file, when performing random access, a deterministic search is desired for the closest preceding frame which can be decoded perfectly (either a sync sample, or the end of a pre-roll).
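One way a reader might implement such a search is sketched below. It is an illustration under stated assumptions, not the procedure of the standard: samples are numbered from 1, `sync_samples` is a sorted list of sync sample numbers, `roll_starts` maps the sample that begins a roll group to its positive roll distance, and recovery is assumed to be complete at sample `start + roll_distance`. All names are hypothetical.

```python
from bisect import bisect_right
from typing import Dict, List, Optional


def find_decode_start(
    target: int,
    sync_samples: List[int],
    roll_starts: Dict[int, int],
) -> Optional[int]:
    """Return the latest sample to start decoding from so `target` is correct.

    Returns None when no usable starting point precedes the target.
    """
    best: Optional[int] = None

    # Closest sync sample at or before the target.
    idx = bisect_right(sync_samples, target)
    if idx:
        best = sync_samples[idx - 1]

    # A roll group is usable only if its recovery completes by the target.
    for start, distance in roll_starts.items():
        if distance <= 0:
            raise ValueError("roll distance is constrained to be positive")
        if start + distance <= target and (best is None or start > best):
            best = start
    return best
```

With sync samples at 1 and 31 and a roll group starting at sample 11 with distance 5, a seek to sample 20 can start at 11, whereas a seek to sample 14 has to fall back to the sync sample at 1 because recovery is not complete until sample 16.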

4.11 Hinting

Note that what the hint tracks call “B frames” are actually 'disposable' pictures or non-reference pictures, for example as defined in ISO/IEC 14496-10.

Care should be taken when the structures in Annex A (aggregators or extractors) are in use and the track is hinted. These structures are defined only for use in the file format and should not be transmitted. In particular, a hint track that points at an extractor in a video track would cause the extractor itself to be transmitted (which is probably both incorrect and not the desired behaviour), not the data the extractor references. Hint tracks should normally directly reference NAL units specified in the applicable video coding standard.

5 AVC elementary streams and sample definitions

5.1 Introduction

The Advanced Video Coding (AVC) standard, jointly developed by the ITU-T and ISO/IEC JTC 1/SC 29/WG 11 (MPEG), offers not only increased coding efficiency and enhanced robustness, but also many features for the systems that use it. To enable the best visibility of, and access to, those features, and to enhance the opportunities for the interchange and interoperability of media, this part of ISO/IEC 14496 defines a storage format for video streams compressed using AVC.

This clause defines the storage for plain AVC streams, where 'plain AVC' refers to the main part of ISO/IEC 14496-10, excluding Annex G (Scalable Video Coding) and Annex H (Multiview Video Coding).

This clause specifies the elementary stream and sample structure used to store AVC visual content.

The storage of AVC content uses the existing capabilities of the ISO base media file format but also defines extensions to support the following features of the AVC codec.

• Switching pictures: to enable switching between different coded streams and substitution of pictures within the same stream.

• Sub-sequences and layers: provides a structuring of the dependencies of a group of pictures to provide for a flexible stream structure (e.g. in terms of temporal scalability and layering).

• Parameter sets: the sequence and picture parameter set mechanism decouples the transmission of infrequently changing information from the transmission of coded macroblock data. Each slice containing the coded macroblock data references the picture parameter set containing its decoding parameters. In turn, the picture parameter set references a sequence parameter set that contains sequence level decoding parameter information.

5.2 Elementary stream structure

Two types of elementary streams are defined for storing AVC content (see also Figure 2):

• Video elementary streams shall contain all video coding related NAL units (i.e. those NAL units containing video data or signalling video structure) and may contain non-video coding related NAL units such as SEI messages and access unit delimiter NAL units. Other NAL units that are not expressly prohibited may be present, and if they are unrecognized should be ignored (e.g. not placed in the output buffer while accessing the file).

• Parameter set elementary streams shall not contain video coding related NAL units (i.e. those NAL units containing video data or signalling video structure), and would normally contain only sequence parameter sets, picture parameter sets and sequence parameter set extension NAL units.

Using these stream types, AVC content shall be stored in one of these configurations:

• Video elementary stream with no parameter sets: In this case, sequence and picture parameter set NAL units shall be stored in the sample entries of this track. Sequence and picture parameter set NAL units shall not be part of AVC samples within the stream itself.

• Video elementary stream possibly including parameter sets: In this case, the sample entry indicates whether the stream may contain parameter sets of given types, in addition to other parameters provided in the sample entry. Sequence and picture parameter set NAL units may therefore be part of AVC samples within the stream itself.

• Video elementary stream and parameter set elementary stream: In this case, sequence and picture parameter set NAL units shall be transmitted only in the parameter set elementary stream and shall neither be present in the sample entries nor the AVC samples of the video elementary stream.

The types of NAL units that are allowed in each of the video and parameter set elementary streams are specified in the following table.


Table 2 – NAL unit types in elementary streams

| Value of nal_unit_type | Description | Video elementary stream (sample entry 'avc1' or 'avc2') | Video elementary stream (sample entry 'avc3' or 'avc4') | Parameter set elementary stream |
|---|---|---|---|---|
| 0 | Unspecified | Not specified by this part of ISO/IEC 14496 | Not specified by this part of ISO/IEC 14496 | Not specified by this part of ISO/IEC 14496 |
| 1 | Coded slice of a non-IDR picture, slice_layer_without_partitioning_rbsp( ) | Yes | Yes | No |
| 2 | Coded slice data partition A, slice_data_partition_a_layer_rbsp( ) | Yes | Yes | No |
| 3 | Coded slice data partition B, slice_data_partition_b_layer_rbsp( ) | Yes | Yes | No |
| 4 | Coded slice data partition C, slice_data_partition_c_layer_rbsp( ) | Yes | Yes | No |
| 5 | Coded slice of an IDR picture, slice_layer_without_partitioning_rbsp( ) | Yes | Yes | No |
| 6 | Supplemental enhancement information (SEI), sei_rbsp( ) | Yes. Except for the Sub-sequence, layering or Filler SEI messages | Yes. Except for the Sub-sequence, or layering SEI messages | Only 'declarative' SEIs should be present |
| 7 | Sequence parameter set (SPS), seq_parameter_set_rbsp( ) | No. If parameter set elementary stream is not used, SPS shall be stored in the Decoder Specific Information. | Yes. Parameter set elementary stream shall not be used | Yes |
| 8 | Picture parameter set (PPS), pic_parameter_set_rbsp( ) | No. If parameter set elementary stream is not used, PPS shall be stored in the Decoder Specific Information. | Yes. Parameter set elementary stream shall not be used | Yes |
| 9 | Access unit delimiter (AU Delimiter), access_unit_delimiter_rbsp( ) | Yes | Yes | No |
| 10 | End of sequence, end_of_seq_rbsp() | Yes | Yes | No |
| 11 | End of stream, end_of_stream_rbsp() | Yes | Yes | No |
| 12 | Filler data (FD), filler_data_rbsp( ) | | Yes | No |
| 13 | Sequence parameter set extension, seq_parameter_set_extension_rbsp( ) | No. If parameter set elementary stream is not used, Sequence Parameter Set Extension shall be stored in the Decoder | Yes. Parameter set elementary stream shall not be used | Yes |
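The parameter-set rows of the table (nal_unit_type 7, 8 and 13) lend themselves to a simple conformance check on the NAL units found inside video samples. The following is a minimal sketch covering only those three rows; the names `PARAMETER_SET_NAL_TYPES`, `IN_BAND_PARAMETER_SETS_ALLOWED` and `disallowed_nal_types` are illustrative and not defined by the standard.

```python
from typing import Iterable, List

# nal_unit_type values that carry parameter sets.
PARAMETER_SET_NAL_TYPES = {7: "SPS", 8: "PPS", 13: "SPS extension"}

# Whether parameter sets may appear inside samples of the video stream.
IN_BAND_PARAMETER_SETS_ALLOWED = {
    "avc1": False,
    "avc2": False,
    "avc3": True,
    "avc4": True,
}


def disallowed_nal_types(sample_entry: str, nal_types: Iterable[int]) -> List[int]:
    """Return the parameter-set NAL unit types that must not be in a sample."""
    if sample_entry not in IN_BAND_PARAMETER_SETS_ALLOWED:
        raise ValueError(f"unsupported sample entry name: {sample_entry!r}")
    if IN_BAND_PARAMETER_SETS_ALLOWED[sample_entry]:
        return []
    return [t for t in nal_types if t in PARAMETER_SET_NAL_TYPES]


# A sample holding an AU delimiter, SPS, PPS and an IDR slice.
sample_nal_types = [9, 7, 8, 5]
avc1_violations = disallowed_nal_types("avc1", sample_nal_types)
avc3_violations = disallowed_nal_types("avc3", sample_nal_types)
```

For an 'avc1' track the SPS and PPS in this sample would be reported, since they belong in the sample entry or in a parameter set elementary stream; for 'avc3' the same sample passes.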
