ETSI ES 201 980 V4.1.1 (2014
• UEP shall not be used with xHE-AAC.
5.1.2 AAC audio coding
For generic audio coding, a subset of the MPEG-4 Advanced Audio Coding (AAC) toolbox chosen to best suit the
DRM system environment is used. For example, a standard configuration for use in one short wave channel could be
20 kbit/s mono.
Specific features of the AAC stream within the DRM system are:
• Bit rate: AAC can be used at any bit rate. The granularity of the AAC bit rate is 20 bit/s for robustness modes
A, B, C and D and 80 bit/s for robustness mode E.
• Sampling rates: permitted sampling rates are 12 kHz and 24 kHz for robustness modes A, B, C and D and
24 kHz and 48 kHz for robustness mode E. 48 kHz is only permitted if the SBR tool is not used.
• Transform length: the transform length is 960 samples to ensure that one audio frame corresponds to 80 ms or 40 ms
(robustness modes A, B, C and D) or to 40 ms or 20 ms (robustness mode E) in time. This is required to allow
the combination of an integer number of audio frames to build an audio super frame of 400 ms (robustness
modes A, B, C and D) or 200 ms (robustness mode E) duration.
• Error robustness: a subset of MPEG-4 tools is used to improve the AAC bit stream error robustness in error
prone channels (the MPEG-4 EP tool is not used).
• Audio super framing: 5 or 10 audio frames are composed into one audio super frame. For robustness modes A,
B, C and D, the respective sampling rates are 12 kHz and 24 kHz producing an audio super frame of 400 ms
duration; for robustness mode E, the respective sampling rates are 24 kHz and 48 kHz producing an audio
super frame of 200 ms duration. The audio frames in an audio super frame are encoded together such that
each audio super frame is of constant length, i.e. bit exchange between audio frames is possible only
within an audio super frame. One audio super frame is always placed in one logical frame in robustness modes
A, B, C and D and in two logical frames in robustness mode E (see clause 6). In this way no additional
synchronization is needed for the audio coding. Retrieval of frame boundaries and provisions for UEP are also
taken care of within the audio super frame.
• UEP: better graceful degradation and better operation at higher BERs are achieved by applying UEP to the
AAC bit stream. Unequal error protection is realized via the multiplex/coding units. For robustness mode E,
the length of the higher protected part of an audio super frame shall be a multiple of 2 bytes.
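The timing figures in the list above follow from simple arithmetic, which can be sketched as below. This is an illustrative check only, not part of the specification. The function names and the "ABCD"/"E" mode group labels are hypothetical. The reading of the bit rate granularity as a step of one byte per 400 ms audio super frame (and two bytes per 200 ms audio super frame in robustness mode E) is an assumption that happens to reproduce the stated 20 bit/s and 80 bit/s values.

```python
from fractions import Fraction

TRANSFORM_LENGTH = 960  # samples per audio frame, as stated in the text

# Audio super frame duration in ms per robustness mode group (from the text).
SUPER_FRAME_MS = {"ABCD": 400, "E": 200}

# Permitted sampling rates in Hz per robustness mode group (from the text).
SAMPLING_RATES_HZ = {"ABCD": (12_000, 24_000), "E": (24_000, 48_000)}


def frame_duration_ms(sampling_rate_hz: int) -> Fraction:
    """Duration of one 960-sample audio frame in milliseconds."""
    return Fraction(TRANSFORM_LENGTH * 1000, sampling_rate_hz)


def frames_per_super_frame(mode_group: str, sampling_rate_hz: int) -> int:
    """Number of audio frames that fill one audio super frame exactly."""
    if sampling_rate_hz not in SAMPLING_RATES_HZ[mode_group]:
        raise ValueError("sampling rate not permitted for this mode group")
    count = Fraction(SUPER_FRAME_MS[mode_group]) / frame_duration_ms(sampling_rate_hz)
    if count.denominator != 1:
        raise ValueError("super frame is not an integer number of frames")
    return int(count)


def bit_rate_step(mode_group: str, step_bytes: int) -> Fraction:
    """Bit rate change in bit/s when the super frame grows by step_bytes.

    Assumption (not stated in the text): the granularity corresponds to
    a size step of step_bytes per audio super frame.
    """
    return Fraction(step_bytes * 8 * 1000, SUPER_FRAME_MS[mode_group])


if __name__ == "__main__":
    for group, rates in SAMPLING_RATES_HZ.items():
        for rate in rates:
            print(group, rate, frame_duration_ms(rate), frames_per_super_frame(group, rate))
    print(bit_rate_step("ABCD", 1), bit_rate_step("E", 2))
```

Exact fractions are used instead of floating point so that the integer-frame check cannot be affected by rounding.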
SBR coding
To maintain a reasonable perceived audio quality at low bit rates, classical audio or speech source coding algorithms
need to limit the audio bandwidth and to operate at low sampling rates. It is desirable to be able to offer high audio
bandwidth also in very low bit rate environments. This can be realized by the use of Spectral Band Replication (SBR).
The purpose of SBR is to recreate the missing high frequency band of the audio signal that could not be coded by the
encoder. In order to do this in the best possible way, some side information needs to be transmitted in the audio
bitstream, removing a small percentage of the available data rate from the audio coder. This side information is
computed on the full bandwidth signal, prior to encoding, and aids the reconstruction of the high frequencies after
audio/speech decoding.
SBR exists in two versions. The difference between them is reflected only in the decoder design. High Quality SBR uses a
complex filterbank whereas Low Power SBR uses a real-valued filterbank plus anti-aliasing modules. The Low Power
version of SBR offers a significant reduction in complexity as compared to the High Quality version without
compromising too much on audio quality. AAC + SBR is defined in MPEG-4 Audio (High Efficiency AAC profile) [2].
PS coding
To improve stereo coding performance at low bit rates, a Parametric Stereo (PS) coder is available. The PS tool can be
used when running the configuration AAC + SBR (MPEG High Efficiency AAC profile). The general idea with PS
coding is to send data describing the stereo image as side information along with a downmixed mono signal. This stereo
side information is very concise and requires only a small fraction of the total bit rate, allowing the mono signal to have
maximum quality for the total bit rate given.
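The general idea of a mono downmix plus compact stereo side information can be illustrated with a toy sketch. This is not the MPEG-4 PS algorithm, which is defined in [2]. The function name, the single broadband level difference parameter and the simple averaging downmix are all assumptions chosen for illustration.

```python
import math


def downmix_with_level_difference(left, right):
    """Toy illustration: return (mono, level_difference_db) for one block.

    Hypothetical helper, not the MPEG-4 PS tool. It reduces a stereo
    block to a mono signal plus one number describing the stereo image.
    """
    if len(left) != len(right):
        raise ValueError("channels must have the same length")

    # Downmix: simple average of the two channels (an assumption).
    mono = [(l + r) / 2 for l, r in zip(left, right)]

    # Side information: left/right energy ratio in dB for the whole block.
    eps = 1e-12  # avoids log of zero for silent channels
    energy_left = sum(l * l for l in left)
    energy_right = sum(r * r for r in right)
    level_difference_db = 10 * math.log10((energy_left + eps) / (energy_right + eps))

    return mono, level_difference_db


if __name__ == "__main__":
    mono, diff_db = downmix_with_level_difference([1.0, -1.0], [0.5, -0.5])
    print(mono, round(diff_db, 2))
```

In this sketch, one number per block stands in for the stereo image, which shows why such side information can be much smaller than a second coded audio channel.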