MATLAB Analysis of the MPEG-1 Layer III (MP3) Coding Algorithm: A Detailed Explanation


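"Analysis of the MPEG-1 Layer III (MP3) Algorithm Using MATLAB" was written by Jayaraman J. Thiagarajan and Andreas Spanias and appears in the series Synthesis Lectures on Algorithms and Software in Engineering. The work focuses on a MATLAB-based analysis of the MPEG-1 Layer III (MP3) coding algorithm. MP3 is a widely used audio compression standard that uses a frequency-domain transform (a hybrid filter bank with a modified discrete cosine transform, MDCT) together with entropy coding to compress audio data efficiently, substantially reducing storage and transmission bandwidth requirements. The authors examine the key components of the MP3 algorithm, including signal preprocessing, frequency-domain transformation, quantization, and entropy coding. Using MATLAB as the experimental platform, they show how these core algorithms can be implemented and may also simulate and evaluate algorithm performance; MATLAB's visualization and programming capabilities make the technical concepts easier to follow.

The work may also mention comparisons with other signal processing techniques, such as Gaussian Quadrature Methods and their role in signal analysis, and connections to Speech Predictive and Perceptual Modeling. It may further touch on Code Excited Linear Prediction (CELP), a widely used speech coding method, on applications of MPEG-1 Layer III in wireless communications such as OFDM (orthogonal frequency-division multiplexing) systems, and on adaptive high-resolution sensor waveform design for tracking and localization. The authors may also discuss theoretical advances in Blind Signal Separation Algorithms, which address the separation of mixed audio signals, and recent research on Waveform-Agile Sensing, an emerging technique for improving tracking performance.

Overall, the work offers an in-depth analysis of the MP3 coding algorithm and shows the practical value of MATLAB in engineering applications, making it a useful reference for researchers in audio compression, signal processing, and communications.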

2 1. INTRODUCTION

(Moving Picture Experts Group) audio standards, i.e., MPEG-1 [27] and MPEG-2 [28]. Furthermore, several successful commercial audio standards have been published, including Sony's Adaptive TRansform Acoustic Coding (ATRAC), DTS Coherent Acoustics (DTS-CA), and Dolby's Audio Coder-3 (AC-3). Elements or entire algorithms for perceptual coding have also appeared in [21, 23], [27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 41, 42, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53]. With the emergence of surround sound systems, multi-channel encoding formats also gained interest [54, 55, 56]. The advent of ISO/IEC MPEG-4 standardization [45, 47] established new research goals for high-quality coding of general audio signals even at low bit rates. MPEG-4 audio encompasses an integrated family of algorithms with wide-ranging provisions for scalable, object-based speech and audio coding at bit rates from 200 bps up to 64 kbps per channel [57, 58].

1.1.1 RECENT AUDIO CODECS

The older MPEG-1 hybrid audio coding technique (ISO/IEC 11172-3) incorporates subband filter bank decomposition, signal transforms such as the FFT, and psychoacoustic analysis. MPEG-1 audio operates on 16-bit PCM input audio data and accommodates sample rates of 32, 44.1, and 48 kHz. Operating modes of this algorithm include mono, stereo, dual independent mono, and joint stereo. The target bit rates are programmable in the range of 32-192 kbits/s for mono and 64-384 kbits/s for stereo. Despite the fact that MPEG-1 Layer-III (MP3) is still an active and popular standard, several new algorithms have been shown to perform better. Advanced Audio Coding (AAC) is a standardized, lossy compression scheme that generally achieves better sound quality than MP3 at similar bit rates. It has been standardized by the ISO and IEC as part of the MPEG-2 and MPEG-4 standards. Designed as a successor to the MP3 algorithm, AAC allows more sampling frequencies (8 kHz to 96 kHz) and supports up to 48 channels.
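
To put the quoted bit rates in perspective, the following sketch computes the uncompressed PCM bit rate for the input format described above and the resulting compression ratio. The function names and the choice of 128 kbits/s as an example target rate are illustrative assumptions, not values taken from the book.

```python
def pcm_bit_rate(sample_rate_hz, bits_per_sample=16, channels=1):
    """Uncompressed PCM bit rate in bits per second."""
    return sample_rate_hz * bits_per_sample * channels


def compression_ratio(pcm_bps, coded_bps):
    """Ratio of the uncompressed rate to the coded rate."""
    return pcm_bps / coded_bps


# 16-bit stereo PCM at 44.1 kHz (the CD format).
stereo_pcm = pcm_bit_rate(44_100, bits_per_sample=16, channels=2)

# Hypothetical target rate inside the 64-384 kbits/s stereo range.
target_bps = 128_000

print(f"PCM rate: {stereo_pcm / 1000:.1f} kbits/s")
print(f"Compression ratio at 128 kbits/s: "
      f"{compression_ratio(stereo_pcm, target_bps):.3f}")
```

At the low end of the stereo range (64 kbits/s) the same calculation gives a ratio of roughly 22:1, which indicates how much of the signal has to be discarded as perceptually irrelevant.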

Though perceptual audio coders such as MP3 and AAC offer reasonably good quality at bit rates down to 80 kbps, they are associated with an algorithmic delay that exceeds 120 ms. Applications such as two-way communications or broadcasting require low end-to-end delays on the order of 20 ms. As a result, Low Delay (LD) audio coding schemes have been developed, and they provide perceptual quality comparable to MP3 or AAC with a very low algorithmic delay. The MPEG-4 AAC audio coder is used as a basis to build the low delay functionality preferable in end-to-end applications such as teleconferencing and telephony. Typical bit rates of AAC-LD start at 32 kbps for a mono signal with 22 kHz sampling rate and reach 128 kbps, providing excellent audio quality [59]. AAC-ELD (Enhanced Low Delay) was standardized as part of MPEG in January 2008. AAC-ELD has an algorithmic delay ranging from 32 ms at 24 kbps down to 15 ms at 64 kbps. AAC-ELD combines the advantages of AAC-LD for low-delay encoding/decoding and Spectral Band Replication (SBR) for preserving high quality at low bit rates. Delay-critical applications such as wideband audio/video conferencing and broadcasting, which require high-quality audio at low bit rates, can benefit from this scheme [60]. The Ultra Low Delay (ULD) AAC [61] was developed at Fraunhofer and attains delays on the order of 8 ms.

1.1. A BRIEF HISTORY OF AUDIO CODERS 3

The need for an interface to exchange multimedia content through the internet resulted in the development of the MPEG-7 audio standard [48]. MPEG-7 supports a broad range of applications [62] that include the following: multimedia indexing/searching, multimedia editing, broadcast media selection, and multimedia digital library sorting. Issues such as “interoperability” and multimedia resource delivery over a wide range of networks and terminals motivated the MPEG-21 Framework [53].

As mentioned earlier, Adaptive Transform Acoustic Coding (ATRAC) is a family of audio compression algorithms developed by Sony. Though the initial versions of ATRAC were used with the MiniDisc in the early 1990s, the more recent ATRAC algorithms are used in several Sony-branded audio players, in RealAudio 8, and as the native audio compression format for audio rendering on the PS3 [63]. The MPEG-4 parametric audio codec, called Harmonic and Individual Lines plus Noise (HILN), enables coding of general audio signals at bit rates as low as 4 kbit/s using a parametric representation [64]. The encoder assumes that the audio signals can be synthesized using only sinusoids and noise. The input signal is decomposed into components based on appropriate source models and represented by model parameters. This approach utilizes more advanced source modeling than just assuming a stationary signal for the duration of a frame.

The launch of storage formats (in 1999) such as the DVD-Audio and the Super Audio CD (SACD) provided audio codec designers with enormous storage capacity. This motivated an effort toward lossless coding of digital audio [46, 51, 65]. A lossless audio coding system is able to perfectly reconstruct a “bit-for-bit representation” of the original input audio from the coded bitstream. In contrast, a coding scheme incapable of perfect reconstruction from the coded representation is called lossy. Several commercially successful lossless codecs have been developed in the last decade. Some of the earliest lossless audio coders include the Apple Lossless Audio Codec (ALAC) [66] and the Windows Media Audio 9 (WMA 9) lossless codec [13]. ALAC is an audio codec developed by Apple Inc. for lossless data compression of digital music. Typically, it is stored within an MP4 container with the filename extension .m4a. Though this extension is also used by AAC, ALAC is a different codec that employs linear prediction, similar to other lossless codecs. All current iPod and iPhone devices can play Apple Lossless-encoded files. The WMA 9 lossless codec was released by Microsoft in early 2003, and it supports up to 96 kHz, 24-bit, 5.1 discrete channels with full dynamic range compression control. It can compress audio CD material at bit rates of 470 to 940 kbit/s.
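
The idea of bit-for-bit reconstruction with linear prediction can be illustrated with a minimal sketch. It uses a fixed first-order integer predictor (each sample is predicted by the previous one), which is an assumption made here for brevity; it is not the predictor used by ALAC, WMA 9 lossless, or any other codec named above, and the function names are hypothetical.

```python
def encode_residual(samples):
    """Replace each integer sample by its difference from the previous one."""
    previous = 0
    residual = []
    for sample in samples:
        residual.append(sample - previous)  # prediction error
        previous = sample
    return residual


def decode_residual(residual):
    """Invert encode_residual exactly by accumulating the prediction errors."""
    previous = 0
    samples = []
    for error in residual:
        previous += error
        samples.append(previous)
    return samples


pcm = [1000, 1004, 1009, 1011, 1010, 1006]
residual = encode_residual(pcm)
print(residual)                          # [1000, 4, 5, 2, -1, -4]
print(decode_residual(residual) == pcm)  # True: bit-for-bit reconstruction
```

Because the arithmetic is purely integer, the decoder recovers the input exactly. The compression gain comes from the residual having a much smaller range than the samples, so that a subsequent entropy coder can represent it with fewer bits.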

Dolby TrueHD is an advanced lossless multi-channel audio codec developed by Dolby Laboratories [67]. It is primarily intended for high-definition home-entertainment equipment such as the Blu-ray Disc and the HD DVD. Though Dolby TrueHD is based on Meridian Lossless Packing (MLP) [46], it is significantly different from DVD-Audio. This variable bit-rate codec can support up to 14 discrete sound channels in its bitstream. Another important audio codec is DTS-HD Master Audio, developed by Digital Theater Systems [11]. It is an optional audio format exclusively for the Blu-ray Disc format. This format aims to allow a bit-for-bit representation of the original movie's studio master soundtrack. To accomplish this, DTS-HD MA supports variable bit rates up to 24.5 Mbit/s on a Blu-ray Disc and up to 18.0 Mbit/s for HD DVD. The DTS-HD Master Audio contains two data streams: the original DTS core stream and the residual stream, which contains the difference between the original signal and the lossy compression DTS core stream [68]. The residual data is then encoded by a lossless encoder and packed together with the core. The most recent version of RealPlayer also supports lossless coding. The RealAudio lossless codec is designed primarily for high-quality music downloads in mono or two-channel stereo format (multichannel output is not supported). It replicates CD-quality sound in a format that takes less time for the user to download. Although the lossless audio codec is designed for high-fidelity music downloads, it can also be used for broadcasts in high-bandwidth environments.
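
The core-plus-residual structure described for DTS-HD Master Audio can be sketched in a few lines. The "lossy core" below is just a coarse uniform quantizer with an arbitrary step size, chosen as a stand-in for illustration; it is an assumption and bears no relation to the actual DTS core codec.

```python
def lossy_core(samples, step=16):
    """Stand-in lossy stage: round every sample to a multiple of `step`."""
    return [step * round(sample / step) for sample in samples]


def residual_stream(samples, core):
    """Difference between the original signal and the decoded lossy core."""
    return [s - c for s, c in zip(samples, core)]


def reconstruct(core, residual):
    """A lossless decoder adds the residual back onto the core."""
    return [c + r for c, r in zip(core, residual)]


original = [1000, 1004, 1009, 1011, 1010, 1006]
core = lossy_core(original)
residual = residual_stream(original, core)

print(reconstruct(core, residual) == original)  # True
print(max(abs(r) for r in residual) <= 8)       # True: residual is bounded by step / 2
```

A decoder that only understands the core stream can ignore the residual and still play a lossy version, while a full decoder combines both streams to recover the original exactly.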

The MPEG-4 Audio Lossless Coding, also referred to as MPEG-4 ALS [12], extends the MPEG-4 Part 3 audio standard to perform lossless audio compression. It comprises a short-term predictor, which is a quantized LPC predictor with a lossless residual, and a long-term predictor modeled by 5 long-term weighted residues, each with its own delay. The long-term predictor improves the compression for sounds with rich harmonics, as found in several musical instruments and the human voice.

1.2 A GENERAL PERCEPTUAL AUDIO CODING ARCHITECTURE

It is important to note the architectural similarities that characterize most perceptual audio coders before we describe the MP3 audio codec in the following chapters. Over the last few years, researchers have proposed several efficient signal models and compression standards/methodologies for high-quality digital audio reproduction. Most of these algorithms are based on the generic architecture shown in Figure 1.1. Most coders typically segment input signals into quasi-stationary frames ranging from 2 to 50 ms in duration. This is followed by a time-frequency analysis to estimate the temporal and spectral components of each frame. Often, the time-frequency mapping is matched to the analysis properties of the human auditory system, although this is not always the case. The objective is to extract a set of time-frequency parameters that can be efficiently coded based on perceptual criteria. The time-frequency analysis module typically comprises time-invariant or time-varying filterbanks, harmonic analyzers, and hybrid transforms.
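
The framing step can be made concrete with a short sketch. It converts the 2 to 50 ms frame durations mentioned above into sample counts at an assumed 44.1 kHz sample rate, splits a signal into non-overlapping frames, and reports the DFT bin spacing that each frame length would give. Practical coders use overlapping, windowed frames; the non-overlapping split and all function names here are simplifying assumptions.

```python
def frame_length(sample_rate_hz, frame_ms):
    """Number of samples in a frame of the given duration."""
    return round(sample_rate_hz * frame_ms / 1000)


def split_frames(signal, frame_len):
    """Split into non-overlapping frames, dropping any trailing partial frame."""
    n_frames = len(signal) // frame_len
    return [signal[i * frame_len:(i + 1) * frame_len] for i in range(n_frames)]


def bin_spacing_hz(sample_rate_hz, frame_len):
    """Spacing between DFT bins for a frame of frame_len samples."""
    return sample_rate_hz / frame_len


fs = 44_100
for duration_ms in (2, 50):
    n = frame_length(fs, duration_ms)
    print(f"{duration_ms} ms -> {n} samples, "
          f"bin spacing {bin_spacing_hz(fs, n):.1f} Hz")

frames = split_frames(list(range(10)), 4)
print(frames)  # [[0, 1, 2, 3], [4, 5, 6, 7]]
```

Short frames localize events well in time but give coarse frequency bins, while long frames do the opposite.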

Figure 1.1: A generic block diagram of a perceptual audio encoder. [Figure not reproduced. Block labels: Input audio, Time-Frequency Analysis, Psychoacoustic Analysis, Bit-allocation, Quantization and Encoding, Entropy (Lossless) coding, MUX. Signal labels: Masking Thresholds, Parameters, Side information.]

The choice of time-frequency analysis methodology always involves a fundamental tradeoff between time and frequency resolution requirements. The time-frequency analysis module employed in the MPEG-1 codec is described in Chapter 2, and the strategies to handle the different resolution requirements are presented in Chapter 4. Perceptual distortion control is achieved by a psychoacoustic signal analysis module that estimates signal masking power based on psychoacoustic principles. The psychoacoustic model quantifies the maximum amount of distortion at each point in the time-frequency plane such that quantization of the time-frequency parameters does not introduce audible artifacts. The steps involved in the estimation of the masking thresholds using psychoacoustic model II are explained in Section 3.2 of this book. The quantization and encoding module can also exploit statistical redundancies through classical techniques such as DPCM or ADPCM. The redundancies in the quantized parameters can be removed using run-length and entropy coding strategies [69, 70, 71]. Since the psychoacoustic module is signal dependent, most audio coding algorithms are variable rate. However, fixed channel rates can be achieved by efficient management of bit allocation using buffer feedback schemes. The coding methodology and the bit management techniques employed in the MP3 algorithm to achieve a fixed average bit rate are discussed in Sections 5.4 and 5.5, respectively.
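
As an illustration of how run-length coding removes redundancy from quantized parameters, the sketch below encodes a sequence of quantized values as (value, run length) pairs. This is a generic run-length coder written for this example with hypothetical function names and made-up data; it is not the entropy coding scheme specified for MP3.

```python
def run_length_encode(values):
    """Collapse consecutive repeats into (value, run_length) pairs."""
    runs = []
    for value in values:
        if runs and runs[-1][0] == value:
            runs[-1][1] += 1          # extend the current run
        else:
            runs.append([value, 1])   # start a new run
    return [(value, count) for value, count in runs]


def run_length_decode(runs):
    """Expand (value, run_length) pairs back into the original sequence."""
    values = []
    for value, count in runs:
        values.extend([value] * count)
    return values


# Quantized spectra often contain long runs of zeros at high frequencies.
quantized = [3, 3, 1, 0, 0, 0, 0, -1, 0, 0, 0, 0, 0, 0]
runs = run_length_encode(quantized)
print(runs)  # [(3, 2), (1, 1), (0, 4), (-1, 1), (0, 6)]
print(run_length_decode(runs) == quantized)  # True
```

The fourteen input values collapse into five pairs. An entropy coder would then assign shorter codewords to the more frequent pairs.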

1.3 PRINCIPLES OF PSYCHOACOUSTICS

Audio coding algorithms rely on generalized models of human hearing to optimize coding efficiency. The receiver is ultimately the human ear, and sound perception is affected by its masking properties. The field of psychoacoustics has made significant progress toward characterizing the time-frequency analysis capabilities of the inner ear. This, in turn, enabled audio coders to achieve compression by exploiting “irrelevant” information that is not detectable by even a trained listener. Irrelevant information is identified by incorporating psychoacoustic principles in quantization rules, including critical band frequency analysis and masking. The psychoacoustic model described in Chapter 3 relies on the principles discussed in this section. The Sound Pressure Level (SPL) is a standard metric that quantifies the intensity of an acoustic stimulus. The SPL provides the level (intensity) of sound pressure in decibels (dB) relative to an internationally defined reference level, i.e., $L_{SPL} = 20 \log_{10}(p/p_0)$ dB, where $L_{SPL}$ is the SPL of a stimulus, $p$ is the sound pressure of the stimulus in Pascals (Pa, equivalent to Newton/m$^2$), and $p_0$ is the standard reference level of 20 μPa. Loosely speaking, about 150 dB SPL spans the dynamic range of the auditory system; an SPL reference of a quiet environment is around 0 dB SPL, while a stimulus of 140 dB SPL approaches the threshold of pain. The absolute threshold of hearing shown in Figure 1.2 characterizes the amount of energy needed in a pure tone such that it can be detected by a listener in a noiseless environment.
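
The SPL definition, 20 times the base-10 logarithm of the ratio of the sound pressure to the 20 μPa reference, can be checked numerically with a short sketch. The function and constant names are illustrative assumptions; the example pressures are chosen only to exercise the formula.

```python
import math

P_REF_PA = 20e-6  # standard reference pressure of 20 micropascals


def spl_db(pressure_pa):
    """Sound pressure level in dB SPL for a pressure given in pascals."""
    if pressure_pa <= 0:
        raise ValueError("pressure must be positive")
    return 20.0 * math.log10(pressure_pa / P_REF_PA)


def pressure_from_spl(level_db):
    """Inverse mapping: pressure in pascals for a level in dB SPL."""
    return P_REF_PA * 10 ** (level_db / 20.0)


print(f"{spl_db(20e-6):.1f} dB SPL")        # the reference pressure itself
print(f"{spl_db(1.0):.1f} dB SPL")          # a pressure of 1 Pa
print(f"{pressure_from_spl(140):.1f} Pa")   # pressure at 140 dB SPL
```

Each factor of 10 in pressure adds 20 dB, so the 140 dB span between the reference level and the threshold of pain corresponds to a pressure ratio of ten million.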

The curve for the absolute threshold of hearing alone cannot be used for audio coding. Typically, music records require spectrally complex quantization rules and hence one has to modify the
