压缩感知技术在音频编码中的应用

需积分: 9 100 浏览量更新于2024-09-17 收藏 1.36MB PDF 举报

"音频编码利用压缩感知技术，通过在某种基域中信号稀疏的特性，以远低于奈奎斯特定理所需速率进行采样。在本文中，将压缩感知方法应用于正弦模型的音频信号，因为这种模型在频域中天然稀疏，即等同于少量正弦波的和。研究了压缩感知是否可用于低比特率音频编码，而不是像传统方法那样编码正弦参数（幅度、频率、相位），而是提议编码每个信号段中正弦分量时间域表示的随机选择样本。对单声道和多声道音频编码应用压缩感知的潜力进行了考察，并进行了听觉测试，结果令人鼓舞，表明所提出的方法有其潜力。" 详细说明：音频编码是一种将声音信号转换为数字形式的过程，以便存储、传输和处理。传统的音频编码方法通常涉及对音频信号进行采样、量化和编码，以达到数据压缩的目的。然而，这些方法往往需要按照奈奎斯特定理（Nyquist Theorem）的限制，即采样速率至少是原始信号最高频率的两倍，以避免失真。压缩感知（Compressed Sensing, CS）是一种新兴的数据采集和恢复理论，它打破了奈奎斯特定理的限制，允许在低于传统采样率的情况下采样信号，只要信号在某个基域中是稀疏的。稀疏意味着信号可以被表示为少数几个非零成分的组合。在音频编码中，正弦模型是一种常见的表示方式，因为它可以有效地模拟声音的基本组成，即一系列不同频率和振幅的正弦波。由于正弦模型在频域内天然稀疏，这使得压缩感知成为一种潜在的高效编码手段。与传统方法相比，CS音频编码不再直接编码正弦参数，而是选择性地编码时间域中的随机样本，这可以进一步减少所需的比特率。文章中提到，对单声道和多声道音频进行了实验，结果显示应用压缩感知的编码方法在降低比特率的同时，仍能保持可接受的音质。这表明，压缩感知技术在音频编码领域具有广阔的应用前景，尤其是在需要低带宽传输或存储效率高的场景中。听觉测试是评估新编码方法的关键步骤，因为它直接反映了人类感知的音质。测试结果的积极反馈证实了压缩感知在音频编码中的实用性，这可能开启音频编码的新篇章，为未来的音频压缩技术提供了一种更高效的选择。 "Audio coding using compressed sensing"探讨了如何利用压缩感知理论来改进音频编码，特别是针对正弦模型的音频信号，提出了新的编码策略，并通过实验验证了这种方法在降低比特率的同时，能够保持良好的音质，这为音频编码领域带来了创新的可能性。

1384 IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 19, NO. 5, JULY 2011

model is usually called the sinusoids plus noise model, or deter-

ministic plus stochastic decomposition. In this model, the sinu-

soidal part corresponds to the “deterministic” part of the signal

due to the structured nature of this model. The remaining signal

is the sinusoidal noise component

, also referred to here as

residual or sinusoidal error signal, which is the “stochastic” part

of the audio signal, since it is very difﬁcult to accurately model,

but at the same time essential for high-quality audio synthesis.

Accurately modeling the stochastic component has been exam-

ined both for the single-channel case, e.g., [2], [20], [21] and

the multi-channel audio case [3]. Practically, after the sinusoidal

parameters are estimated, the noise component is computed by

subtracting the sinusoidal component from the original signal.

Note that in this paper we are only interested in encoding the

sinusoidal part.

A. Single-Channel Sinusoidal Selection

To perform single-channel sinusoidal analysis, we employed

state-of-the-art psychoacoustic analysis based on [22]. In the

iteration, the algorithm picks a perceptually optimal sinusoidal

component frequency, amplitude, and phase. This choice mini-

mizes the perceptual distortion measure

(2)

where

is the Fourier transform of the residual signal

(original frame minus the currently selected sinusoids) after the

th iteration, and is a frequency weighting function set

as the inverse of the current masking threshold energy.

One issue with CS encoding is that no further reﬁnement of

the sinusoid frequencies can be performed in the encoder, be-

cause frequencies which do not correspond to exact frequency

bins would result in loss of the sparsity in the frequency do-

main. This is an important problem, because it implies that we

must restrict the sinusoidal frequency estimation to the selection

of frequency bins (e.g., following a peak-picking procedure),

without the possibility of further reﬁnement of the estimated fre-

quencies in the encoder. This can be alleviated by zero-padding

the signal frame, in other words improving the frequency res-

olution during the parameter estimation by reducing the bin

spacing. We have found, though, that for CS-based encoding

this can be performed to a limited degree, as zero-padding will

increase the number of measurements that must be encoded as

explained in Section IV (and consequently the bitrate). Fortu-

nately, this problem can be partly addressed by employing the

“frequency mapping” procedure, described in Section IV. Fur-

thermore, since the sparsity restriction need not hold after the

signal is decoded, frequency re-estimation can be performed in

the decoder, such as interpolation among frames.

B. Multi-Channel Sinusoidal Selection

To perform multi-channel sinusoidal analysis, we have

extended the sinusoidal modeling method presented in

[23]—which employs a matching pursuit algorithm to de-

termine the model parameters of each frame—to include the

psychoacoustic analysis of [22]. For the multichannel case,

in each iteration, the algorithm picks a sinusoidal compo-

nent frequency that is optimal for all channels, as well as

channel-speciﬁc amplitudes and phases. This choice minimizes

the perceptual distortion measure

(3)

where

is the Fourier transform of the residual signal of

the

th channel after the th iteration, and is a frequency

weighting function set as the inverse of the current masking

threshold energy. The contributions of each channel are simply

summed to obtain the ﬁnal measure.

An important question is what masking model is suitable for

multi-channel audio where the different channels have different

binaural attributes in the reproduction. In transform coding, a

common problem is caused by binaural masking level differ-

ence (BMLD); sometimes quantization noise that is masked in

monaural reproduction is detectable because of binaural release,

and using separate masking analysis for different channels is not

suitable for loudspeaker rendering. However, this effect in para-

metric coding is not so well established.

We performed preliminary experiments using: 1) separate

masking analysis, i.e., individual

based on the masker

of channel

for each signal separately [see (3)]; 2) the masker

of the sum signal of all channel signals to obtain

for

all

; and 3) power summation of the other signals’ attenuated

maskers to the masker of channel

according to

(4)

In the above equation,

indicates the masker energy,

the estimated attenuation (panning) factor that was varied

heuristically, and

iterates through all channel signals ex-

cluding

. In this paper, we chose to use the ﬁrst method, i.e.,

separate masking analysis for channels

, for the

reason that we did not ﬁnd notable differencies in BMLD noise

unmasking, and that the sound quality seemed to be marginally

better with headphone reproduction. For loudspeaker reproduc-

tion, the second or third method may be more suitable.

The use of this psychoacoustic multi-channel sinusoidal

model resulted in sparser modeled signals, increasing the

effectiveness of our compressed sensing encoding.

III. C

OMPRESSED SENSING

Compressed sensing [15], [16]—also known as compressive

sensing or compressive sampling—is an emerging ﬁeld which

has grown up in response to the increasing amount of data that

needs to be sensed, processed and stored. A great majority of

this data is compressed as soon as it has been sensed at the

Nyquist rate. The idea behind compressed sensing is to go di-

rectly from the full-rate, analog signal to the compact represen-

tation by using measurements in the sparse basis. Thus, the CS

theory is based on the assumption that the signal of interest is

sparse in some basis as it can be accurately and efﬁciently repre-

sented in that basis. This is not possible unless the sparse basis is

known in advance, which is generally not the case. Thus com-

pressed sensing uses random measurements in a basis that is

剩余13页未读，继续阅读

boundles

粉丝: 0
资源: 3

压缩感知技术在音频编码中的应用

Introduction To Digital Audio Coding And Standards

Compressed Sensing Theory and Applications 2012

Block compressed sensing of natural images

Speech Coding Using Sub-Bands:Speech Coding Using Sub-Bands-matlab开发

ErrorResilientVideo Coding Using Unequally ProtectedKeyPictures

High-Fidelity Multichannel Audio Coding.pdf

ITCT.rar_coding USING MATLAB

13818-7 Advanced Audio Coding (AAC)

ERROR-CONTROL-CODING.rar_coding USING MATLAB_the code

SuB-band.rar_audio coding_sub

最新资源