Speech Enhancement in Non-Stationary Noise Environments Based on OM-LSA and MCRA


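This paper addresses signal processing methods for enhancing speech in non-stationary noise environments. The authors, Israel Cohen and Baruch Berdugo of Lamar Signal Processing Ltd., published it in Signal Processing, vol. 81 (2001), pp. 2403–2418; the journal is available from Elsevier at <www.elsevier.com/locate/sigpro>.

The core contribution is an optimally-modified log-spectral amplitude (OM-LSA) speech estimator designed for non-stationary noise. The estimator minimizes the mean-square error of the log-spectra, and its spectral gain is a weighted geometric mean of the hypothetical gains associated with speech presence uncertainty, so the probability that speech is present enters the estimate directly.

The paper also introduces a minima controlled recursive averaging (MCRA) noise estimation approach. MCRA estimates the noise by averaging past spectral power values, with a smoothing parameter that is adjusted by the speech presence probability in subbands. This lets the noise estimate follow a changing noise environment.

Two distinct speech presence probability functions are used. The first serves the speech estimate and is based on the time–frequency distribution of the a priori SNR. The second controls the adaptation of the noise spectrum.

Together, the OM-LSA estimator and the MCRA noise estimate form a framework for speech enhancement in non-stationary noise, relevant to applications such as speech communication and speech recognition.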
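The two mechanisms summarized above can be illustrated with a minimal Python sketch. The function names, the floor gain `g_min`, and the base smoothing constant `alpha_d` are assumptions chosen for illustration. The summary only states that the gain is a weighted geometric mean and that the smoothing parameter is adjusted by the speech presence probability, so the forms below are one plausible reading, not the authors' implementation.

```python
import numpy as np


def omlsa_gain(gain_h1, p, g_min=0.1):
    """Weighted geometric mean of two hypothetical gains.

    gain_h1 : gain that would be applied if speech were present
    p       : speech presence probability in [0, 1]
    g_min   : hypothetical floor gain used when speech is absent (assumed value)
    """
    gain_h1 = np.asarray(gain_h1, dtype=float)
    p = np.asarray(p, dtype=float)
    return gain_h1 ** p * g_min ** (1.0 - p)


def recursive_noise_update(noise_psd, noisy_power, p_speech, alpha_d=0.95):
    """One frame of probability-controlled recursive averaging.

    The effective smoothing parameter moves toward 1 (freeze the estimate)
    as the speech presence probability grows, and falls back to alpha_d
    (assumed value) when speech is judged absent.
    """
    alpha_tilde = alpha_d + (1.0 - alpha_d) * np.asarray(p_speech, dtype=float)
    return alpha_tilde * noise_psd + (1.0 - alpha_tilde) * noisy_power
```

With `p = 1` the gain reduces to `gain_h1` and the noise estimate is left unchanged; with `p = 0` the gain falls to `g_min` and the noise estimate is smoothed toward the current noisy power.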

2406 I. Cohen, B. Berdugo / Signal Processing 81 (2001) 2403–2418

uncertainty. In Section 4, an expression for the a priori speech absence probability is formulated, based on the time–frequency distribution of the a priori SNR. In Section 5, we present the MCRA noise estimation approach and propose an appropriate speech presence probability function for controlling the adaptation of the noise spectrum. Finally, an objective and subjective evaluation of the OM-LSA and MCRA estimators is performed in Section 6.

2. Optimal gain modification

Let $x(n)$ and $d(n)$ denote speech and uncorrelated additive noise signals, respectively, where $n$ is a discrete-time index. The observed signal $y(n)$, given by $y(n) = x(n) + d(n)$, is divided into overlapping frames by the application of a window function and analyzed using the short-time Fourier transform (STFT). Specifically,

$$
Y(k,\ell) = \sum_{n=0}^{N-1} y(n + \ell M)\, h(n)\, e^{-j(2\pi/N)nk}, \tag{1}
$$

where $k$ is the frequency bin index, $\ell$ is the time frame index, $h$ is an analysis window of size $N$ (e.g., Hanning window), and $M$ is the framing step (number of samples separating two successive frames).
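As a rough illustration of Eq. (1), the following Python sketch computes the STFT frame by frame with NumPy. The function name `stft` and the default values of `N` and `M` are assumptions made for this example; the input is assumed to contain at least `N` samples, and samples that do not fill a complete frame at the end are ignored.

```python
import numpy as np


def stft(y, N=512, M=128):
    """Y[k, l] = sum_{n=0}^{N-1} y[n + l*M] * h[n] * exp(-j*2*pi*n*k/N)."""
    y = np.asarray(y, dtype=float)
    h = np.hanning(N)                      # analysis window of size N
    num_frames = 1 + (len(y) - N) // M     # only complete frames are kept
    Y = np.empty((N, num_frames), dtype=complex)
    for l in range(num_frames):
        frame = y[l * M : l * M + N] * h   # windowed frame starting at l*M
        Y[:, l] = np.fft.fft(frame)        # same sign convention as Eq. (1)
    return Y
```

`np.fft.fft` uses the same negative-exponent convention as Eq. (1), so each column of `Y` holds the `N` frequency bins of one frame.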

Let $X(k,\ell)$ denote the STFT of the clean speech, then its estimate is obtained by applying a specific gain function to each spectral component of the noisy speech signal:

$$
\hat{X}(k,\ell) = G(k,\ell)\, Y(k,\ell). \tag{2}
$$

Using the inverse STFT, with a synthesis window $\tilde{h}$ that is biorthogonal to the analysis window $h$ [28], the estimate for the clean speech signal is given by

$$
\hat{x}(n) = \sum_{\ell} \sum_{k=0}^{N-1} \hat{X}(k,\ell)\, \tilde{h}(n - \ell M)\, e^{j(2\pi/N)k(n - \ell M)}, \tag{3}
$$

where the inverse STFT is efficiently implemented using the weighted overlap-add method [5].
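The analysis, gain, and synthesis chain of Eqs. (1) to (3) can be sketched in Python as below. This is an illustration under stated assumptions, not the construction of [28]: the synthesis window is obtained by dividing the analysis window by the sum of its squared, shifted copies, which is one simple choice that satisfies the reconstruction condition when `N` is a multiple of `M`. The $1/N$ factor of `np.fft.ifft` is treated as part of the synthesis window, and the names `analysis`, `synthesis_window`, `enhance`, and `gain_fn` are hypothetical.

```python
import numpy as np


def analysis(y, h, M):
    """STFT of Eq. (1): one FFT per windowed frame, frames as columns."""
    N = len(h)
    num_frames = 1 + (len(y) - N) // M
    return np.stack(
        [np.fft.fft(y[l * M : l * M + N] * h) for l in range(num_frames)], axis=1
    )


def synthesis_window(h, M):
    """Window h_tilde with sum_l h(n - l*M) * h_tilde(n - l*M) = 1.

    Assumes len(h) is a multiple of M.
    """
    N = len(h)
    folded = (h ** 2).reshape(N // M, M).sum(axis=0)  # sum of shifted h^2
    return h / np.tile(folded, N // M)


def enhance(y, gain_fn, N=512, M=128):
    """Apply a real spectral gain (Eq. 2) and resynthesize by overlap-add (Eq. 3)."""
    y = np.asarray(y, dtype=float)
    h = np.hanning(N)
    h_tilde = synthesis_window(h, M)
    Y = analysis(y, h, M)
    X_hat = gain_fn(Y) * Y                            # Eq. (2)
    x_hat = np.zeros(len(y))
    for l in range(X_hat.shape[1]):
        # The gain is assumed conjugate-symmetric in k, so the imaginary
        # part of the inverse FFT is only rounding noise and is dropped.
        frame = np.fft.ifft(X_hat[:, l]).real
        x_hat[l * M : l * M + N] += frame * h_tilde   # weighted overlap-add
    return x_hat
```

With a unit gain, for example `enhance(y, lambda Y: np.ones(Y.shape))`, the output matches the input wherever every overlapping frame is available, that is, away from the first and last `N - M` samples.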

Among various existing speech enhancement methods, which can be represented by different spectral gain functions, we choose the LSA estimator [8] due to its superiority in reducing musical noise phenomena. The LSA estimator minimizes

$$
E\left\{ \left( \log A(k,\ell) - \log \hat{A}(k,\ell) \right)^2 \right\},
$$

where $A(k,\ell) = |X(k,\ell)|$ denotes the spectral speech amplitude, and $\hat{A}(k,\ell)$ its optimal estimate. Assuming statistically independent spectral components [8], the LSA estimator is defined by

$$
\hat{A}(k,\ell) = \exp\left\{ E\left[ \log A(k,\ell) \mid Y(k,\ell) \right] \right\}. \tag{4}
$$
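The form of Eq. (4) follows because the mean of $\log A$ is the minimizer of a squared error measured in the log domain. The toy Python check below illustrates this with samples instead of a conditional expectation. The Rayleigh-distributed amplitudes and the function names are assumptions made for the example and do not model the paper's conditional distribution.

```python
import numpy as np


def log_mse(a_samples, a_hat):
    """Sample estimate of E{(log A - log A_hat)^2} for a constant A_hat."""
    return np.mean((np.log(a_samples) - np.log(a_hat)) ** 2)


def lsa_point_estimate(a_samples):
    """exp of the sample mean of log A, the sample analogue of Eq. (4)."""
    return np.exp(np.mean(np.log(a_samples)))


rng = np.random.default_rng(0)
amplitudes = rng.rayleigh(scale=1.0, size=10_000)   # hypothetical amplitudes

a_log = lsa_point_estimate(amplitudes)   # geometric-mean style estimate
a_lin = np.mean(amplitudes)              # ordinary mean, for comparison

print("log-domain error, exp(mean(log A)):", log_mse(amplitudes, a_log))
print("log-domain error, mean(A):         ", log_mse(amplitudes, a_lin))
```

The first error is never larger than the second, and the log-domain estimate is never larger than the ordinary mean, since a geometric mean cannot exceed the arithmetic mean.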

Given two hypotheses, $H_0(k,\ell)$ and $H_1(k,\ell)$, which indicate, respectively, speech absence and presence in the $k$th frequency bin of the $\ell$th frame, we have

$$
\begin{aligned}
H_0(k,\ell) &: \; Y(k,\ell) = D(k,\ell), \\
H_1(k,\ell) &: \; Y(k,\ell) = X(k,\ell) + D(k,\ell),
\end{aligned} \tag{5}
$$

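where $D(k,\ell)$ represents the STFT of the noise signal. We assume that the STFT coefficients, for both speech and noise, are complex Gaussian variables [7]. Accordingly, the conditional PDFs of the observed signal are given by

$$
\begin{aligned}
p(Y(k,\ell) \mid H_0(k,\ell)) &= \frac{1}{\pi \lambda_d(k,\ell)} \exp\left\{ -\frac{|Y(k,\ell)|^2}{\lambda_d(k,\ell)} \right\}, \\
p(Y(k,\ell) \mid H_1(k,\ell)) &= \frac{1}{\pi \left( \lambda_x(k,\ell) + \lambda_d(k,\ell) \right)} \exp\left\{ -\frac{|Y(k,\ell)|^2}{\lambda_x(k,\ell) + \lambda_d(k,\ell)} \right\},
\end{aligned} \tag{6}
$$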

where $\lambda_x(k,\ell) = E[|X(k,\ell)|^2 \mid H_1(k,\ell)]$ and $\lambda_d(k,\ell) = E[|D(k,\ell)|^2]$ denote, respectively, the variances of speech and noise. Applying Bayes rule for the conditional speech presence probability, one obtains

$$
P(H_1(k,\ell) \mid Y(k,\ell)) = \frac{\Lambda(k,\ell)}{1 + \Lambda(k,\ell)} \triangleq p(k,\ell), \tag{7}
$$

where $\Lambda(k,\ell)$ is the generalized likelihood ratio defined by

$$
\Lambda(k,\ell) = \frac{1 - q(k,\ell)}{q(k,\ell)} \, \frac{p(Y(k,\ell) \mid H_1(k,\ell))}{p(Y(k,\ell) \mid H_0(k,\ell))} \tag{8}
$$

and $q(k,\ell) \triangleq P(H_0(k,\ell))$ is the a priori probability for speech absence. Substituting (6) and (8) into …
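A minimal Python sketch of Eqs. (6) to (8) is given below as an illustration. It evaluates the two Gaussian densities in the log domain and forms the likelihood ratio from their difference, which avoids underflow when $|Y|^2$ is large compared with $\lambda_d$. The function names and the log-domain evaluation are choices made for this example, and the sketch assumes $0 < q < 1$ and $\lambda_d > 0$.

```python
import numpy as np


def log_pdf_h0(Y, lambda_d):
    """log p(Y | H0) from Eq. (6): noise only, variance lambda_d."""
    return -np.log(np.pi * lambda_d) - np.abs(Y) ** 2 / lambda_d


def log_pdf_h1(Y, lambda_x, lambda_d):
    """log p(Y | H1) from Eq. (6): speech plus noise, variance lambda_x + lambda_d."""
    total = lambda_x + lambda_d
    return -np.log(np.pi * total) - np.abs(Y) ** 2 / total


def speech_presence_probability(Y, lambda_x, lambda_d, q):
    """p = Lambda / (1 + Lambda), with Lambda as defined in Eq. (8)."""
    log_ratio = log_pdf_h1(Y, lambda_x, lambda_d) - log_pdf_h0(Y, lambda_d)
    inv_lambda = q / (1.0 - q) * np.exp(-log_ratio)   # this is 1 / Lambda
    return 1.0 / (1.0 + inv_lambda)                   # equals Lambda / (1 + Lambda)
```

When `lambda_x` is zero the two densities coincide and the result falls back to the prior `1 - q`; as `|Y|` grows relative to `lambda_d`, the result approaches 1.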
