Perceptually Motivated Bayesian Speech Enhancement


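"This paper, 'Speech Enhancement Based on Perceptually Motivated Bayesian Estimators,' was published in 2005 in IEEE Transactions on Speech and Audio Processing by Professor Philipos C. Loizou of the University of Texas at Dallas. It examines how to improve on the traditional minimum mean-square error (MMSE) estimator to obtain better speech enhancement, in particular for noise reduction."

In speech processing, the traditional MMSE estimator is commonly used to estimate the short-time spectral amplitude. It minimizes a squared-error cost function, but that cost function is not subjectively meaningful: it does not necessarily emphasize spectral peak (formant) information, nor does it account for auditory masking effects. Loizou's paper proposes a solution to this problem.

The author proposes perceptually motivated Bayesian estimators of the short-time spectral amplitude of speech. These estimators are built on cost functions related to speech distortion measures, such as the Itakura-Saito and weighted likelihood-ratio distortion measures, both of which have been applied successfully in speech recognition. The paper describes three classes of Bayesian estimators:

1. The first class may use a variant of the Itakura-Saito distortion, a measure that takes into account the ear's differing sensitivity to different frequency components and so models the perceptual behavior of the human auditory system more closely.
2. The second class may use the weighted likelihood-ratio distortion, which can adapt to different noise environments and enhance speech according to the type and level of the noise.
3. The third class may combine these two distortion measures, or further ones, to achieve a broader improvement in speech quality.

With these perceptually motivated estimators, the paper aims to design algorithms that preserve the key features of speech while suppressing background noise effectively. Such techniques matter for the performance of speech communication, speech recognition, and hearing-assistance devices. The paper's contribution is to bring auditory perception theory into signal processing algorithms.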

LOIZOU: SPEECH ENHANCEMENT BASED ON PERCEPTUALLY MOTIVATED BAYESIAN ESTIMATORS 859

Fig. 1. Plot of the magnitude spectrum, $X_k$, of a 30-ms segment of the vowel /iy/ taken from the word "heed" (F1 = 344 Hz, F2 = 2450 Hz). Plots of the spectra […] and […] are superimposed for comparison. The latter spectra are shifted relative to $X_k$ for better visual clarity.

where […], […], […], and […].

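Using (10), we can also express (9) as

$$\hat{X}_k = \frac{\sqrt{v_k}}{\gamma_k}\,\frac{\exp(v_k)}{\Gamma(1/2)\,\Phi(1/2,\,1;\,v_k)}\,Y_k \tag{11}$$

where […]. The above confluent hypergeometric function can also be written in terms of a Bessel function [17, eq. A1.31b], thereby simplifying the above estimator to

$$\hat{X}_k = \frac{\sqrt{v_k}\,\exp(v_k/2)}{\gamma_k\,\Gamma(1/2)\,I_0(v_k/2)}\,Y_k \tag{12}$$

where $I_0(\cdot)$ denotes the modified Bessel function of order zero.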

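It is worthwhile noting that the above estimator becomes the Wiener estimator when $v_k \gg 1$. To prove that, after substituting in (12) the approximation of the Bessel function, $I_0(v_k/2) \approx \frac{1}{\sqrt{\pi v_k}}\exp(v_k/2)$ (for $v_k \gg 1$), we get

$$\hat{X}_k \approx \frac{\xi_k}{\xi_k + 1}\,Y_k \tag{13}$$

which is the Wiener estimator.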
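The Wiener limit can be checked numerically. The sketch below is an illustration only, under assumptions that are not spelled out on this page: the usual notation of the Gaussian statistical model ($\xi_k$ the a priori SNR, $\gamma_k$ the a posteriori SNR, $v_k = \frac{\xi_k}{1+\xi_k}\gamma_k$), and a gain for (12) of the form $\sqrt{v_k}\exp(v_k/2) / (\gamma_k\sqrt{\pi}\,I_0(v_k/2))$. The function names `bessel_i0`, `gain_eq12`, and `gain_wiener` are hypothetical and chosen for this example.

```python
import math


def bessel_i0(x, terms=200):
    """Modified Bessel function of order zero, summed from its power series."""
    q = (x / 2.0) ** 2
    total, term = 1.0, 1.0
    for k in range(1, terms):
        term *= q / (k * k)          # term_k = q**k / (k!)**2
        total += term
        if term < 1e-17 * total:     # stop once further terms are negligible
            break
    return total


def gain_eq12(xi, gamma):
    """Assumed gain of estimator (12): X_hat = gain * Y."""
    v = xi / (1.0 + xi) * gamma      # assumed definition of v_k
    return (math.sqrt(v) * math.exp(v / 2.0)
            / (gamma * math.sqrt(math.pi) * bessel_i0(v / 2.0)))


def gain_wiener(xi):
    """Wiener gain xi / (xi + 1)."""
    return xi / (xi + 1.0)


if __name__ == "__main__":
    # The ratio of the two gains should approach 1 as v_k grows.
    for xi, gamma in [(0.1, 1.0), (1.0, 2.0), (10.0, 10.0), (100.0, 100.0)]:
        ratio = gain_eq12(xi, gamma) / gain_wiener(xi)
        print(f"xi={xi:6.1f} gamma={gamma:6.1f} ratio={ratio:.4f}")
```

For small $v_k$ the two gains differ noticeably, which is consistent with the limit being stated only for large $v_k$.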

Next, we considered generalizing the cost function given in (5) to weigh the estimation error by $X_k^p$, i.e.,

$$d_{WE}(X_k, \hat{X}_k) = X_k^p\,(X_k - \hat{X}_k)^2 \tag{14}$$

Note that the above distortion measure emphasizes spectral peaks when $p > 0$, but emphasizes spectral valleys when $p < 0$. This is illustrated in Fig. 1. For $p = -2$, the above distortion measure is similar to the model distortion measure proposed by Itakura [11] for comparing two autoregressive speech models. The cost function used in (5) is obtained by setting $p = -1$. We refer to the above distortion measure as the weighted Euclidean distortion measure, since it can be written as $d_{WE}(\mathbf{X}, \hat{\mathbf{X}}) = (\mathbf{X} - \hat{\mathbf{X}})^T \mathbf{W} (\mathbf{X} - \hat{\mathbf{X}})$, where $\mathbf{W}$ is a diagonal matrix, having as the $k$th diagonal element, $[\mathbf{W}]_{kk} = X_k^p$. Using (14), the following risk is then minimized:

$$\Re = \int_0^\infty X_k^p\,(X_k - \hat{X}_k)^2\, p(X_k \mid Y(\omega_k))\, dX_k \tag{15}$$

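Taking the derivative of $\Re$ with respect to $\hat{X}_k$ and setting it equal to zero, we get

$$-2\int_0^\infty X_k^p\,(X_k - \hat{X}_k)\, p(X_k \mid Y(\omega_k))\, dX_k = 0 \tag{16}$$

Solving for $\hat{X}_k$ we get

$$\hat{X}_k = \frac{\int_0^\infty X_k^{p+1}\, p(X_k \mid Y(\omega_k))\, dX_k}{\int_0^\infty X_k^{p}\, p(X_k \mid Y(\omega_k))\, dX_k} \tag{17}$$

Note that the above Bayesian estimator is the ratio of the ($p+1$) moment of the posterior pdf $p(X_k \mid Y(\omega_k))$ and the $p$th moment of $p(X_k \mid Y(\omega_k))$, i.e., it can be written as: $\hat{X}_k = E[X_k^{p+1} \mid Y(\omega_k)] / E[X_k^{p} \mid Y(\omega_k)]$. In our case, $p$ is not restricted to be an integer, however. Note also that when $p = 0$ we get the traditional MMSE estimator derived in [1].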
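The moment-ratio form can be tried out numerically. The following is a minimal sketch, not the paper's implementation. It assumes the posterior pdf has the Rician shape that follows from the Gaussian statistical model, $p(X_k \mid Y(\omega_k)) \propto X_k \exp(-X_k^2/\lambda)\, I_0(2 X_k \sqrt{v/\lambda})$, which is not shown on this page. The parameters `lam` and `v`, the integration limits, and all function names are hypothetical. The two moments are computed with a simple midpoint rule, so the normalizing constant of the posterior cancels in the ratio.

```python
import math


def bessel_i0(x, terms=200):
    """Modified Bessel function of order zero, summed from its power series."""
    q = (x / 2.0) ** 2
    total, term = 1.0, 1.0
    for k in range(1, terms):
        term *= q / (k * k)
        total += term
        if term < 1e-17 * total:
            break
    return total


def posterior_unnormalized(x, lam, v):
    """Assumed Rician-shaped posterior p(X | Y), up to a constant factor."""
    return x * math.exp(-x * x / lam) * bessel_i0(2.0 * x * math.sqrt(v / lam))


def moment(order, lam, v, upper=8.0, n=2000):
    """Midpoint-rule integral of X**order times the unnormalized posterior."""
    h = upper / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * h            # midpoints avoid evaluating at X = 0
        total += x ** order * posterior_unnormalized(x, lam, v)
    return total * h


def weighted_euclidean_estimate(p, lam, v):
    """Ratio of the (p+1)th to the pth posterior moment."""
    return moment(p + 1, lam, v) / moment(p, lam, v)


if __name__ == "__main__":
    lam, v = 1.0, 2.0                # illustrative values only
    for p in (-1, -0.5, 0, 1, 2):
        print(f"p={p:5}: X_hat={weighted_euclidean_estimate(p, lam, v):.4f}")
```

With $p = 0$ the ratio reduces to the posterior mean, and with $p = -1$ it reduces to the reciprocal of the posterior mean of $1/X_k$. Negative non-integer values of $p$ close to $-2$ would need a finer grid near zero than this sketch uses.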
