Robust Feature Extraction for Speech Recognition Based on Perceptually Motivated MUSIC and CCBC


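In speech recognition, feature extraction is a crucial step that directly affects recognition performance. This paper proposes a new feature extraction algorithm aimed at improving the robustness of speech recognition. Its core technique is to incorporate perceptual information into the Multiple Signal Classification (MUSIC) spectrum, which, compared with the conventional Mel frequency cepstral coefficient (MFCC) method, improves both noise robustness and computational efficiency.

MUSIC is an eigenvector-based (subspace) spectral estimation technique, widely used for direction-of-arrival estimation and for estimating sinusoidal frequencies in noise. Its appeal for speech recognition lies in its high frequency resolution, the small amount of prior information it requires, and its ability to suppress noise. By incorporating perceptual information, the algorithm better models the frequency-dependent sensitivity of the human auditory system, which helps it maintain a good feature representation in noisy environments. Cepstral coefficients are then extracted as the feature parameters. The cepstrum is obtained by taking the inverse Fourier transform of the logarithm of the magnitude spectrum; it separates the slowly varying spectral envelope (vocal-tract information) from the fine harmonic structure (pitch), which makes it well suited for building acoustic models.

When discussing the effectiveness of these parameters, the authors consider two key properties: class separability and speaker variability. Class separability refers to how well the features distinguish different speech classes, while speaker variability concerns how much the features change across different speakers.

To further enhance robustness, the paper uses Canonical Correlation based Compensation (CCBC) to deal with the mismatch between the training and test sets. CCBC is a compensation technique built on Canonical Correlation Analysis (CCA): it analyzes and reconstructs the correlation between training and test feature vectors, so that good recognition performance can be maintained even when the test data differ from the training data.

The method is evaluated with an improved back-propagation neural network (BPNN) recognizer on three mismatch tasks (different speakers, different recording channels and different noisy environments). The results show that the feature combining perceptually motivated MUSIC and CCBC outperforms conventional MFCC. This suggests broad application potential in practical speech recognition systems, particularly in noisy settings such as in-car navigation and smart homes, where it could improve recognition accuracy and stability. The work offers a new direction for feature extraction in speech recognition: by introducing perceptual information and an optimized compensation strategy, it achieves a more robust feature representation and contributes to the advancement of speech recognition technology.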
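The cepstrum definition given in this summary can be illustrated with a short NumPy sketch of the real cepstrum (inverse FFT of the log magnitude spectrum). The synthetic voiced frame (an impulse train filtered by a decaying resonance), the FFT size and the search range are hypothetical choices for illustration; the paper itself extracts cepstral coefficients from a perceptually warped MUSIC spectrum, not from the plain FFT spectrum used here.

```python
import numpy as np


def real_cepstrum(frame: np.ndarray, n_fft: int = 512) -> np.ndarray:
    """Real cepstrum: inverse FFT of the log magnitude spectrum."""
    spectrum = np.fft.rfft(frame, n=n_fft)
    log_mag = np.log(np.abs(spectrum) + 1e-10)  # small floor avoids log(0)
    return np.fft.irfft(log_mag, n=n_fft)


# Hypothetical voiced frame: a 100 Hz impulse train (period 80 samples at 8 kHz)
# filtered by a decaying resonance that stands in for the vocal tract.
sr, period = 8000, 80
excitation = (np.arange(1024) % period == 0).astype(float)
k = np.arange(50)
vocal_tract = 0.9 ** k * np.cos(2 * np.pi * 0.1 * k)
speech = np.convolve(excitation, vocal_tract)[:1024]
frame = speech[200:712] * np.hamming(512)

ceps = real_cepstrum(frame)
# Low quefrencies carry the envelope; the pitch appears as a peak near quefrency 80.
pitch_lag = 40 + int(np.argmax(ceps[40:200]))
print(f"estimated pitch period: {pitch_lag} samples ({sr / pitch_lag:.0f} Hz)")
```

Because the log turns a gain factor into an additive constant, scaling the frame changes only the zeroth cepstral coefficient, which is one reason cepstral features are convenient for normalization.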

Chinese Journal of Electronics

Vol.20, No.1, Jan. 2011

Robust Feature Extraction for Speech Recognition Based on Perceptually Motivated MUSIC and CCBC∗

HAN Zhiyan, WANG Jian, WANG Xu and LUN Shuxian

(1.College of Information Science and Engineering, Bohai University, Jinzhou 121000, China)

(2.College of Information Science and Engineering, Northeastern University, Shenyang 110004, China)

Abstract — A novel feature extraction algorithm was proposed to improve the robustness of speech recognition. Its core technique is to incorporate perceptual information into the Multiple signal classification (MUSIC) spectrum, which provides improved robustness and computational efficiency compared with the Mel frequency cepstral coefficient (MFCC) technique; cepstral coefficients were then extracted as the feature parameters. The effectiveness of these parameters was discussed in terms of class separability and speaker variability. To further improve robustness, we incorporated Canonical correlation based compensation (CCBC) to cope with the mismatch between the training and test sets. We evaluated the technique using an improved Back-propagation neural network (BPNN) on three different tasks: different speakers, different recording channels and different noisy environments. The experimental results show that the novel feature has good robustness and effectiveness relative to MFCC, and that the CCBC algorithm makes the speech recognition system robust under all three kinds of mismatch.

Key words — Speech recognition, Multiple signal classification (MUSIC), Canonical correlation based compensation (CCBC), Feature extraction

I. Introduction

Research on the robustness of speech recognition is still a challenging task, especially in the development of core speech processing algorithms. For example, almost all current speech recognition systems use MFCC[1] as the acoustic front-end. Many researchers would agree that formulating an efficient acoustic front-end signal, one that remains effective in noise while eliminating irrelevant information, is a significant issue[2].

Estimating the time-varying spectrum is a key first step in the acoustic front-end. Perceptual considerations, such as the Mel and Bark scales, are often incorporated into this spectrum estimate to improve accuracy; MFCC is one such feature set.
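To make the two perceptual scales concrete, the following is a minimal Python sketch of the Hz-to-Mel and Hz-to-Bark mappings. The formulas are common textbook approximations (the 2595·log10(1 + f/700) Mel formula and Traunmüller's Bark approximation); the paper does not state which variants it uses, so treat these as assumptions.

```python
import math


def hz_to_mel(f_hz: float) -> float:
    """Mel scale, using the common 2595 * log10(1 + f/700) approximation."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(m: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def hz_to_bark(f_hz: float) -> float:
    """Bark scale, using Traunmueller's (1990) approximation."""
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53


for f in (100, 500, 1000, 4000):
    print(f"{f:5d} Hz -> {hz_to_mel(f):7.1f} mel, {hz_to_bark(f):5.2f} Bark")
```

Both scales are roughly linear below about 1 kHz and logarithmic above it, which is why perceptual front-ends allocate more resolution to low frequencies.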

MFCC is an effective feature for automatic speech recognition (ASR). It is computed by applying a Mel-scaled filter bank either to the short-time Fast Fourier transform (FFT) magnitude spectrum or to the short-term linear predictive coding (LPC) based spectrum. However, both FFT and LPC-based spectra are very sensitive to noise contamination.
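The FFT-based MFCC pipeline described above (window, FFT magnitude, Mel-scaled triangular filter bank, log, DCT) can be sketched in NumPy as follows. The frame length, FFT size, 26 filters and 13 coefficients are common defaults assumed here, not values taken from the paper, and the DCT is written out explicitly to avoid a SciPy dependency.

```python
import numpy as np


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sr):
    """Triangular filters equally spaced on the Mel scale (rows: filters, cols: FFT bins)."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):  # rising edge
            fb[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):  # falling edge
            fb[i - 1, k] = (right - k) / max(right - center, 1)
    return fb


def dct_ii(x, n_out):
    """Orthonormal DCT-II of a 1-D vector, keeping the first n_out coefficients."""
    n = x.shape[-1]
    k = np.arange(n_out)[:, None]
    j = np.arange(n)[None, :]
    basis = np.cos(np.pi * k * (2 * j + 1) / (2 * n))
    basis[0] *= np.sqrt(1.0 / n)
    basis[1:] *= np.sqrt(2.0 / n)
    return x @ basis.T


def mfcc_frame(frame, sr, n_fft=512, n_filters=26, n_ceps=13):
    """MFCCs of one frame from the FFT magnitude spectrum."""
    windowed = frame * np.hamming(len(frame))
    mag = np.abs(np.fft.rfft(windowed, n=n_fft))
    energies = mel_filterbank(n_filters, n_fft, sr) @ (mag ** 2)
    return dct_ii(np.log(energies + 1e-10), n_ceps)


# Usage: one 25 ms frame of a 440 Hz tone with a little noise, sampled at 16 kHz.
sr = 16000
t = np.arange(400) / sr
frame = np.sin(2 * np.pi * 440 * t) + 0.01 * np.random.default_rng(0).standard_normal(400)
feats = mfcc_frame(frame, sr)
print(feats.shape)
```

Additive noise enters `mag ** 2` directly, so every filter-bank energy, and hence every coefficient, is perturbed; this is the sensitivity the text attributes to FFT-based spectra.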

Eigenvector-based methods such as MUSIC are popular in sinusoidal frequency estimation because of their high resolution and the small amount of prior information they require. Moreover, MUSIC has good noise-suppression ability. We therefore adopted MUSIC, incorporating perceptual information directly into the spectrum estimation to improve the cepstral representation in noise. Recognition tests demonstrate the robustness of this method[3,4].
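For readers unfamiliar with MUSIC, the NumPy sketch below computes a standard (non-perceptual) MUSIC pseudospectrum for one frame. It estimates a covariance matrix from overlapping snapshots, splits the eigenvectors into signal and noise subspaces, and evaluates the reciprocal of each steering vector's energy in the noise subspace. The model order, the number of signal components and the frequency grid are assumed parameters; the paper's perceptual modification of this spectrum is not reproduced here.

```python
import numpy as np


def autocorr_matrix(x, order):
    """Sample covariance estimated from overlapping length-`order` snapshots."""
    snapshots = np.lib.stride_tricks.sliding_window_view(x, order)  # (N-order+1, order)
    return snapshots.T @ snapshots / snapshots.shape[0]


def music_pseudospectrum(x, order, n_signal, n_freqs=256):
    """MUSIC pseudospectrum on a grid of normalized frequencies in [0, pi]."""
    R = autocorr_matrix(x - np.mean(x), order)
    _, eigvecs = np.linalg.eigh(R)  # eigenvalues in ascending order
    noise_subspace = eigvecs[:, : order - n_signal]  # smallest-eigenvalue eigenvectors
    omegas = np.linspace(0.0, np.pi, n_freqs)
    steering = np.exp(-1j * np.outer(np.arange(order), omegas))  # (order, n_freqs)
    proj = noise_subspace.T @ steering  # projection onto the noise subspace
    denom = np.sum(np.abs(proj) ** 2, axis=0)
    return omegas, 1.0 / np.maximum(denom, 1e-12)


# Usage: a real 1 kHz cosine (two complex exponentials, so n_signal=2) in white noise.
sr = 8000
n = np.arange(400)
rng = np.random.default_rng(1)
x = np.cos(2 * np.pi * 1000 * n / sr) + 0.1 * rng.standard_normal(n.size)
omegas, p = music_pseudospectrum(x, order=20, n_signal=2)
peak_hz = omegas[np.argmax(p)] * sr / (2 * np.pi)
print(f"peak near {peak_hz:.0f} Hz")
```

Because the noise subspace is estimated separately and only its projection enters the denominator, the peaks stay sharp at moderate SNR, which matches the high-resolution and noise-suppression arguments in the text.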

The performance of an ASR system degrades severely when there is a serious mismatch between training and test conditions, so resolving this mismatch is a significant issue. The mismatch can be roughly grouped into three classes: differences between speakers, changes in the recording channel and the effects of noisy environments. In this paper, we utilized CCBC to compensate for all three kinds of distortion sources, because the CCBC computation is specific and short, and it reconstructs the correct correlation between training vectors and test vectors[5].
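The exact CCBC procedure is given in reference [5] and is not reproduced in this section. As a hedged illustration of the underlying idea only, the NumPy sketch below performs CCA between paired training-condition and test-condition feature vectors and maps test vectors back into the training space through the top canonical pairs. The availability of frame-aligned pairs (for example stereo recordings), the function names and the mapping rule are all assumptions of this sketch, not the paper's algorithm.

```python
import numpy as np


def inv_sqrt_psd(c, eps=1e-8):
    """Inverse square root of a symmetric positive-definite matrix."""
    w, v = np.linalg.eigh(c)
    return v @ np.diag(1.0 / np.sqrt(np.maximum(w, eps))) @ v.T


def sqrt_psd(c):
    """Square root of a symmetric positive semi-definite matrix."""
    w, v = np.linalg.eigh(c)
    return v @ np.diag(np.sqrt(np.maximum(w, 0.0))) @ v.T


def fit_cca_mapping(x_train, y_test, n_components):
    """Hypothetical CCA-based mapping from test-condition vectors y to training space.

    x_train, y_test: paired feature matrices of shape (n_frames, dim).
    """
    mx, my = x_train.mean(axis=0), y_test.mean(axis=0)
    xc, yc = x_train - mx, y_test - my
    n = x_train.shape[0]
    cxx, cyy, cxy = xc.T @ xc / n, yc.T @ yc / n, xc.T @ yc / n
    wx, wy = inv_sqrt_psd(cxx), inv_sqrt_psd(cyy)
    u, s, vt = np.linalg.svd(wx @ cxy @ wy)  # s holds the canonical correlations
    k = n_components
    # y -> canonical variates -> predicted x-side variates (scaled by s) -> x space
    mapping = sqrt_psd(cxx) @ u[:, :k] @ np.diag(s[:k]) @ vt[:k] @ wy
    return lambda y: mx + (y - my) @ mapping.T


# Usage: a synthetic "channel" that rescales each dimension and adds a bias.
rng = np.random.default_rng(0)
x_clean = rng.standard_normal((2000, 5))
channel = np.diag([0.5, 0.8, 1.2, 0.7, 1.5])
y_mismatched = x_clean @ channel.T + 0.5 + 0.05 * rng.standard_normal((2000, 5))
compensate = fit_cca_mapping(x_clean, y_mismatched, n_components=5)
err_before = np.mean((y_mismatched - x_clean) ** 2)
err_after = np.mean((compensate(y_mismatched) - x_clean) ** 2)
print(f"MSE before: {err_before:.3f}, after: {err_after:.3f}")
```

When all canonical pairs are kept, this mapping reduces algebraically to linear least-squares regression of x on y; keeping fewer pairs restricts compensation to the most strongly correlated directions, which is where a CCA formulation differs.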

II. Algorithm Description

1. Description of perceptual warping

(1) Direct warping of the FFT spectrum

Using a non-linearly spaced filter bank to incorporate perceptual traits into the acoustic front-end is a well-established technique. The main aim of the filter bank is to average out the harmonic information present in the FFT spectrum and to track the spectral envelope. However, the filter bank produces a coarse spectrum that still carries substantial pitch information, which is undesirable. It has been shown that MUSIC provides appropriate spectral envelope modeling, so it is useful and safe to remove the filter bank structure and incorporate perceptual considerations directly into the FFT spectrum.
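One simple way to apply a perceptual scale directly to a spectrum without a filter bank is to resample the spectrum at frequencies that are equally spaced on the Mel scale. The NumPy sketch below does this by linear interpolation. It is an illustrative assumption, not necessarily the warping the paper uses (the text goes on to describe an all-pass implementation), and the grid size is a hypothetical parameter.

```python
import numpy as np


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)


def warp_spectrum_to_mel(power_spectrum, sr, n_points=64):
    """Resample a one-sided 1-D spectrum at frequencies equally spaced on the Mel scale.

    No filter bank is applied: each output sample is a linear interpolation of the
    original spectrum, so values are resampled rather than averaged over bands.
    """
    n_bins = power_spectrum.shape[-1]
    bin_freqs = np.linspace(0.0, sr / 2.0, n_bins)  # FFT bin centre frequencies
    mel_grid = np.linspace(0.0, hz_to_mel(sr / 2.0), n_points)
    return np.interp(mel_to_hz(mel_grid), bin_freqs, power_spectrum)


# Usage: warping a linear ramp shows how densely low frequencies are sampled.
sr = 16000
ramp = np.linspace(0.0, sr / 2.0, 257)  # value at each bin equals its frequency in Hz
warped = warp_spectrum_to_mel(ramp, sr)
print(np.round(warped[:5], 1), np.round(warped[-3:], 1))
```

Equal steps on the Mel axis become increasingly wide steps in Hz, so the warped spectrum devotes more of its samples to the low-frequency region.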

One way of incorporating perceptual considerations is to implement the perceptual scale through a first order all-pass

∗ Manuscript Received Feb. 2009; Accepted Oct. 2010. This work is supported by the National Natural Science Foundation of China (No.60974071).

