
VTS feature compensation based on two-layer GMM
structure for robust speech recognition
Lin Zhou, Haijing Li, Ying Chen, Zhenyang Wu
Key Laboratory of Underwater Acoustic Signal Processing
of Ministry of Education
School of Information Science and Engineering, SEU
Nanjing, China
Linzhou@seu.edu.cn, 1025784430@qq.com, 476141905@qq.com, zhenyang@seu.edu.cn
Yong Lu
College of Computer and Information Engineering
Hohai University
Nanjing, China
yonglu@hhu.edu.cn
Abstract—In this paper, a two-layer Gaussian Mixture Model (GMM) structure for Vector Taylor Series (VTS) feature compensation is proposed for robust speech recognition. Since the GMM used in VTS typically has a large number of mixture components, the computational complexity of VTS is very high. To deal with this issue, we propose a two-layer GMM structure for VTS. Specifically, a GMM with fewer mixture components is used to estimate the mean and variance of the noise. With the estimated noise parameters, a second GMM with more mixture components is employed to map noisy features to clean features. Simulation results show that the proposed algorithm significantly reduces the computational complexity of VTS while achieving recognition performance comparable to that of the traditional system.
Keywords—GMM model; Vector Taylor Series; feature
compensation; speech recognition
I. INTRODUCTION
In real applications, the performance of a speech recognition system degrades rapidly in the presence of environmental noise and speech variability. To address this problem, feature compensation and model adaptation algorithms have been a focus of research in robust speech recognition. For example, Stereo-based Piecewise Linear Compensation for Environments (SPLICE) [1] is a model-based feature compensation method that estimates clean speech features from noisy speech. Other methods, e.g., maximum likelihood linear regression (MLLR) [2, 3], maximum a posteriori (MAP) [4] and maximum a posteriori linear regression (MAPLR) [5], handle degraded speech by adapting the acoustic model. Although the aforementioned methods perform well, it has been shown in [6] that parallel model combination (PMC) [7] and vector Taylor series (VTS) [8, 9] can outperform them. In the VTS algorithm, noisy speech features are represented through a first-order linear approximation, and clean speech features are then estimated by the expectation-maximization (EM) approach.
However, the above feature compensation and model adaptation algorithms [10] focus on improving performance and seldom take computational complexity into account, which limits their practical application. To deal with this problem, a two-layer GMM structure is proposed to optimize the traditional VTS structure. Two GMMs with different numbers of mixture components are built. A GMM with fewer mixtures is first used to estimate the mean and variance of the noise based on the maximum likelihood (ML) criterion. A second GMM with more mixtures is then employed to estimate clean features from the noisy speech. As a result, the proposed algorithm significantly reduces the computation of VTS, while the speech recognition accuracy remains comparable to that of the traditional VTS algorithm.
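As a concrete illustration of this structure, the sketch below (our own illustration, not code from the paper) trains the two layers with scikit-learn's GaussianMixture; the component counts and the placeholder `clean_feats` array are assumptions for the example:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Placeholder clean-speech training features: (n_frames, n_cepstral_dims).
# In practice these would be MFCCs extracted from a clean training corpus.
clean_feats = np.random.randn(5000, 13)

# Layer 1: a small GMM used only for noise mean/variance estimation,
# so each EM iteration over the noise parameters stays cheap.
gmm_noise_layer = GaussianMixture(n_components=8, covariance_type='diag',
                                  random_state=0).fit(clean_feats)

# Layer 2: a large GMM used once, with the fixed noise estimate,
# to map noisy features back to clean features.
gmm_mapping_layer = GaussianMixture(n_components=128, covariance_type='diag',
                                    random_state=0).fit(clean_feats)
```

Because only the small GMM participates in the iterative noise estimation, the dominant per-iteration cost scales with its 8 components rather than with the 128 components of the mapping layer.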
The rest of this paper is organized as follows: Section II
analyzes the traditional VTS algorithm in detail. Section III
presents the two-layer GMM based VTS feature compensation.
Experimental results are given in Section IV, followed by conclusions in Section V.
II. PERFORMANCE ANALYSIS OF THE TRADITIONAL VTS
In the cepstral domain, the relationship between the noisy speech, clean speech and additive noise can be expressed as:

$$ y = x + C \log\left(1 + \exp\left(C^{-1}(n - x)\right)\right) \tag{1} $$

where $y$, $x$ and $n$ denote the cepstral features of the noisy speech, clean speech and noise, respectively; $C$ and $C^{-1}$ denote the discrete cosine transform (DCT) matrix and its inverse, respectively.
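As a quick numerical illustration of Eq. (1), the following sketch builds an orthonormal DCT matrix for $C$ (an assumption made for simplicity; a real MFCC front end uses a truncated, non-square DCT, with a pseudo-inverse in place of $C^{-1}$) and evaluates the mismatch function:

```python
import numpy as np
from scipy.fftpack import dct

n_dims = 13
# Orthonormal type-II DCT matrix: dct() applied to the identity transforms
# each identity column, so the result is the DCT matrix itself.
C = dct(np.eye(n_dims), type=2, norm='ortho', axis=0)
C_inv = np.linalg.inv(C)  # equals C.T for the orthonormal DCT

def noisy_cepstrum(x, n):
    """Eq. (1): y = x + C log(1 + exp(C^{-1} (n - x)))."""
    return x + C @ np.log1p(np.exp(C_inv @ (n - x)))
```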
In VTS [11], through a first-order Taylor expansion at the point $(\mu_x, \mu_{n_0})$, $y$ can be expressed as:

$$ y = (I - U)(x - \mu_x) + U(n - \mu_{n_0}) + \varphi \tag{2} $$

where $I$ is the identity matrix, $\mu_x$ is the mean of $x$, and $\mu_{n_0}$ is the initial mean of $n$.
The matrix $U$ and the vector $\varphi$ are defined as:

$$ \varphi = C \log\left[\exp\left(C^{-1}\mu_x\right) + \exp\left(C^{-1}\mu_{n_0}\right)\right] $$
$$ U = C \,\mathrm{diag}\left(\frac{\exp\left(C^{-1}(\mu_{n_0} - \mu_x)\right)}{1 + \exp\left(C^{-1}(\mu_{n_0} - \mu_x)\right)}\right) C^{-1} \tag{3} $$
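A direct transcription of Eq. (3) could look as follows (hypothetical function and argument names; `C` and `C_inv` are the DCT matrix and its inverse from the sketch above):

```python
import numpy as np

def vts_expansion_terms(C, C_inv, mu_x, mu_n0):
    """Eq. (3): offset phi and Jacobian U at the expansion point (mu_x, mu_n0)."""
    phi = C @ np.log(np.exp(C_inv @ mu_x) + np.exp(C_inv @ mu_n0))
    g = np.exp(C_inv @ (mu_n0 - mu_x))        # exp(C^{-1}(mu_n0 - mu_x))
    U = C @ np.diag(g / (1.0 + g)) @ C_inv    # C diag(g / (1 + g)) C^{-1}
    return phi, U
```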
The mean $\mu_y$ and covariance matrix $\Sigma_y$ of $y$ are then given by:

$$ \mu_y = U\left(\mu_n - \mu_{n_0}\right) + \varphi $$
$$ \Sigma_y = (I - U)\,\Sigma_x\,(I - U)^T + U\,\Sigma_n\,U^T \tag{4} $$