International Journal of Electronics and Communication Engineering Research and Development (IJECERD),
ISSN 2228 – 9282 (Print), ISSN 2248 – 9290 (Online), Volume 1, Number 1, January - April (2011), pp. 07-19
© PRJ Publication, http://www.prjpublication.com/IJECERD.asp
Electronic copy available at: https://ssrn.com/abstract=3527930
COMPARISON OF SPEECH RECOGNITION OF ISOLATED
WORDS USING LINEAR PREDICTIVE CODING (LPC), LINEAR
PREDICTIVE CEPSTRAL COEFFICIENT (LPCC) & PERCEPTUAL
LINEAR PREDICTION (PLP) AND THE EFFECT OF VARIATION
OF MODEL ORDER ON SPEECH RECOGNITION RATE
Yogesh S Angal¹*, R.H.Chile², R.S.Holambe³
¹* Department of Instrumentation Engineering, Padmshree Dr.D.Y.Patil Institute of Engineering and Technology, Pimpri, Pune, M.S., India.
²,³ Department of Instrumentation Engineering, S.G.G.S. Institute of Engineering and Technology, Vishnupuri, Nanded, M.S., India.
ABSTRACT
Automatic Speech Recognition (ASR) by machine has been a goal of research for more
than six decades. In spite of all advances, machines cannot match the performance of
their human counterparts in terms of accuracy and speed, especially in the case of
speaker-independent speech recognition. So, today a significant portion of speech
recognition research is focused on the speaker-independent speech recognition problem.
Before recognition, speech processing has to be carried out to obtain feature vectors of
the signal, so front-end analysis plays an important role. The reasons are its wide range
of applications and the limitations of the available speech recognition techniques. The
aim of this paper is to study, implement and compare the widely used parameterization
methods in speech technology: Linear Predictive Coding (LPC), Linear Predictive
Cepstral Coefficients (LPCC) and Perceptual Linear Prediction (PLP). We have also
observed the effect of variation of the model order on the recognition rate. Vector
Quantization (VQ) is used to prepare a word model as a template for each utterance.
Moreover, the Euclidean distance is used as the classifier. Front ends were compared in
clean speech and with speech degraded by noise and spectral variability, using the TI-46
word database. We have studied some aspects of speech recognition and the effect of
noise at various SNR levels: 40dB, 35dB, 30dB, 20dB, 15dB, 10dB, 5dB, 0dB and -5dB.
Experimentation is carried out with white Gaussian noise.
It has been observed that LPCC gives better results compared to LPC in clean as well as
noisy environments. PLP with DELTA and DELTA-DELTA works better than LPC and
LPCC in both clean and noisy speech. Comparison of all the above techniques for speech
recognition is carried out in this paper. The suitability of each technique for different
environments is also discussed.
Keywords: ASR, LPC, LPCC, PLP, VQ.
1. INTRODUCTION
Speech recognition systems perform two fundamental operations: signal modeling and
pattern matching. Signal modeling represents the process of converting a speech signal
into a set of parameters. Pattern matching is the task of finding the parameter set in
memory that most closely matches the parameter set obtained from the input speech signal.
An Isolated Word Recognition (IWR) system consists of four main steps: preprocessing,
framing and windowing, feature extraction and pattern classification, as shown in Figure
1. The function of the preprocessing step is to flatten the spectrum (especially at high
frequencies) and possibly remove noise. The framing and windowing operation divides
the speech signal into overlapping frames and multiplies each frame by a window. The
feature extractor maps each frame of the speech signal into a set of features, called the
feature vector, which best approximates the signal's properties, so that efficient
computation and a compact representation of the speech signal are possible. In the
pattern classifier module, word models are built for each word (available in the
vocabulary) from the feature vectors during the training phase. In order to recognize a
word, the test speech is passed through the same feature extractor and the test features
are compared with each of the stored models of utterances. The test sample of speech is
assigned to the word whose model gives the minimum distance or maximum probability
(based on the classifier used) with the test sample. Hence, for an IWR system to work,
the following major tasks are to be accomplished.
Training Phase
- Preprocessing of the training speech samples.
- Framing and windowing.
- Extracting appropriate speech features (feature vectors) from each windowed frame.
- Building a model for each word from the feature vectors.
Testing Phase
- Preprocessing of the test speech samples.
- Framing and windowing.
- Extracting appropriate speech features from the given speech data.
- Obtaining the Euclidean distances (or probabilities) of the feature vectors with each word model.
- Deciding based on minimum distance (or maximum probability).
The main steps of extracting the important information from the speech signal
(during both the training and testing phases) are preprocessing, framing and windowing,
and feature extraction.
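The framing/windowing and minimum-distance classification steps above can be sketched as follows. The frame length, hop size, window choice and codebook shapes are illustrative assumptions, not parameters taken from this paper:

```python
import numpy as np

def frame_and_window(signal, frame_len=256, hop=128):
    """Divide the signal into overlapping frames and multiply each by a Hamming window."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hamming(frame_len)
    return np.stack([signal[i * hop : i * hop + frame_len] * window
                     for i in range(n_frames)])

def classify(test_features, word_models):
    """Assign the utterance to the word whose VQ codebook gives the smallest
    total Euclidean distortion over all test feature vectors."""
    def distortion(codebook):
        # distance from every test vector to its nearest codeword
        d = np.linalg.norm(test_features[:, None, :] - codebook[None, :, :], axis=2)
        return d.min(axis=1).sum()
    return min(word_models, key=lambda w: distortion(word_models[w]))
```

In the full system each word's codebook would be trained with a VQ algorithm (e.g. LBG) over the training feature vectors; here the codebooks are simply arrays of codewords.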
Figure 1: Main steps in Isolated Word Speech Recognition. (a) Training Phase; (b) Testing Phase.
2. FEATURE EXTRACTION TECHNIQUES
Feature extraction aims at giving a useful representation of the speech signal by capturing
the important information from it. A common division of feature extraction approaches
is into production-based and perception-based methods. Linear Predictive Coding (LPC)
and Linear Predictive Cepstral Coefficients (LPCC) are examples from the first group,
while Mel-Frequency Cepstral Coefficients (MFCC) and Perceptual Linear Prediction
(PLP) belong to the family of perception-based approaches. In speech recognition, a
premium is placed on extracting features that are somewhat invariant to changes in the
speaker [1, 2]. So feature extraction involves analysis of the speech signal.
Broadly, feature extraction techniques are classified into temporal analysis and spectral
analysis techniques. In temporal analysis the speech waveform itself is used for analysis.
In spectral analysis a spectral representation of the speech signal is used for analysis [3].
In theory, it should be possible to recognize speech directly from the digitized waveform;
in practice, the signal is represented in terms of its Fourier coefficients or as the set of
power values at the outputs of a bank of filters [4]. The envelope of the spectrum can be
represented indirectly in terms of the parameters of an all-pole model, using linear
predictive coding (LPC), or in terms of the first dozen or so coefficients of the
cepstrum, i.e. the inverse Fourier transform of the logarithm of the spectrum [5].
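As a concrete illustration of the all-pole model and its cepstral expansion, the sketch below computes LPC coefficients of a frame via the autocorrelation method (Levinson-Durbin recursion) and converts them to cepstral coefficients with the standard LPC-to-cepstrum recursion. The model order and any signal fed to it are arbitrary choices for illustration, not the paper's experimental settings:

```python
import numpy as np

def lpc(frame, order):
    """LPC coefficients a[0..p] (a[0] = 1) of the all-pole model
    A(z) = 1 + a[1]z^-1 + ... + a[p]z^-p, via Levinson-Durbin."""
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        # reflection coefficient for order i
        k = -(r[i] + np.dot(a[1:i], r[i - 1:0:-1])) / err
        a_prev = a.copy()
        for j in range(1, i):
            a[j] = a_prev[j] + k * a_prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a

def lpcc(a, n_ceps):
    """Cepstral coefficients c[1..n_ceps] of 1/A(z) from the LPC coefficients."""
    p = len(a) - 1
    c = np.zeros(n_ceps + 1)
    for n in range(1, n_ceps + 1):
        s = sum(k * c[k] * a[n - k] for k in range(max(1, n - p), n))
        c[n] = (-a[n] if n <= p else 0.0) - s / n
    return c[1:]
```

As a sanity check, feeding `lpc` a sufficiently long impulse response of a known all-pole filter recovers that filter's coefficients to high accuracy.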
One reason for computing the short-term spectrum is that the cochlea of the human ear
performs a quasi-frequency analysis. The analysis in the cochlea takes place on a
nonlinear frequency scale (known as the Bark scale or the mel scale). This scale is
approximately linear up to about 1000 Hz and approximately logarithmic thereafter.
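A widely used analytic approximation of this perceptual scale is the mel mapping sketched below; the 2595/700 constants are the common textbook (HTK-style) form, not values taken from this paper:

```python
import numpy as np

def hz_to_mel(f_hz):
    """Common analytic mel-scale approximation: near-linear below ~1000 Hz,
    logarithmic above."""
    return 2595.0 * np.log10(1.0 + f_hz / 700.0)

def mel_to_hz(m):
    """Inverse mapping from mel back to Hz."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)
```

With these constants, 1000 Hz maps to roughly 1000 mel, while higher frequencies are increasingly compressed, matching the linear-then-logarithmic behavior described above.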