International Journal of Electronics and Communication Engineering Research and Development (IJECERD),
ISSN 2228 – 9282 (Print), ISSN 2248 – 9290 (Online), Volume 1, Number 1, January - April (2011), pp. 07-19
© PRJ Publication, http://www.prjpublication.com/IJECERD.asp
Electronic copy available at: https://ssrn.com/abstract=3527930
COMPARISON OF SPEECH RECOGNITION OF ISOLATED
WORDS USING LINEAR PREDICTIVE CODING (LPC), LINEAR
PREDICTIVE CEPSTRAL COEFFICIENT (LPCC) & PERCEPTUAL
LINEAR PREDICTION (PLP) AND THE EFFECT OF VARIATION
OF MODEL ORDER ON SPEECH RECOGNITION RATE
Yogesh S Angal¹*, R.H.Chile², R.S.Holambe³
¹* Department of Instrumentation Engineering, Padmshree Dr.D.Y.Patil Institute of Engineering and Technology, Pimpri, Pune, M.S., India.
²,³ Department of Instrumentation Engineering, S.G.G.S. Institute of Engineering and Technology, Vishnupuri, Nanded, M.S., India.
ABSTRACT
Automatic Speech Recognition (ASR) by machine has been a goal of research for more
than six decades. In spite of all advances, machines cannot match the performance of
their human counterparts in terms of accuracy and speed, especially in the case of
speaker-independent speech recognition. So, today a significant portion of speech
recognition research is focused on the speaker-independent speech recognition problem.
Before recognition, speech processing has to be carried out to obtain feature vectors of
the signal, so front-end analysis plays an important role. The reasons are its wide range
of applications and the limitations of the available speech recognition techniques. The
aim of this paper is to study, implement and compare the widely used parameterization
methods in speech technology: Linear Predictive Coding (LPC), Linear Predictive
Cepstral Coefficients (LPCC) and Perceptual Linear Prediction (PLP). We have also
observed the effect of variation of the model order on the recognition rate. Vector
Quantization (VQ) is used to prepare a word model as a template for each utterance.
Moreover, the Euclidean distance is used as the classifier. Front ends were compared in
clean speech and with speech degraded by noise and spectral variability, using the TI-46
word database. We have studied some aspects of speech recognition and the effect of
noise at various SNR levels: 40dB, 35dB, 30dB, 20dB, 15dB, 10dB, 5dB, 0dB and -5dB.
Experimentation is carried out with white Gaussian noise.
It has been observed that LPCC gives better results compared to LPC in clean as well as
noisy environments. PLP with DELTA and DELTA-DELTA works better than LPC and
LPCC in both clean and noisy speech. Comparison of all the above techniques for speech
recognition is carried out in this paper. The suitability of each technique for different
environments is also discussed.
Keywords: ASR, LPC, LPCC, PLP, VQ.
1. INTRODUCTION
Speech recognition systems perform two fundamental operations: signal modeling and
pattern matching. Signal modeling represents the process of converting a speech signal
into a set of parameters. Pattern matching is the task of finding the parameter set in
memory that most closely matches the parameter set obtained from the input speech signal.
An Isolated Word Recognition (IWR) system consists of four main steps: preprocessing,
framing and windowing, feature extraction and pattern classification, as shown in Figure
1. The function of the preprocessing step is to flatten the spectrum (especially at high
frequencies) and possibly remove noise. The framing and windowing operation divides
the speech signal into overlapping frames and multiplies each frame by a window. The
feature extractor maps each frame of the speech signal into a set of features, called the
feature vector, which best approximates the signal's properties, so that efficient
computation and a compact representation of the speech signal are possible. In the
pattern classifier module, word models are built for each word (available in the
vocabulary) from the feature vectors during the training phase. In order to recognize a
word, the test speech is passed through the same feature extractor and the test features
are compared with each of the stored models of utterances. The test sample of speech is
assigned to the word whose model gives the minimum distance or maximum probability
(based on the classifier used) with the test sample. Hence, for an IWR system to work,
the following major tasks are to be accomplished.
Training Phase
- Preprocessing of the training speech samples.
- Framing and windowing.
- Extracting appropriate speech features (feature vectors) from each windowed frame.
- Building a model for each word from the feature vectors.
Testing Phase
- Preprocessing of the test speech samples.
- Framing and windowing.
- Extracting appropriate speech features from the given speech data.
- Obtaining the Euclidean distances (or probabilities) of the feature vectors with each word model.
- Deciding based on minimum distance (or maximum probability).
The main steps of extracting the important information from the speech signal
(during both the training and testing phases) are preprocessing, framing and windowing,
and feature extraction.
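The framing/windowing and minimum-distance classification steps above can be sketched as follows. The frame length, hop size, window choice and codebook shapes are illustrative assumptions, not parameters taken from this paper:

```python
import numpy as np

def frame_and_window(signal, frame_len=256, hop=128):
    """Divide the signal into overlapping frames and multiply each by a Hamming window."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hamming(frame_len)
    return np.stack([signal[i * hop : i * hop + frame_len] * window
                     for i in range(n_frames)])

def classify(test_features, word_models):
    """Assign the utterance to the word whose VQ codebook gives the smallest
    total Euclidean distortion over all test feature vectors."""
    def distortion(codebook):
        # distance from every test vector to its nearest codeword
        d = np.linalg.norm(test_features[:, None, :] - codebook[None, :, :], axis=2)
        return d.min(axis=1).sum()
    return min(word_models, key=lambda w: distortion(word_models[w]))
```

In the full system each word's codebook would be trained with a VQ algorithm (e.g. LBG) over the training feature vectors; here the codebooks are simply arrays of codewords.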
Figure 1: Main steps in Isolated Word Speech Recognition. (a) Training Phase; (b) Testing Phase.
2. FEATURE EXTRACTION TECHNIQUES
Feature extraction aims at giving a useful representation of the speech signal by capturing
the important information from it. A common division of feature extraction approaches
is into production-based and perception-based methods. Linear Predictive Coding (LPC)
and Linear Predictive Cepstral Coefficients (LPCC) are examples from the first group,
while Mel-Frequency Cepstral Coefficients (MFCC) and Perceptual Linear Prediction
(PLP) belong to the family of perception-based approaches. In speech recognition, a
premium is placed on extracting features that are somewhat invariant to changes in the
speaker [1, 2]. So feature extraction involves analysis of the speech signal.
Broadly, feature extraction techniques are classified into temporal analysis and spectral
analysis techniques. In temporal analysis the speech waveform itself is used for analysis.
In spectral analysis a spectral representation of the speech signal is used for analysis [3].
In theory, it should be possible to recognize speech directly from the digitized waveform;
in practice, the signal is represented in terms of its Fourier coefficients or as the set of
power values at the outputs of a bank of filters [4]. The envelope of the spectrum can be
represented indirectly in terms of the parameters of an all-pole model, using linear
predictive coding (LPC), or in terms of the first dozen or so coefficients of the
cepstrum, i.e. the inverse Fourier transform of the logarithm of the spectrum [5].
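As a concrete illustration of the all-pole model and its cepstral expansion, the sketch below computes LPC coefficients of a frame via the autocorrelation method (Levinson-Durbin recursion) and converts them to cepstral coefficients with the standard LPC-to-cepstrum recursion. The model order and any signal fed to it are arbitrary choices for illustration, not the paper's experimental settings:

```python
import numpy as np

def lpc(frame, order):
    """LPC coefficients a[0..p] (a[0] = 1) of the all-pole model
    A(z) = 1 + a[1]z^-1 + ... + a[p]z^-p, via Levinson-Durbin."""
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        # reflection coefficient for order i
        k = -(r[i] + np.dot(a[1:i], r[i - 1:0:-1])) / err
        a_prev = a.copy()
        for j in range(1, i):
            a[j] = a_prev[j] + k * a_prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a

def lpcc(a, n_ceps):
    """Cepstral coefficients c[1..n_ceps] of 1/A(z) from the LPC coefficients."""
    p = len(a) - 1
    c = np.zeros(n_ceps + 1)
    for n in range(1, n_ceps + 1):
        s = sum(k * c[k] * a[n - k] for k in range(max(1, n - p), n))
        c[n] = (-a[n] if n <= p else 0.0) - s / n
    return c[1:]
```

As a sanity check, feeding `lpc` a sufficiently long impulse response of a known all-pole filter recovers that filter's coefficients to high accuracy.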
One reason for computing the short-term spectrum is that the cochlea of the human ear
performs a quasi-frequency analysis. The analysis in the cochlea takes place on a
nonlinear frequency scale (known as the Bark scale or the mel scale). This scale is
approximately linear up to about 1000 Hz and approximately logarithmic thereafter.
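A widely used analytic approximation of this perceptual scale is the mel mapping sketched below; the 2595/700 constants are the common textbook (HTK-style) form, not values taken from this paper:

```python
import numpy as np

def hz_to_mel(f_hz):
    """Common analytic mel-scale approximation: near-linear below ~1000 Hz,
    logarithmic above."""
    return 2595.0 * np.log10(1.0 + f_hz / 700.0)

def mel_to_hz(m):
    """Inverse mapping from mel back to Hz."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)
```

With these constants, 1000 Hz maps to roughly 1000 mel, while higher frequencies are increasingly compressed, matching the linear-then-logarithmic behavior described above.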