Speech Recognition Driven by a Stochastic Model of Automatically Extracted Phonemes
Updated 2024-08-07. 222 KB PDF.
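This document discusses the application of a stochastic phoneme segment model, based on phoneme segmentation, to speech recognition. The work is by Chieko Furuichi, Katsura Aizawa, and Kazuhiko Inoue of the Faculty of Engineering, Toin University of Yokohama. Its core contribution is a new statistical phoneme segment model trained on phoneme parameters derived from automatically extracted phoneme segments.

In the proposed system, preprocessing before recognition first detects the phoneme boundaries by segmentation. A stochastic phoneme segment model then discriminates the phonemes, and a phoneme segment lattice with scores is constructed. Because the segmentation system infers phoneme boundaries with high accuracy, unnecessary parameters can be eliminated, leaving the feature parameters that are effective in separating phonemes.

This reduces phoneme recognition in continuous speech to a discrimination problem. Words are then recognized by matching symbol sequences against dictionary items, and a speaker-independent model can be built from a relatively small amount of training data.

The study is relevant to statistical speech recognition, particularly for handling continuous speech and limiting model complexity, and it clarifies the role that accurate phoneme segmentation can play as a preprocessing step in a recognition system.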
Speech Recognition Using Stochastic Phonemic Segment Model
Based on Phoneme Segmentation
Chieko Furuichi, Katsura Aizawa, and Kazuhiko Inoue
Faculty of Engineering, Toin University of Yokohama, 1614 Kurogane, Midori, Yokohama, Japan 225-8502
SUMMARY
This paper discusses speech recognition based on a new statistical phoneme segment model which is trained by phoneme parameters derived from automatically extracted phoneme segments. The proposed system operates as follows. In preprocessing before recognition, the phoneme boundaries are detected by segmentation. The phonemes are discriminated using a stochastic phoneme segment model, and a phoneme segment lattice with scores is constructed. Next, speech recognition is performed by matching symbol sequences to dictionary items. The segmentation system that is employed can infer phoneme boundaries with high accuracy. This helps to eliminate unnecessary parameters, leaving the feature parameters which are effective in separating phonemes. In other words, the phoneme recognition problem in continuous speech can be reduced to a discrimination problem, and thus a speaker-independent model can be constructed from a relatively small amount of training data. The stochastic phoneme segment model is trained with training samples extracted from a phoneme-balanced word set of 4920 words uttered by 10 speakers. In a recognition experiment with 6709 words uttered by 63 nontraining speakers, a recognition rate of 92.6% was obtained as the average for all speakers, using a word dictionary of 212 words. © 2000 Scripta Technica, Syst Comp Jpn, 31(10): 89-98, 2000
Key words: Segment model; mixed distribution;
phoneme segmentation; speech recognition.
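
The final stage described in the summary, matching symbol sequences from the scored lattice against dictionary items, can be illustrated with a minimal sketch. This is not the authors' matching procedure: the lattice layout, the scores, the example words, and the `MISSING` penalty are all hypothetical, and the sketch assumes one lattice segment per phoneme with no insertions or deletions.

```python
import math

# Hypothetical lattice: one dict per detected segment, mapping candidate
# phoneme symbols to log-likelihood scores (values invented for illustration).
lattice = [
    {"k": -0.2, "t": -1.9},
    {"a": -0.1, "o": -2.5},
    {"w": -0.7, "b": -0.9},
    {"a": -0.3, "e": -1.4},
]

# Hypothetical word dictionary: word -> phoneme symbol sequence.
dictionary = {
    "kawa": ["k", "a", "w", "a"],
    "kaba": ["k", "a", "b", "a"],
    "tobe": ["t", "o", "b", "e"],
    "aka": ["a", "k", "a"],
}

# Assumed penalty when a phoneme is not among a segment's candidates.
MISSING = -10.0


def score_word(phonemes, lattice):
    """Sum per-segment scores; return -inf if the segment counts differ."""
    if len(phonemes) != len(lattice):
        return -math.inf
    return sum(seg.get(p, MISSING) for p, seg in zip(phonemes, lattice))


def recognize(lattice, dictionary):
    """Return the best word and the full ranking of (word, score) pairs."""
    ranked = sorted(
        ((word, score_word(phonemes, lattice)) for word, phonemes in dictionary.items()),
        key=lambda item: item[1],
        reverse=True,
    )
    return ranked[0][0], ranked


best, ranked = recognize(lattice, dictionary)
print(best)
```

A practical matcher would also have to tolerate segmentation errors (merged or split segments), for example with a dynamic-programming alignment; that is left out here to keep the idea visible.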
1. Introduction
In continuous speech recognition systems, it is desirable to improve the accuracy of the acoustic model in order to improve the recognition rate for speech units such as phonemes and syllables. In recent years, many studies of segment models have attempted to include the temporal changes of the speech feature parameters in order to improve the accuracy of the acoustic model [1-4]. When a segment model is applied to recognition, the dimension of the parameters is usually increased. If the amount of training data is insufficient, the estimation accuracy of the model may be degraded, or a large amount of computation may be needed for recognition. Approaches to dealing with this problem have included compression of the parameter dimension by K-L expansion [5], and use of the output from a neural network into which several consecutive frames are simultaneously input [6].
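
As a rough illustration of compressing the parameter dimension by K-L expansion, the sketch below projects segment vectors onto the leading eigenvectors of their covariance matrix using NumPy. The data are random and the sizes (5 frames of 12 coefficients, reduced to 16 dimensions) are assumptions chosen for the example, not values from the paper or from reference [5].

```python
import numpy as np


def kl_expansion(features, n_components):
    """Project rows of `features` onto the top eigenvectors of their covariance."""
    mean = features.mean(axis=0)
    centered = features - mean
    cov = np.cov(centered, rowvar=False)
    # eigh handles symmetric matrices and returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1][:n_components]
    basis = eigvecs[:, order]
    return centered @ basis, basis, mean


rng = np.random.default_rng(0)
# Hypothetical segment vectors: 200 segments, each 5 frames x 12 coefficients.
segments = rng.normal(size=(200, 60))
reduced, basis, mean = kl_expansion(segments, n_components=16)
print(reduced.shape)  # (200, 16)
```

The projection keeps the directions of largest variance, which is why it can shrink a concatenated multi-frame vector without discarding most of its variation.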
In the recognition of continuous speech by the segment model, there can be two approaches. One is to perform recognition without applying preliminary segmentation. The other is to detect the boundaries between phonemes or syllables by segmentation, and then to perform recognition using the segment model. The former method has been used more often, since segmentation is very difficult and a system accurate enough to be used for preprocessing before recognition is difficult to create.
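
The second approach, segmenting first and then discriminating each segment, can be sketched as follows. Everything here is a simplifying assumption for illustration: the boundaries are given as frame indices, each segment is summarized by the average of its frames (which discards the temporal change a real segment model would keep), and each phoneme is modeled by a single diagonal Gaussian with invented parameters rather than the stochastic segment model of the paper.

```python
import math

# Hypothetical per-phoneme diagonal Gaussians over a 2-dimensional segment
# feature: phoneme -> (mean vector, variance vector). Values are invented.
models = {
    "a": ([1.0, 0.0], [0.5, 0.5]),
    "i": ([-1.0, 1.0], [0.5, 0.5]),
    "s": ([0.0, -2.0], [0.5, 0.5]),
}


def log_gauss(x, mean, var):
    """Log density of a diagonal-covariance Gaussian."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def segment_feature(frames, start, end):
    """Average the frame vectors in [start, end) into one segment vector."""
    n = end - start
    dim = len(frames[0])
    return [sum(f[d] for f in frames[start:end]) / n for d in range(dim)]


def discriminate(frames, boundaries, models):
    """Rank phoneme candidates for each segment between consecutive boundaries."""
    results = []
    for start, end in zip(boundaries, boundaries[1:]):
        x = segment_feature(frames, start, end)
        scores = {p: log_gauss(x, m, v) for p, (m, v) in models.items()}
        results.append(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))
    return results


frames = [[0.1, -1.9], [-0.1, -2.1], [0.9, 0.1], [1.1, -0.1], [1.0, 0.0]]
boundaries = [0, 2, 5]  # assumed output of a segmentation stage
ranked = discriminate(frames, boundaries, models)
print([seg[0][0] for seg in ranked])
```

Because the boundaries are fixed in advance, each segment is scored once against every phoneme model; no search over possible boundary positions is needed.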
If the boundaries between phonemes or syllables can be estimated with high accuracy by the latter method, however, the problem of recognizing phonemes or syllables in continuous speech can be reduced to a discrimination problem, unnecessary searching can be minimized, and the
© 2000 Scripta Technica
Systems and Computers in Japan, Vol. 31, No. 10, 2000
Translated from Denshi Joho Tsushin Gakkai Ronbunshi, Vol. J82-D-II, No. 7, July 1999, pp. 1111-1119