Normalizing Mandarin and Japanese articulatory vowel images with the TPS method


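This paper examines how morphological normalization of vowel images can be used in articulatory speech recognition to cope with individual differences in vocal-tract shape between speakers. For Mandarin and Japanese, the authors propose normalizing vocal-tract morphology with the Thin-Plate Spline (TPS) method. The shapes of the vocal tract and oral cavity strongly influence speech features, and these structures can differ markedly from one speaker to another. Such differences make recognition more complex and can lower accuracy. Morphological normalization suppresses the variation caused by inter-speaker differences in vocal-tract structure, which makes analysis and modeling more accurate.

TPS is a non-rigid shape-matching technique. It is an interpolating spline that combines a global affine component with smooth radial bending terms, and among all functions that pass through the given landmarks it minimizes bending energy. This lets one shape be deformed flexibly onto another while staying smooth locally and consistent globally. The authors apply TPS in two dimensions to the vocal-tract shapes of Mandarin and Japanese speakers. The aim is to remove morphological differences while preserving each speaker's speech dynamics.

The normalization template was obtained by measuring three speakers' palate and tongue shapes. Each speaker's articulatory (EMA) data are then fitted to the template and warped with TPS. This requires establishing correspondences between the two shapes and finding the deformation parameters that bring the source shape onto the target shape, while keeping the detail and dynamics of the original. The normalized data are evaluated with a deep neural network (DNN)-based speech recognizer. Normalization may improve the recognizer's stability and generalization, make it easier to share models across speakers, and could reduce the amount of training data needed.

The workflow covers data collection, preprocessing, TPS normalization, and an evaluation of how normalization affects recognition performance. The reported results are:
- reduced inter-speaker variance;
- a 25% improvement in isolated-phoneme articulatory recognition;
- a 5.84% reduction in the phone error rate for continuous speech.

The approach may also benefit other fields that rely on vocal-tract or oral-cavity images, such as speech synthesis and speaker (voiceprint) recognition.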
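The fitting step described above finds deformation parameters that carry the source landmarks onto the target. It can be sketched as a standard 2-D thin-plate spline interpolation, shown below in pure Python under our own assumptions. This is not the authors' implementation. Landmarks are taken to be (x, y) midsagittal coordinates, such as sampled palate and tongue contour points. The helper names `fit_tps` and `solve` are hypothetical, and the kernel is U(r) = r² log r.

```python
import math
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]


def tps_kernel(r: float) -> float:
    """2-D thin-plate radial basis U(r) = r^2 log r, with U(0) = 0."""
    return 0.0 if r == 0.0 else r * r * math.log(r)


def solve(a: List[List[float]], b: List[List[float]]) -> List[List[float]]:
    """Solve A X = B (several right-hand sides) by Gauss-Jordan elimination."""
    n, m = len(a), len(b[0])
    aug = [list(a[i]) + list(b[i]) for i in range(n)]  # [A | B]; inputs stay untouched
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))  # partial pivoting
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system: duplicate or collinear landmarks?")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                f = aug[r][col] / aug[col][col]
                for c in range(col, n + m):
                    aug[r][c] -= f * aug[col][c]
    return [[aug[i][n + j] / aug[i][i] for j in range(m)] for i in range(n)]


def fit_tps(src: Sequence[Point], dst: Sequence[Point]) -> Callable[[float, float], Point]:
    """Fit a TPS warp that maps each src landmark exactly onto its dst landmark.

    Hypothetical illustration, not from the paper: src could be one speaker's
    landmarks and dst the corresponding template landmarks.
    """
    n = len(src)
    size = n + 3
    # Block system [[K, P], [P^T, 0]] [w; a] = [v; 0], solved for x' and y' at once.
    a = [[0.0] * size for _ in range(size)]
    b = [[0.0, 0.0] for _ in range(size)]
    for i, (xi, yi) in enumerate(src):
        for j, (xj, yj) in enumerate(src):
            a[i][j] = tps_kernel(math.hypot(xi - xj, yi - yj))
        for k, v in enumerate((1.0, xi, yi)):  # affine block P and its transpose
            a[i][n + k] = v
            a[n + k][i] = v
        b[i] = [dst[i][0], dst[i][1]]
    coef = solve(a, b)
    weights, affine = coef[:n], coef[n:]  # bending weights w_i, affine terms a_0..a_2

    def warp(x: float, y: float) -> Point:
        out = []
        for d in range(2):  # d = 0 gives x', d = 1 gives y'
            val = affine[0][d] + affine[1][d] * x + affine[2][d] * y
            for (xi, yi), w in zip(src, weights):
                val += w[d] * tps_kernel(math.hypot(x - xi, y - yi))
            out.append(val)
        return (out[0], out[1])

    return warp
```

The returned `warp` reproduces every target landmark exactly and deforms the space between the landmarks smoothly. When the target is only an affine copy of the source, the bending weights come out zero and the warp reduces to that affine map. Non-rigid bending is added only where the landmarks require it.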

Morphological normalization of vowel images for articulatory speech recognition

Jianguo Wei a,b, Jingshu Zhang, Yan Ji, Qiang Fang, Wenhuan Lu ⇑

a School of Computer Software, Tianjin University, 135 Yaguan Road, Jin Nan District, Tianjin 300350, China

b Tianjin Key Laboratory of Cognitive Computing and Application, School of Computer Science and Technology, Tianjin University, 135 Yaguan Road, Jin Nan District, Tianjin 300350, China

c Chinese Academy of Social Sciences, Beijing, China

article info

Article history:
Received 15 March 2016
Revised 30 June 2016
Accepted 12 October 2016
Available online 17 October 2016

Keywords:
Vocal tract normalization
Articulatory data
Acoustic data
Thin-Plate Spline
DNN
Articulatory recognition

abstract

Minimizing morphological variance of the vocal tract across speakers is a challenge for articulatory analysis and modeling. We propose a method for normalizing the vocal-tract shapes of Mandarin and Japanese speakers using a Thin-Plate Spline (TPS) method. The method reduces morphological differences in speech organs among speakers while retaining each speaker's speech dynamics. We apply the properties of TPS in a two-dimensional space to normalize vocal-tract shapes, and we evaluate the result with deep neural network (DNN)-based speech recognition. Our normalization template was obtained by measuring the palate and tongue shapes of three speakers. The results show reduced variance among subjects. The pre- and post-normalization data have a similar vowel structure, which indicates that the framework retains speaker-specific characteristics. Articulatory recognition of isolated phonemes improves by 25%, and the phone error rate for continuous speech is reduced by 5.84%.
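The abstract says the template comes from three speakers' palate and tongue shapes and reports a reduction in inter-subject variance. This excerpt gives neither the way the template is built nor the variance measure used. The sketch below assumes the simplest choices, purely for illustration: a landmark-wise mean template, and the mean squared distance of each speaker's landmarks to that template. Both function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Shape = Sequence[Point]  # one speaker's landmarks, listed in the same order for all speakers


def mean_template(shapes: Sequence[Shape]) -> List[Point]:
    """Assumed template (hypothetical): the landmark-wise mean of the speakers' shapes."""
    k = len(shapes[0])
    if any(len(s) != k for s in shapes):
        raise ValueError("every speaker needs the same number of landmarks")
    n = len(shapes)
    return [(sum(s[i][0] for s in shapes) / n, sum(s[i][1] for s in shapes) / n)
            for i in range(k)]


def inter_speaker_variance(shapes: Sequence[Shape]) -> float:
    """Assumed spread measure: mean squared landmark distance to the template."""
    template = mean_template(shapes)
    total = sum((x - tx) ** 2 + (y - ty) ** 2
                for s in shapes
                for (x, y), (tx, ty) in zip(s, template))
    return total / (len(shapes) * len(template))
```

Computing this value once on the raw landmarks and once on the TPS-warped landmarks gives one simple way to express a reduction in inter-subject variance. The paper's own measure may differ.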

1. Introduction

In recent years, speech recognition technology has advanced significantly, yet speaker adaptation and system robustness remain vital to speech recognition systems. Interestingly, much of the articulatory data used in speech research is also used together with acoustic data [1]. However, articulatory data are not widely applied. One reason is that such data are difficult to acquire. Another is that differences in vocal tracts across speakers make articulatory data hard to use in multi-subject research [2]. Hence, articulatory data are not as popular as acoustic data despite their importance in the speech research field. To discover the kinematic properties that characterize speaker differences, inter-subject articulatory data must first be normalized so that morphological variance among speakers is reduced.

Vocal tracts differ among subjects, and large nonlinear deformations can occur in vocal-tract shape. It is therefore difficult to study vocal-tract shape with the affine transformations used for simple rigid objects. Researchers have proposed many normalization techniques for both the articulatory space and the acoustic space. For instance, Bechman et al. [3] proposed straightening the walls of the vocal tract in order to transform the coordinates of X-ray microbeam data. Hashi et al. [4] also proposed a method of normalizing vowel postures for an X-ray microbeam database. Both methods straighten the vocal-tract walls to normalize vocal-tract length. However, this can significantly change the relative position of the palate and the tongue surface after transformation. In the acoustic space, Pitz et al. normalized vocal-tract length by applying a linear transformation in the frequency domain [5]. Saheer et al. likewise normalized vocal-tract length with a linear transformation method [6]. All of these studies attempt to normalize vocal-tract length (in either the articulatory or the acoustic space) without considering the articulatory features of vocal-tract shapes.
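The argument that affine transformations cannot capture local vocal-tract deformation can be checked on toy landmarks. The sketch below is our own illustration and does not come from the paper. It fits the best least-squares affine map between two 2-D landmark sets and reports the RMS error that remains. The function names and coordinates are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def det3(m: List[List[float]]) -> float:
    """Determinant of a 3x3 matrix (cofactor expansion along the first row)."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve3(m: List[List[float]], rhs: List[float]) -> List[float]:
    """Solve a 3x3 linear system with Cramer's rule."""
    d = det3(m)
    if abs(d) < 1e-12:
        raise ValueError("degenerate landmarks: all points collinear?")
    sol = []
    for col in range(3):
        mc = [row[:] for row in m]
        for r in range(3):
            mc[r][col] = rhs[r]
        sol.append(det3(mc) / d)
    return sol


def affine_rms_residual(src: Sequence[Point], dst: Sequence[Point]) -> float:
    """Hypothetical helper: least-squares affine fit src -> dst, returns leftover RMS error."""
    rows = [(1.0, x, y) for x, y in src]
    # Normal matrix P^T P, shared by the x' and y' fits.
    ptp = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    coefs = []
    for d in range(2):
        ptv = [sum(r[i] * q[d] for r, q in zip(rows, dst)) for i in range(3)]
        coefs.append(solve3(ptp, ptv))
    sq = 0.0
    for r, q in zip(rows, dst):
        for d in range(2):
            pred = sum(c * v for c, v in zip(coefs[d], r))
            sq += (pred - q[d]) ** 2
    return math.sqrt(sq / len(src))
```

As a toy case, keep the four corners of a unit square fixed and raise the centre landmark from y = 0.5 to y = 0.8, a local bulge. An affine map sends the centre of the four corners to the centre of their images, so it cannot reproduce this deformation. The best affine fit leaves an RMS error of about 0.12. A thin-plate spline, by contrast, can interpolate all five landmarks exactly.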

Because vocal-tract shape usually reflects local and nonlinear deformations, it can be treated as a kind of non-rigid shape deformation. Based on this idea, our study proposes a framework for normalizing speakers' EMA (Electromagnetic Midsagittal Articulographic) data by using a TPS (Thin-Plate Spline warping) method [7] (a non-linear transformation method applied in shape

http://dx.doi.org/10.1016/j.jvcir.2016.10.005

This paper has been recommended for acceptance by Zicheng Liu.

⇑ Corresponding author.
E-mail addresses: jianguo@tju.edu.cn (J. Wei), jingshu@tju.edu.cn (J. Zhang), tjujiyan@tju.edu.cn (Y. Ji), fangqiang@cass.org.cn (Q. Fang), wenhuan@tju.edu.cn (W. Lu).

J. Vis. Commun. Image R. 41 (2016) 352–360

Contents lists available at ScienceDirect

J. Vis. Commun. Image R.

journal homepage: www.elsevier.com/locate/jvci

