Visualization of Mandarin Articulation by using a Physiological Articulatory Model

Dian Huang*, Xiyu Wu†, Jianguo Wei*, Hongcui Wang*, Chan Song*, Qingzhi Hou*, and Jianwu Dang*†

*Tianjin Key Laboratory of Cognitive Computing and Application, Tianjin University, Tianjin, China
E-mail: {huang_dian@163.com, jianguo.fr@gmail.com, laurelwind@gmail.com, songchan_8855@126.com, darcy.hou@gmail.com} Tel: +86-15822883934
†Japan Advanced Institute of Science and Technology, Ishikawa, Japan
E-mail: {xiyuwu@jaist.ac.jp, jdang@jaist.ac.jp}
Abstract—It is difficult for language learners to produce unfamiliar speech sounds accurately, because they cannot manipulate articulatory movements precisely with auditory feedback alone. Visual feedback can help identify errors and accelerate learning, especially in language learning and speech rehabilitation. In this paper, we propose a visualization method for Mandarin phoneme pronunciation using a three-dimensional (3D) physiological articulatory model driven by Chinese electromagnetic articulography (EMA) data. A mapping from EMA data to the physiological articulatory model was constructed using three points on the mid-sagittal plane of the tongue. To do so, we analyzed the configurations of 30 Chinese phonemes based on an EMA database. In parallel, we designed nearly 150,000 muscle activation patterns and applied them to the physiological model to generate model-based articulatory movements. As a result, we developed a visualized articulation system with both 2.5-dimensional (2.5D) and 3D views. The mapping was evaluated against MRI data; the mean deviation was about 0.21 cm for seven vowels.
I. INTRODUCTION
Studies of pronunciation learning have shown that detailed and accurate error feedback is effective in correcting errors made during learning [1], and visual feedback plays an important role in this process. Learners will evaluate their own production through auditory feedback if no other feedback is available. However, even if learners can recognize the discrepancy between their utterance and the target speech sounds, it is difficult for them to adjust their articulation accordingly. In the language-learning process, explicit guidance is more effective than implicit instruction [2]. A Computer Assisted Language Learning (CALL) system with visual feedback makes it easier for learners to correct their articulation by providing a visualized articulatory target. In this study, we propose a Mandarin phoneme pronunciation visualization method that uses Chinese EMA data to drive a 3D physiological articulatory model.
With the development of speech analysis and observation technology, observing and presenting pronunciation has become much easier. On the one hand, some researchers have used two-dimensional (2D) models to visualize articulation. For example, Kaburagi and Honda proposed a 2D model to predict articulator movements for continuous speech based on EMA data [3]. A 2D visual-speech synthesizer that animates the human articulators was presented by Wong et al. [4]. LaRocca et al. presented a system that used articulatory information, shown as a side view of a transparent head, to detect spoken segmental errors and provide corrective feedback so that the learner could see articulator placement [5]. Eskenazi et al. [6] employed a mid-sagittal 2D model to present immediate corrective articulatory help for each type of phonetic or prosodic error made by students. The accuracy of these methods can be guaranteed, but they are not easy to understand intuitively. On the other hand, 3D models have been used to present articulator movements. For example, computer graphics (CG) technology has been used to construct a 3D model for online Chinese learning [7]. If a learner's pronunciation is incorrect, such a system can demonstrate the correct movements of the articulatory organs, as well as the wrong pronunciations made by the learners themselves [8]. These 3D methods are easy to observe and understand, but guaranteeing their accuracy is the biggest challenge. From these previous studies, one can see that in the field of articulation visualization, especially for Mandarin, high accuracy and intuitive representation have not yet been combined well in a single visualization method.
In this paper, we construct a 3D visualization system for Chinese phonemes by combining the high temporal resolution of EMA data with the high spatial resolution of a 3D physiological model, which is expected to provide intuitive, accurate, and flexible visualization of articulator movements. Our visualization system consists of two modules: a 3D articulatory model extracted from a physiological model, and 2D Chinese EMA data. We use three points on the tongue in the EMA data to select the best-matched mid-sagittal shape of the 3D model, so that we obtain 3D visualized movements of the articulatory organs for each Chinese phoneme. To evaluate the accuracy of our method, we compared the best-matched model data for seven vowels with Chinese MRI data. The results show that the accuracy of our model is acceptable.
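The best-match selection described above can be read as a nearest-neighbor search: given the three mid-sagittal tongue points of an EMA frame, pick the precomputed model configuration (generated from the muscle activation patterns) whose corresponding mid-sagittal points minimize the summed Euclidean distance. The following is a minimal sketch of that idea; the function name, array layout, and toy data are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def best_match(ema_points, model_shapes):
    """Select the model configuration whose three mid-sagittal tongue
    points lie closest (by summed Euclidean distance) to the EMA points.

    ema_points:   (3, 2) array -- three tongue points from one EMA frame
    model_shapes: (N, 3, 2) array -- candidate mid-sagittal shapes,
                  assumed precomputed from muscle activation patterns
    Returns (index of the best shape, its total distance).
    """
    # Per-point distances of every candidate shape to the EMA frame,
    # summed over the three tongue points.
    d = np.linalg.norm(model_shapes - ema_points, axis=2).sum(axis=1)
    i = int(np.argmin(d))
    return i, float(d[i])

# Toy example: 4 random candidate shapes; the query frame is a slightly
# perturbed copy of candidate 2, so candidate 2 should win.
rng = np.random.default_rng(0)
shapes = rng.uniform(0.0, 10.0, size=(4, 3, 2))
frame = shapes[2] + 0.05
idx, dist = best_match(frame, shapes)
print(idx, round(dist, 3))
```

In a real system the candidate set would be the ~150,000 shapes generated by the physiological model, so an indexed search (e.g. a k-d tree over the six flattened coordinates) would replace the brute-force scan; the selection criterion stays the same.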
The rest of this paper is organized as follows: Parts Ⅱ and Ⅲ describe the construction of the Chinese EMA database and the 3D model-based database. Part Ⅳ introduces the method for building the mapping, and Part Ⅴ shows the