Visualization of Mandarin Articulation by using a Physiological Articulatory Model

Dian Huang*, Xiyu Wu†, Jianguo Wei*, Hongcui Wang*, Chan Song*, Qingzhi Hou*, and Jianwu Dang*†

*Tianjin Key Laboratory of Cognitive Computing and Application, Tianjin University, Tianjin, China
E-mail: {huang_dian@163.com, jianguo.fr@gmail.com, laurelwind@gmail.com, songchan_8855@126.com, darcy.hou@gmail.com} Tel: +86-15822883934
†Japan Advanced Institute of Science and Technology, Ishikawa, Japan
E-mail: {xiyuwu@jaist.ac.jp, jdang@jaist.ac.jp}
Abstract—It is difficult for language learners to produce unfamiliar speech sounds accurately, because they cannot manipulate articulatory movements precisely with auditory feedback alone. Visual feedback can help identify errors and accelerate learning, especially in language learning and speech rehabilitation. In this paper, we propose a visualization method for Mandarin phoneme pronunciation using a three-dimensional (3D) physiological articulatory model driven by Chinese electromagnetic articulography (EMA) data. A mapping from EMA data to the physiological articulatory model was constructed using three points on the mid-sagittal plane of the tongue. To do so, we analyzed the configurations of 30 Chinese phonemes based on an EMA database. In parallel, we designed nearly 150,000 muscle activation patterns and applied them to the physiological model to generate model-based articulatory movements. As a result, we developed a visualized articulation system with both 2.5-dimensional (2.5D) and 3D views. The mapping was evaluated against MRI data; the mean deviation was about 0.21 cm for seven vowels.
I. INTRODUCTION
Studies of pronunciation learning have shown that detailed and accurate error feedback is effective in correcting errors made during learning [1], and visual feedback plays an important role in this process. Learners will evaluate their own production through auditory feedback if no other feedback is available. However, even if learners can recognize the discrepancy between their utterance and the target speech sounds, it is difficult for them to adjust their articulation accordingly. In the language-learning process, explicit guidance is more effective than implicit instruction [2]. A Computer Assisted Language Learning (CALL) system with visual feedback makes it easier for learners to correct their articulation by providing a visualized articulatory target. In this study, we propose a Mandarin phoneme pronunciation visualization method that uses Chinese EMA data to drive a 3D physiological articulatory model.
With the development of speech analysis and observation technology, observing and presenting pronunciation has become much easier. On the one hand, some researchers have used two-dimensional (2D) models to visualize articulation. For example, Kaburagi and Honda proposed a 2D model to predict articulator movements for continuous speech based on EMA data [3]. A 2D visual-speech synthesizer that animates the human articulators was presented by Wong et al. [4]. LaRocca et al. presented a system that used articulatory information, shown as a side view of a transparent head, to detect spoken segmental errors and provide corrective feedback so that the learner could see articulator placement [5]. Eskenazi et al. [6] employed a mid-sagittal 2D model to present immediate corrective articulatory help for each type of phonetic or prosodic error made by students. The accuracy of these methods can be guaranteed, but they are not easy to understand intuitively. On the other hand, 3D models have been used to present articulator movements. For example, computer graphics (CG) technology has been used to construct a 3D model for online Chinese learning [7]. If a learner's pronunciation is incorrect, such a system can demonstrate the correct movements of the articulatory organs, as well as the wrong pronunciations made by the learners themselves [8]. These 3D methods are easy to observe and understand, but guaranteeing their accuracy is the biggest challenge. From these previous studies, one can see that in the field of articulation visualization, especially for Mandarin, high accuracy and intuitive representation have not yet been combined well in a single visualization method.
In this paper, we construct a 3D visualization system for Chinese phonemes by combining the high temporal resolution of EMA data with the high spatial resolution of a 3D physiological model, which is expected to provide intuitive, accurate, and flexible visualization of articulator movements. Our visualization system consists of two modules: a 3D articulatory model extracted from a physiological model, and 2D Chinese EMA data. We use three points on the tongue in the EMA data to select the best-matched mid-sagittal shape of the 3D model, so that we obtain 3D visualized movements of the articulatory organs for each Chinese phoneme. To evaluate the accuracy of our method, we compared the best-matched model data for seven vowels with Chinese MRI data. The results show that the accuracy of our model is acceptable.
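The best-match selection described above can be read as a nearest-neighbor search: given the three mid-sagittal tongue points of an EMA frame, pick the precomputed model configuration (generated from the muscle activation patterns) whose corresponding mid-sagittal points minimize the summed Euclidean distance. The following is a minimal sketch of that idea; the function name, array layout, and toy data are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def best_match(ema_points, model_shapes):
    """Select the model configuration whose three mid-sagittal tongue
    points lie closest (by summed Euclidean distance) to the EMA points.

    ema_points:   (3, 2) array -- three tongue points from one EMA frame
    model_shapes: (N, 3, 2) array -- candidate mid-sagittal shapes,
                  assumed precomputed from muscle activation patterns
    Returns (index of the best shape, its total distance).
    """
    # Per-point distances of every candidate shape to the EMA frame,
    # summed over the three tongue points.
    d = np.linalg.norm(model_shapes - ema_points, axis=2).sum(axis=1)
    i = int(np.argmin(d))
    return i, float(d[i])

# Toy example: 4 random candidate shapes; the query frame is a slightly
# perturbed copy of candidate 2, so candidate 2 should win.
rng = np.random.default_rng(0)
shapes = rng.uniform(0.0, 10.0, size=(4, 3, 2))
frame = shapes[2] + 0.05
idx, dist = best_match(frame, shapes)
print(idx, round(dist, 3))
```

In a real system the candidate set would be the ~150,000 shapes generated by the physiological model, so an indexed search (e.g. a k-d tree over the six flattened coordinates) would replace the brute-force scan; the selection criterion stays the same.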
The rest of this paper is organized as follows: Parts Ⅱ and Ⅲ describe the construction of the Chinese EMA database and the 3D model-based database. Part Ⅳ introduces the method for building the mapping, and Part Ⅴ shows the