Cross Linguistic Comparison of Mandarin and English EMA Articulatory Data
Sheng Li (1,2), Lan Wang (1,2)
(1) Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
(2) The Chinese University of Hong Kong, Hong Kong, China
{sheng.li, lan.wang}@siat.ac.cn
Abstract
This paper aims at effectively identifying common English
mispronunciations by Mandarin speakers and incorporating this
knowledge into computer assisted language learning (CALL) to
improve the learner’s accented English. For this purpose, English
and Mandarin multi-channel EMA articulatory datasets collected
from native English and native Mandarin speakers respectively
have been used to uncover cross-linguistic distinctions.
Procrustes-based speaker normalization is used to eliminate the
variability that comes from speaker-specific vocal-tract
anatomies and other individual biomechanical properties. Then
the English phonemes missing from Mandarin and their
confusable Mandarin equivalents are identified using
phonological knowledge. These English and Mandarin phoneme
pairs may be hard to distinguish acoustically, but information
extracted from changes in tongue position and lip shape during
speech provides cross-linguistic, phoneme-level comparison
metrics that are both empirical and quantitative. In the future,
the same method can be applied to other language pairs or to
different accents within the same language.
Index Terms: mispronunciation, articulatory data, cross
linguistic comparison
1. Introduction
Recent development of computer-assisted pronunciation training
(CAPT) has benefited greatly from current automatic speech
recognition (ASR) and speech visualization techniques. It also
draws directly on research into speech production and
perception. This linguistic research no longer relies on
auditory analysis alone, but also measures the activities of the
articulators (the tongue, the larynx, the lips and the jaw)
during speech. Many devices, such as X-ray microbeam
cinematography, Cine-MRI, ultrasound, electropalatography and
electromagnetic articulography (EMA), are used for this purpose.
According to the theory of language transfer [1], the learner's
mother tongue may negatively affect the learning of a foreign
language. Such effects were observed when analyzing the
mispronunciations made by Chinese learners. We find that the
English phonemes that are missing from Mandarin are the most
likely to be mispronounced or even replaced by Mandarin
phonemes.
The objective of this paper is to find the English phonemes
that are most likely to cause confusion and mispronunciation
among Mandarin speakers, together with their Mandarin
equivalents, so that we can incorporate this knowledge into the
CALL system.
For this purpose, we collected English and Mandarin
multi-channel EMA articulatory data from native English and
native Mandarin speakers, respectively. These datasets give us
the opportunity to uncover cross-linguistic distinctions. The
major challenge, however, is how to overcome the variability
that comes from speaker-specific vocal-tract anatomies and other
individual biomechanical properties.
Current cross-linguistic articulatory studies can be found in
experimental phonetics research. In [2], German and Hungarian
tongue shapes were compared using articulatory profiles of both
static and kinematic tongue configurations. The work of [3]
summarized speaker normalization techniques derived from
Procrustes methods [4] that can be applied effectively to both
acoustic and articulatory data.
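
To make the idea of Procrustes alignment concrete, the following is a minimal sketch in Python, not the procedure of [3] or [4]. It assumes that each speaker's articulatory configuration is a list of 2-D midsagittal sensor positions in the same sensor order, and it removes translation, scale and rotation (reflection is not handled). The function names and the sample coordinates are hypothetical and chosen only for illustration.

```python
import math

def centre_and_scale(points):
    """Remove translation and size from a list of (x, y) sensor positions."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    centred = [(x - cx, y - cy) for x, y in points]
    size = math.sqrt(sum(x * x + y * y for x, y in centred))
    if size == 0.0:
        raise ValueError("all points coincide; the shape has no size")
    return [(x / size, y / size) for x, y in centred]

def procrustes_align(reference, target):
    """Align `target` to `reference`; both list the same sensors in the same order.

    Returns the normalized reference, the aligned target and the residual
    sum of squared differences between them.
    """
    if len(reference) != len(target):
        raise ValueError("both shapes need the same number of points")
    ref = centre_and_scale(reference)
    tgt = centre_and_scale(target)
    # Closed-form optimal rotation angle in two dimensions.
    num = sum(tx * ry - ty * rx for (rx, ry), (tx, ty) in zip(ref, tgt))
    den = sum(tx * rx + ty * ry for (rx, ry), (tx, ty) in zip(ref, tgt))
    theta = math.atan2(num, den)
    c, s = math.cos(theta), math.sin(theta)
    aligned = [(c * x - s * y, s * x + c * y) for x, y in tgt]
    residual = sum((ax - rx) ** 2 + (ay - ry) ** 2
                   for (ax, ay), (rx, ry) in zip(aligned, ref))
    return ref, aligned, residual

# Hypothetical sensor positions (arbitrary units) for two speakers.
speaker_a = [(0.0, 0.0), (2.0, 0.5), (3.0, 2.0), (1.0, 3.0)]
speaker_b = [(1.0, 1.0), (5.2, 2.1), (7.1, 5.0), (2.9, 7.2)]
_, aligned_b, residual = procrustes_align(speaker_a, speaker_b)
print("residual after alignment:", residual)
```

A small residual means that the two configurations differ mainly in position, size and orientation, which are the speaker-specific factors that normalization is meant to remove.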
Other studies related to articulatory speaker normalization
come from the series of studies on Audio-Visual Speech
Processing (AVSP). These studies are more concerned with
constructing a speaker-independent statistical model (GMM, HMM,
etc.), as in [5], coupled with speaker adaptation techniques.
These methods are well developed in speech recognition, but
require large-scale multi-speaker datasets.
For quantitative comparison of the articulatory data, the
method of projecting the phonemes onto a universal articulatory
space was investigated in [6], using the multi-dimensional
scaling (MDS) algorithm [7]. Alternatively, the research in [8]
introduced Hierarchical Clustering Analysis (HCA) [9] to
generate the classes of equivalent phonemes.
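
As an illustration of how MDS places items in a low-dimensional space from their pairwise distances, here is a minimal sketch of classical (metric) MDS in pure Python. It is not the implementation used in [6] or [7]; it assumes a symmetric distance matrix and finds the leading eigenvectors by power iteration. The function name and the toy distance values are hypothetical.

```python
import math

def classical_mds(dist, dims=2, iters=500):
    """Embed n items in `dims` dimensions from a symmetric n x n distance matrix."""
    n = len(dist)
    sq = [[dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_mean = [sum(r) / n for r in sq]
    grand_mean = sum(row_mean) / n
    # Double centring turns squared distances into an inner-product matrix.
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand_mean)
          for j in range(n)] for i in range(n)]
    coords = [[0.0] * dims for _ in range(n)]
    for d in range(dims):
        # Fixed, non-constant start vector, normalized to unit length.
        v = [math.sin(i + 1 + d) for i in range(n)]
        length = math.sqrt(sum(x * x for x in v))
        v = [x / length for x in v]
        for _ in range(iters):
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                break
            v = [x / norm for x in w]
        # The Rayleigh quotient gives the eigenvalue for the converged vector.
        bv = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = sum(v[i] * bv[i] for i in range(n))
        scale = math.sqrt(max(lam, 0.0))
        for i in range(n):
            coords[i][d] = scale * v[i]
        # Deflate so that the next pass finds the next eigenvector.
        for i in range(n):
            for j in range(n):
                b[i][j] -= lam * v[i] * v[j]
    return coords

# Made-up distances between four items, for illustration only.
toy_dist = [[0.0, 3.0, 4.0, 5.0],
            [3.0, 0.0, 5.0, 4.0],
            [4.0, 5.0, 0.0, 3.0],
            [5.0, 4.0, 3.0, 0.0]]
for point in classical_mds(toy_dist):
    print(point)
```

When the input distances are Euclidean and the chosen number of dimensions is sufficient, the distances between the returned coordinates reproduce the input distances up to rotation and reflection of the whole configuration.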
The methodology we choose is as follows. We use Procrustes-based
speaker normalization, which suits our experimental condition of
limited data. We then identify the English phonemes missing from
Mandarin and their Mandarin equivalents, and give an empirical
comparison. For quantitative comparison, we project the
distances between all the phonemes of the two languages onto a
quantitative, cross-linguistic phonetic space by
multi-dimensional scaling (MDS) analysis. Hierarchical
Clustering Analysis (HCA) is also used to cluster similar
phonemes from the two languages.
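
The clustering step can be sketched as follows. This is a minimal agglomerative clustering with average linkage in pure Python, given only as an assumption about how such grouping may be done; the linkage criterion, the function name, the placeholder labels and the distance values are all hypothetical and are not results from our data.

```python
def average_linkage(dist, labels, n_clusters):
    """Merge the closest clusters until `n_clusters` remain.

    `dist` is a symmetric matrix of pairwise distances and `labels`
    names the items in the same order.
    """
    clusters = [[i] for i in range(len(labels))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average distance over all cross-cluster item pairs.
                total = sum(dist[i][j] for i in clusters[a] for j in clusters[b])
                d = total / (len(clusters[a]) * len(clusters[b]))
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(labels[i] for i in c) for c in clusters]

# Placeholder phoneme labels and made-up distances, for illustration only.
labels = ["EN:a", "ZH:a", "EN:b", "ZH:b"]
toy_dist = [[0.0, 1.0, 5.0, 6.0],
            [1.0, 0.0, 5.5, 5.0],
            [5.0, 5.5, 0.0, 0.8],
            [6.0, 5.0, 0.8, 0.0]]
print(average_linkage(toy_dist, labels, n_clusters=2))
```

A cluster that contains one English and one Mandarin item would mark a candidate pair of articulatorily similar phonemes across the two languages.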
The rest of this paper is organized as follows. Section 2
introduces the collection and processing of the EMA data.
Section 3 describes how we normalize speaker differences between
the datasets of the two languages. In Section 4, the normalized
data are used to construct a cross-linguistic and
speaker-independent articulatory space, so that mispronunciation
confusions can be observed directly. The conclusions and future
work are in Section 6.