978-1-4673-0174-9/12/$31.00 ©2012 IEEE 449 ICALIP2012
Adapted Language Modeling for Recognition of Retelling Story in Language
Learning
Meng Chen, Yang Song, Lan Wang
Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences/The Chinese
University of Hong Kong
{chenmeng, yang.song, lan.wang}@siat.ac.cn
Abstract
N-gram language modeling typically requires large
quantities of in-domain training data, i.e., data that
matches the task in both topic and style. For the task of
retelling stories, obtaining large volumes of speech
transcriptions is often unrealistic. In this paper, we
propose a novel method of language modeling using
mixture models with very limited text data in the task of
retelling stories. We modeled topic-specific, spoken-
style, and document-style language models separately
and interpolated them. We also interpolated the class-
based language model with the N-gram models.
Experimental results show that up to 61.6% reduction
of perplexity and 20.7% reduction of word error rate
(WER) have been obtained by our best performing
model.
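As a rough illustration of the interpolation idea summarized above, the following Python sketch linearly combines two component language models and measures perplexity on held-out text. It is a minimal sketch under stated assumptions, not the system described in this paper: it uses add-alpha smoothed unigram models instead of N-gram models for brevity, and the toy corpora, the function names (train_unigram, interpolate, perplexity), and the weights 0.6/0.4 are all hypothetical.

```python
import math
from collections import Counter


def train_unigram(tokens, vocab, alpha=1.0):
    """Add-alpha smoothed unigram model over a fixed vocabulary (simplification)."""
    counts = Counter(tokens)
    total = len(tokens) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


def interpolate(models, weights):
    """Linear interpolation: P(w) = sum_i lambda_i * P_i(w), with sum_i lambda_i = 1."""
    assert abs(sum(weights) - 1.0) < 1e-9, "interpolation weights must sum to 1"
    vocab = models[0].keys()
    return {w: sum(lam * m[w] for lam, m in zip(weights, models)) for w in vocab}


def perplexity(model, tokens):
    """Perplexity = exp of the negative average log-probability per token."""
    log_prob = sum(math.log(model[w]) for w in tokens)
    return math.exp(-log_prob / len(tokens))


# Hypothetical toy corpora standing in for topic-specific and spoken-style text.
topic_text = "the fox ran into the forest and the fox hid".split()
spoken_text = "um the the fox uh ran um into uh the forest".split()
held_out = "um the fox ran uh into the forest".split()

vocab = sorted(set(topic_text) | set(spoken_text) | set(held_out))
topic_lm = train_unigram(topic_text, vocab)
spoken_lm = train_unigram(spoken_text, vocab)

# Hypothetical weights; in practice they would be tuned on held-out data.
mixed_lm = interpolate([topic_lm, spoken_lm], [0.6, 0.4])

for name, lm in [("topic", topic_lm), ("spoken", spoken_lm), ("mixed", mixed_lm)]:
    print(name, round(perplexity(lm, held_out), 2))
```

Because the logarithm is concave, the interpolated model's perplexity on a given test set can never exceed that of the worst component, which is one reason linear interpolation is a low-risk way to combine models trained on small, mismatched corpora.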
1. Introduction
With the development of computer technology,
Computer Assisted Language Learning (CALL) systems
have offered great advantages over traditional language
learning methods. Story retelling has been used
to evaluate a language learner’s oral
proficiency. Automatic scoring based on Automatic
Speech Recognition (ASR) to evaluate speaking
ability in the task of retelling stories has been studied
recently. In the retelling task, students listen to a
monologue of a story (200~300 words) spoken by a
native speaker, and then retell the story in their own
words. The students’ recordings are non-native
spontaneous speech with a specific spoken style, which
not only differs from the original story but also
contains lexical and syntactic errors.
For spontaneous speech recognition, researchers
have made numerous efforts to increase ASR
accuracy by employing a variety of improved language
modeling techniques. In [1], the authors
constructed a language model for spontaneous
speech by combining written text from textbooks
with transcripts of conversational telephone speech from
the Switchboard and Fisher corpora. Another work [2]
presented a method of generating simulated spoken-style
text by randomly inserting fillers into written-style
text. However, this approach handles only fillers
and does not consider features such as repetition and self-repair.
G. Moore and S. Young [3] used class-based language
models for robust estimation of N-gram probabilities
with limited or unmatched data. Akita and Kawahara
[4] proposed another approach using a probabilistic
transformation model trained from a parallel aligned
corpus of faithful transcripts and their written-style
texts. However, it is quite difficult to obtain such an
aligned corpus. All the above efforts mainly focused on
spontaneous speech of native speakers; few researchers
have explored language modeling for the task of
non-native spontaneous speech recognition.
Although these language model improvement
techniques are undoubtedly helpful, they either need
large amounts of closely matched data or can
cover only limited features of spontaneous speech style. In
the task of retelling stories, the students are required to
repeat the story based on what they heard, and they
organize the sentences in their own words
when they can’t remember the exact words used by the
native speaker. Therefore, the speech of students is
non-native spontaneous speech with three specific
features. Firstly, the speech is closely related to the
original story in topic, but not restricted to the
vocabulary of the original story. Secondly, there are many
disfluencies, such as filled pauses, hesitations,
repeated words and self-repaired words. Thirdly, it
contains various lexical and syntactic errors since the
speakers are non-native and their oral abilities are far
from those of native speakers. Due to these specific
features of retelling speech, transcripts of telephone
conversations or newswire text are not
suitable as training data.
In this paper we proposed an effective method to