SPL-07512-2009
Abstract—Sign language recognition systems suffer from the
problem of signer dependence. In this letter, we propose a novel
method that adapts the original model set to a specific signer
using a small amount of that signer's training data. First, affinity
propagation is used to extract the exemplars of the
signer-independent hidden Markov models, from which the adaptive
training vocabulary is formed automatically. Based on the collected
sign gestures of the new vocabulary, a combination of maximum a
posteriori and iterative vector field smoothing is used to generate
signer-adapted models. Experimental results on six signers
demonstrate that the proposed method reduces the amount of
adaptation data while still achieving high recognition
performance.
Index Terms—Sign language recognition, signer adaptation,
affinity propagation, maximum a posteriori, vector field
smoothing.
I. INTRODUCTION
Sign language recognition aims to transcribe sign language
to text automatically. Much work has been done on sign language
recognition [1]; to the best of our knowledge, representative
examples include [2][3][4]. Most work focuses on signer-dependent
(SD) sign language recognition. However, the performance of such
a system is poor when the signer is not represented in the
training set. Signer-independent (SI) models [5] can achieve high
performance, but still cannot match SD models. Adaptation
techniques from speech recognition [6] and handwriting
recognition [7] offer an alternative solution to this problem.
Ong et al. [8] applied supervised maximum a posteriori (MAP,
[9]) adaptation to their system and obtained 88.5% accuracy on a
20-gesture vocabulary. U. von Agris et al. [10] combined
maximum likelihood linear regression and MAP for signer
adaptation. With 80 and 160 signs, they achieved 78.6% and
94.6% accuracy, respectively, on a vocabulary of 153 signs. In
their latest work [11], they combined eigenvoice, maximum
likelihood linear regression, and MAP algorithms to reduce the
adaptation data and delay performance saturation. Wang et
al. [12] presented an adaptive method based on data generation,
in which they reduced the size of the adaptation data set from 350
to 136 while retaining acceptable recognition accuracy.

Manuscript received on March 29, 2009; revised on September 29, 2009.
This work was supported by the Natural Science Foundation of China under
contracts 60533030, 60603023 and 60973067, by the National Key
Technology R&D Program under contract No. 2008BAH26B03, and by the
open project of Beijing Multimedia and Intelligent Software Key Laboratory in
Beijing University of Technology. The associate editors coordinating the
review of this manuscript and approving it for publication were Prof. Fernando
Perez-Gonzalez and Prof. Jen-Tzung Chien.
Copyright (c) 2008 IEEE. Personal use of this material is permitted.
However, permission to use this material for any other purposes must be
obtained from the IEEE by sending a request to pubspermissions@ieee.org.
Y. Zhou, D. Zhao and H. Yao are with the School of Computer Science and
Technology, Harbin Institute of Technology, Harbin, China; X. Chen is with
Key Lab of Intelligent Information Processing, Institute of Computing
Technology, Chinese Academy of Sciences, Beijing, China; W. Gao is with the
Institute of Digital Media, Peking University, Beijing, China (e-mail: {yzhou,
xlchen, dbzhao, wgao}@jdl.ac.cn; yhx@vilab.hit.edu.cn).
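To make concrete why MAP-style adaptation can work with little signer-specific data, the following is a minimal sketch of the commonly used MAP update for a Gaussian mean. It is an illustration under stated assumptions, not the formulation of [8], [9] or [10]: the function name `map_adapt_mean`, the prior weight `tau`, and the toy feature values are hypothetical.

```python
def map_adapt_mean(prior_mean, frames, tau=10.0, weights=None):
    """MAP estimate of a Gaussian mean (illustrative sketch).

    prior_mean: mean vector of the signer-independent model.
    frames:     adaptation feature vectors from the new signer.
    tau:        prior weight (hypothetical value); larger tau trusts
                the prior mean more.
    weights:    optional per-frame occupancy weights (default 1.0 each).
    """
    if weights is None:
        weights = [1.0] * len(frames)
    occupancy = sum(weights)
    dim = len(prior_mean)
    # Weighted sum of the adaptation frames, one entry per dimension.
    acc = [sum(w * x[d] for w, x in zip(weights, frames)) for d in range(dim)]
    # Interpolate between the prior mean and the adaptation data.
    return [(tau * prior_mean[d] + acc[d]) / (tau + occupancy)
            for d in range(dim)]


si_mean = [0.0, 0.0]          # toy SI mean (assumed values)
few = [[2.0, 4.0]] * 2        # two adaptation frames
many = [[2.0, 4.0]] * 200     # two hundred adaptation frames

adapted_few = map_adapt_mean(si_mean, few, tau=2.0)
adapted_many = map_adapt_mean(si_mean, many, tau=2.0)
unchanged = map_adapt_mean(si_mean, [], tau=2.0)
```

With no adaptation frames the estimate stays at the prior mean, with a few frames it moves part of the way, and with many frames it approaches the sample mean of the new signer's data.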
In this letter, we propose a novel signer adaptation method that
reduces the amount of adaptation data further. As shown in Fig. 1,
our method consists of two main steps: exemplar extraction
and the combination of maximum a posteriori and iterative
vector field smoothing (MAP/IVFS). First, affinity propagation
(AP, [13]) is used to extract a subset of the vocabulary that
can represent the major characteristics of the new signer’s
signing; then MAP/IVFS is adopted to modify the parameters
of the models. Sections II and III describe AP-based exemplar
extraction and MAP/IVFS, respectively, and the experimental
evaluation and conclusion are given in Sections IV and V,
respectively.
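The exemplar-extraction step can be illustrated with a small, self-contained sketch of affinity propagation using the standard responsibility and availability updates. Everything specific here is an assumption made for illustration: each "sign model" is reduced to a single number, similarity is the negative squared distance, and the preference value of -5.0 is arbitrary. How similarities between the SI models are actually computed is not specified in this section.

```python
def affinity_propagation(S, damping=0.5, iters=200):
    """Return exemplar indices for a square similarity matrix S.

    S[i][k] says how well item k would serve as the exemplar of item i;
    the diagonal S[k][k] holds the "preference" of item k.
    """
    n = len(S)
    R = [[0.0] * n for _ in range(n)]  # responsibilities r(i, k)
    A = [[0.0] * n for _ in range(n)]  # availabilities a(i, k)
    for _ in range(iters):
        # r(i,k) = s(i,k) - max over k' != k of [a(i,k') + s(i,k')]
        for i in range(n):
            vals = [A[i][k] + S[i][k] for k in range(n)]
            top = max(range(n), key=vals.__getitem__)
            runner_up = max(v for k, v in enumerate(vals) if k != top)
            for k in range(n):
                new = S[i][k] - (runner_up if k == top else vals[top])
                R[i][k] = damping * R[i][k] + (1 - damping) * new
        # a(i,k) = min(0, r(k,k) + sum of positive r(i',k), i' not in {i,k})
        # a(k,k) = sum of positive r(i',k), i' != k
        for k in range(n):
            pos = [max(0.0, R[i][k]) for i in range(n)]
            total = sum(pos) - pos[k]
            for i in range(n):
                new = total if i == k else min(0.0, R[k][k] + total - pos[i])
                A[i][k] = damping * A[i][k] + (1 - damping) * new
    # Item k is an exemplar when its self-responsibility plus
    # self-availability is positive.
    return [k for k in range(n) if R[k][k] + A[k][k] > 0]


def assign_to_exemplars(S, exemplars):
    """Map every item to its most similar exemplar (exemplars map to themselves)."""
    return [i if i in exemplars else max(exemplars, key=lambda k: S[i][k])
            for i in range(len(S))]


# Toy stand-in for six sign models (hypothetical one-dimensional summaries).
points = [0.0, 1.0, 2.0, 10.0, 11.0, 12.0]
preference = -5.0  # assumed value; lower preferences yield fewer exemplars
n = len(points)
S = [[preference if i == k else -(points[i] - points[k]) ** 2
      for k in range(n)] for i in range(n)]

exemplars = affinity_propagation(S)
labels = assign_to_exemplars(S, exemplars)
```

In this toy setting the two middle items (indices 1 and 4) are selected, one per group. In the spirit of the method described above, only the selected exemplars would need to be collected from a new signer; the preference value controls how many exemplars are selected.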
II. EXEMPLAR EXTRACTION
Different people have different hand sizes, body sizes,
signing habits, signing rhythms, and so on, which leads to
variation when they sign the same word. The resulting mismatch
between the training data and the test data leads to poor
recognition performance. One way to address this problem is to
collect enough data from different people to train SI models.
This approach raises two problems:
1) The models are difficult to train to convergence because
the data of different people vary noticeably. Sometimes the
differences between the data of two people for the same
sign approach or exceed the differences between the data of
the same person for two different signs.
Adaptive Sign Language Recognition
with Exemplar Extraction and MAP/IVFS
Yu Zhou, Xilin Chen, Member, IEEE, Debin Zhao, Hongxun Yao, and Wen Gao, Fellow, IEEE
Fig. 1. Exemplar extraction and MAP/IVFS for signer adaptation. (Block diagram; labels: SI Models, Exemplar Extraction, Exemplar Subset, Adaptation Data, MAP/IVFS, SA Models.)