A Robust Hand Gesture Recognition Method Via Convolutional Neural Network
Xing Yingxin
Beijing Key Laboratory of Multimedia and Intelligent
Software Technology
College of Metropolitan Transportation
Beijing University of Technology
Beijing, China
xingyingxin@emails.bjut.edu.cn
Li Jinghua, Wang Lichun, Kong Dehui
Beijing Key Laboratory of Multimedia and Intelligent
Software Technology
College of Metropolitan Transportation
Beijing University of Technology
Beijing, China
{lijinghua, wanglc, kdh}@bjut.edu.cn
Abstract—Hand gesture plays an important role in nonverbal
communication and natural human-computer interaction.
However, the complex structure of hand gestures and various
environmental factors lead to low recognition rates. For
instance, gestures vary between individuals, whose hands
differ in size and posture; in addition, unconstrained
environmental illumination also affects recognition
performance. Therefore, hand gesture recognition remains a
challenging problem. This paper proposes a robust method for
hand gesture recognition based on a convolutional neural
network (CNN), which is used to automatically extract the
spatial and semantic features of hand gestures. Our method
consists of a modified CNN structure and data preprocessing,
which jointly improve hand gesture recognition performance.
The experimental results on both the Cambridge Hand Gesture
Dataset and a self-constructed dataset show that the proposed
method is effective and competitive.
Keywords- Hand Gesture Recognition, Convolutional Neural
Network (CNN), Canny Edge Detection
I. INTRODUCTION
Today, digital and intelligent homes are improving daily
life, and natural human-machine interaction is one of their
core technologies. Unlike traditional keyboard and mouse
interaction, hand gestures offer a natural and important
channel for nonverbal communication and intelligent
interaction. However, hand gesture recognition remains
challenging because of the complexity and variability of
gestures [1]. Different people perform the same gesture
differently, and even the same person performs a gesture
differently each time. In addition, vision-based hand gesture
recognition is susceptible to lighting, viewpoint and other
factors [2] [3].
Previous vision-based hand gesture recognition approaches
usually consist of two main steps: extracting features and
designing a classifier. Of the two, robust and effective
feature representation is the major problem. Hand-crafted
features usually require prior knowledge from the designer
and some preprocessing, such as image transformation and
segmentation. The recently popular deep learning method CNN
has demonstrated competitive performance in image
representation and classification [4]. The success of CNNs
lies partly in their robustness to translation, rotation and
scale, and partly in their ability to learn high-level
semantics. This paper uses a CNN to extract robust hand
gesture features, and our focus is the structure and
parameter settings of the CNN model for static hand gesture
feature representation. The features extracted by the
proposed method are easy to compute and describe hand
gestures more effectively than hand-crafted features. In
particular, to improve CNN-based hand gesture
representation, Canny edge detection is applied beforehand
to reduce the illumination variation inherent in the original
hand gesture data. The experimental results and comparisons
demonstrate that this illumination-reducing preprocessing,
combined with the features learned by the CNN, is effective
for hand gesture recognition.
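The illumination argument can be illustrated with a small
sketch. The code below is neither the pipeline of this paper
nor a full Canny detector: it assumes a simplified stand-in
(Sobel gradient magnitude with a threshold relative to the
peak response) and a synthetic 16x16 image. The names
`sobel_gradients`, `edge_map` and `rel_threshold` are
hypothetical. A global brightness change, modelled here as a
gain and an offset, alters the raw pixels but should leave the
edge map unchanged, because the Sobel kernels sum to zero and
the threshold scales with the peak gradient.

```python
import numpy as np


def sobel_gradients(image):
    """Return horizontal and vertical Sobel responses (valid region only)."""
    img = np.asarray(image, dtype=np.float64)
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=np.float64)
    ky = kx.T
    h, w = img.shape
    gx = np.zeros((h - 2, w - 2))
    gy = np.zeros((h - 2, w - 2))
    # Accumulate the 3x3 correlation one kernel tap at a time.
    for i in range(3):
        for j in range(3):
            patch = img[i:i + h - 2, j:j + w - 2]
            gx += kx[i, j] * patch
            gy += ky[i, j] * patch
    return gx, gy


def edge_map(image, rel_threshold=0.5):
    """Binary edge map: gradient magnitude >= rel_threshold * peak magnitude.

    Simplified stand-in for Canny (no smoothing, non-maximum
    suppression, or hysteresis); rel_threshold is an assumed value.
    """
    gx, gy = sobel_gradients(image)
    magnitude = np.hypot(gx, gy)
    peak = magnitude.max()
    if peak == 0:
        return np.zeros(magnitude.shape, dtype=bool)
    return magnitude >= rel_threshold * peak


# Synthetic "hand": a bright square on a darker background.
hand = np.full((16, 16), 20.0)
hand[4:12, 4:12] = 100.0

# Same scene under brighter lighting, modelled as gain 1.5 and offset 40.
brighter = 1.5 * hand + 40.0

edges_dim = edge_map(hand)
edges_bright = edge_map(brighter)

print("raw pixels identical:", np.array_equal(hand, brighter))
print("edge maps identical: ", np.array_equal(edges_dim, edges_bright))
```

In practice a full Canny implementation (Gaussian smoothing,
non-maximum suppression, hysteresis thresholding) would replace
`edge_map`, and real lighting changes are rarely a pure global
gain and offset, so the invariance shown here holds only
approximately for real images.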
The main contributions of this paper are: (1) the
proposed CNN structure and parameters are better suited to
spatial and semantic representation of hand gestures and to
discriminative hand gesture understanding; (2) using
preprocessed edge data as the CNN input greatly improves
the robustness of hand gesture recognition under varying
illumination; (3) the learning-based feature representation
approach outperforms existing methods based on predefined
features on both the Cambridge Hand Gesture Dataset and a
self-constructed dataset.
This paper is organized as follows. Sect. II reviews
related work. Sect. III presents the proposed hand gesture
recognition method via CNN. Experimental results and
analysis are given in Sect. IV. The last section summarizes
this study and outlines future work.
II. RELATED WORKS
With the development of natural human-machine
interaction, hand gesture recognition has become an active
research field. Many early works on hand gesture recognition
focused on designing hand-crafted features based on prior
knowledge. Chen et al. [5] used the Fourier descriptor (FD)
to describe spatial hand shape, which requires the hand
region to be correctly segmented first. Auephanwiriyakul et
al. [6] used the Scale Invariant Feature Transform (SIFT) to
describe each test frame. However, instead of comparing
directly with the training frames, this method constructed a
signature library in advance, so as to match the test frame with
2016 6th International Conference on Digital Home
978-1-5090-4400-9/16 $31.00 © 2016 IEEE
DOI 10.1109/ICDH.2016.20
64