
JOURNAL OF ELECTRONIC SCIENCE AND TECHNOLOGY, VOL. 12, NO. 4, DECEMBER 2014

Real-Time Hand Motion Parameter Estimation with Feature Point Detection Using Kinect

Chun-Ming Chang, Che-Hao Chang, and Chung-Lin Huang
Abstract
This paper presents a real-time Kinect-based hand pose estimation method. Unlike model-based and appearance-based approaches, our approach retrieves continuous hand motion parameters in real time. First, the hand region is segmented from the depth image. Then, specific feature points on the hand are located by a random forest classifier, and the relative displacements of these feature points are transformed into a rotation-invariant feature vector. Finally, the system retrieves the hand joint parameters by applying regression functions to the feature vectors. Experimental results are compared with a ground-truth dataset obtained by a data glove to show the effectiveness of our approach. The effects of different distances and rotation angles on the estimation accuracy are also evaluated.
Index Terms
Hand motion, Kinect, parameter estimation, random forest, regression function.
1. Introduction
Hand tracking has been applied in various human-computer interface (HCI) designs, such as sign language recognition, augmented reality, and virtual reality. Its two major applications are hand gesture recognition and three-dimensional (3D) hand pose estimation. The former analyzes the hand shape and location to identify the hand gesture, which can be applied to sign language understanding. The latter estimates hand parameters such as the joint angles of each finger and the global orientation of the palm. 3D hand pose estimation is quite challenging due to the lack of sufficient information and to self-occlusion. It may be applied to virtual reality or robotic arm control.
Manuscript received December 15, 2013; revised March 15, 2014. This work was supported by NSC under Grant No. 101-2221-E-468-030.
C.-M. Chang is with the Department of Applied Informatics and Multimedia, Asia University, Taichung (corresponding author, e-mail: cmchang@asia.edu.tw).
C.-L. Huang is with the Department of Applied Informatics and Multimedia, Asia University, Taichung (e-mail: huang.chunglin@gmail.com).
C.-H. Chang is with the Department of Informatics and Multimedia, Asia University, Taichung (e-mail: bbb00437@hotmail.com).
Digital Object Identifier: 10.3969/j.issn.1674-862X.2014.04.017

There are two main approaches to hand pose estimation: appearance-based and model-based. The
appearance-based method builds a large database of pre-stored hand models, generated from a synthesized virtual hand or constructed from a real hand. The sample in the database that best matches the current observation is retrieved. Romero et al. used locality-sensitive hashing (LSH) with HOG (histogram of oriented gradients) features to search for the nearest neighbor in a database containing over 100,000 hand poses in real time [1]. Miyamoto et al. designed a tree-structured classifier based on typical hand poses and their variations [2].
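The appearance-based retrieval step above can be sketched as a nearest-neighbor lookup. This is a minimal illustration with assumed details: random vectors stand in for HOG features, and a brute-force exact search replaces the LSH index that a database of 100,000+ poses would require for real-time operation.

```python
import numpy as np

# Toy appearance-based retrieval (assumed setup): each database entry pairs
# a feature vector (standing in for HOG features) with the hand joint
# parameters of the pose that produced it.
rng = np.random.default_rng(0)
db_features = rng.random((1000, 64))   # 1000 stored poses, 64-D features
db_params = rng.random((1000, 20))     # 20 joint parameters per pose

def retrieve(query):
    """Return the joint parameters of the best-matching database pose
    (exact nearest neighbor; LSH would approximate this for speed)."""
    dists = np.linalg.norm(db_features - query, axis=1)
    return db_params[np.argmin(dists)]

# A noisy observation of pose 42 should retrieve pose 42's parameters.
query = db_features[42] + 0.01 * rng.random(64)
params = retrieve(query)
```

The retrieval cost is linear in the database size here; LSH trades a small probability of missing the true nearest neighbor for sublinear query time, which is what makes the 100,000-pose database in [1] feasible in real time.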
The model-based method estimates the motion parameters by fitting an on-line 3D articulated hand model to the observations. By iteratively rendering the hand model with different parameters, the deviation between the hand model and the real observation converges, and the hand parameters are obtained. Gorce et al. proposed to minimize an objective function between the observation and the model using the quasi-Newton method [3]; the objective function measures the difference in texture and shading information. Oikonomidis et al. utilized particle swarm optimization (PSO) to solve the optimization problem of matching the model to a depth image [4],[5]. Hamer et al. created a hand model with 16 separate segments connected in a pairwise Markov random field, and adjusted the states of the segments by belief propagation based on both RGB (red, green, and blue) and depth images [6]. Some approaches estimate 3D human body joints and hand joints from depth sensors without markers [7],[8]. Similar to [8], we perform a feature transformation in our method to estimate the hand motion parameters in real time.
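The iterative model-fitting loop used by the model-based approaches above can be sketched with a minimal PSO, in the spirit of [4],[5]. This is an illustration under stated assumptions: `render` is a hypothetical stand-in for rendering a depth image from hand parameters (here just a smooth function of a 3-D parameter vector), and the objective is the squared discrepancy between rendered and observed depth.

```python
import numpy as np

rng = np.random.default_rng(1)
true_params = np.array([0.3, -0.7, 1.2])   # unknown "hand parameters"

def render(p):
    # Hypothetical renderer: maps parameters to a 16-sample "depth profile".
    grid = np.linspace(0, 1, 16)
    return p[0] * grid + p[1] * grid**2 + p[2]

observed = render(true_params)

def objective(p):
    # Deviation between the rendered model and the real observation.
    return np.sum((render(p) - observed) ** 2)

# Standard PSO: each particle's velocity blends inertia, attraction to its
# personal best, and attraction to the swarm's global best.
n, dim, iters = 30, 3, 200
pos = rng.uniform(-2, 2, (n, dim))
vel = np.zeros((n, dim))
pbest = pos.copy()
pbest_val = np.array([objective(p) for p in pos])
gbest = pbest[np.argmin(pbest_val)]

for _ in range(iters):
    r1, r2 = rng.random((n, dim)), rng.random((n, dim))
    vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
    pos = pos + vel
    vals = np.array([objective(p) for p in pos])
    improved = vals < pbest_val
    pbest[improved], pbest_val[improved] = pos[improved], vals[improved]
    gbest = pbest[np.argmin(pbest_val)]
```

PSO needs no gradients of the rendering pipeline, which is why it suits objectives defined through a renderer; the price is many objective evaluations per frame, which the real systems offset with GPU rendering.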
This paper proposes a new approach to estimate hand parameters by analyzing depth maps and applying regression functions. First, the hand depth map is segmented. Then, we apply a pixel-based classification to categorize each pixel in the hand map into 9 classes: 5 fingertips, 3 blocks on the palm, and one for the rest of the hand. The classifier is a random forest. We extract the feature points from the depth maps and convert them into a feature space for the regression functions, from which the hand motion parameters are obtained.
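The later stages of this pipeline can be sketched as follows. This is a simplified illustration with assumed details: the 9 feature points are taken as given (skipping the random forest stage), the rotation-invariant feature vector is built from pairwise distances between the points (one concrete way to make relative displacements rotation invariant), and a linear least-squares regressor stands in for the paper's regression functions, trained on synthetic data.

```python
import numpy as np

rng = np.random.default_rng(2)

def invariant_features(points):
    """Pairwise distances between the 9 feature points: invariant to any
    rotation and translation of the whole hand."""
    i, j = np.triu_indices(len(points), k=1)
    return np.linalg.norm(points[i] - points[j], axis=1)

# Synthetic training set (illustration only): random 9-point configurations
# whose 20 "joint parameters" are exactly linear in the invariant features.
n_train = 200
configs = rng.random((n_train, 9, 3))
X = np.array([invariant_features(c) for c in configs])  # 36-D features
W_true = rng.random((X.shape[1], 20))
Y = X @ W_true

# Fit the linear regressor by least squares.
W, *_ = np.linalg.lstsq(X, Y, rcond=None)

# Predictions agree for a hand and a rotated copy of the same hand,
# because the feature vector ignores global orientation.
theta = 0.9
Rz = np.array([[np.cos(theta), -np.sin(theta), 0],
               [np.sin(theta),  np.cos(theta), 0],
               [0, 0, 1]])
hand = configs[0]
pred_original = invariant_features(hand) @ W
pred_rotated = invariant_features(hand @ Rz.T) @ W
```

Making the feature vector rotation invariant means the regressor need not see every global orientation during training, which is what lets a single set of regression functions cover rotated hands.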
Our approach does not use gesture recognition or
iterative approximation to develop real-time hand motion