DOF spherical joint overcomes the restrictions [41,42].
Another solution is to have a twist around the bone axis
as a linear function of abduction and flexion angles [43].
The angular DOF of fingers, which is often called the local
configuration, and the six DOF of a frame attached to the
wrist, which is often called the global configuration, form
a configuration vector representing the pose of the hand.
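The split between global and local configuration can be sketched as a small container. This is an illustrative sketch only: the class name `HandPose`, its field names, and the choice of three rotation angles for the wrist orientation are assumptions, and the 21 local angles in the usage line are picked so that the total matches a 27 DOF hand (27 minus the 6 global DOF).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class HandPose:
    """Hypothetical container: 6 global DOF plus the local finger joint angles."""

    wrist_position: List[float]     # x, y, z of the frame attached to the wrist
    wrist_orientation: List[float]  # three rotation angles of that frame (assumed)
    finger_angles: List[float]      # local configuration: joint angles in radians

    def to_vector(self) -> List[float]:
        """Concatenate the global and local parts into one configuration vector."""
        if len(self.wrist_position) != 3 or len(self.wrist_orientation) != 3:
            raise ValueError("the global configuration needs exactly 6 values")
        return (
            list(self.wrist_position)
            + list(self.wrist_orientation)
            + list(self.finger_angles)
        )


# Usage: 6 global DOF + 21 local angles (assumed count) give a 27-vector.
pose = HandPose([0.0, 0.0, 0.5], [0.0, 0.0, 0.0], [0.0] * 21)
print(len(pose.to_vector()))  # 27
```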
A 27 DOF model that was introduced in [44] and has
been used in many studies is shown in Fig. 2b. The CMC
joints are assumed to be fixed, which quite unrealistically
models the palm as a rigid body. The fingers are modeled
as planar serial kinematic chains attached to the palm at
anchor points located at MCP joints. The planarity
assumption does not hold in general. Standard robotics
techniques provide efficient representations and fast algo-
rithms for various calculations related to the kinematics
or dynamics of the model. Adding an extra twist motion
to MCP joints [45,46], introducing one flexion/extension
DOF to CMC joints [47] or using a spherical joint for
TM [42] are some examples of the variations of the kinematic
model.
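To make the planar serial chain assumption concrete, the following minimal sketch computes the joint positions of one finger inside its flexion plane. The function name, the segment lengths, and the convention that each flexion angle is measured relative to the previous segment are assumptions for illustration; MCP abduction, which orients the plane itself, is left out.

```python
import math
from typing import List, Sequence, Tuple


def planar_finger_points(
    flexion: Sequence[float], lengths: Sequence[float]
) -> List[Tuple[float, float]]:
    """Return 2D joint positions of a planar serial chain, anchor point first.

    flexion[i] is the rotation (radians) of segment i relative to segment i-1;
    lengths[i] is the length of segment i. Both are hypothetical inputs.
    """
    if len(flexion) != len(lengths):
        raise ValueError("one flexion angle is needed per segment")
    x = y = 0.0          # anchor point (the MCP joint) at the origin
    heading = 0.0        # accumulated rotation along the chain
    points = [(x, y)]
    for angle, length in zip(flexion, lengths):
        heading += angle
        x += length * math.cos(heading)
        y += length * math.sin(heading)
        points.append((x, y))
    return points


# Usage with made-up segment lengths (arbitrary units).
straight = planar_finger_points([0.0, 0.0, 0.0], [4.0, 2.5, 2.0])
bent = planar_finger_points([math.pi / 2, 0.0, 0.0], [4.0, 2.5, 2.0])
```

With all angles at zero the fingertip lies at the sum of the segment lengths along the first axis; rotating only the first joint by 90 degrees moves it onto the second axis.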
The kinematic hand model described above is the most
natural choice for parameterizing the 3D hand state but
there exist a few exceptions using other types of representa-
tions. Sudderth et al. [48] used independent rigid bodies for
each component of the hand, leading to a highly redundant
model. The kinematic relations between these rigid bodies
were enforced using a prior model in their belief propaga-
tion network. Heap et al. [49] dropped the kinematic model
and modeled the entire surface of the hand using PCA
applied on MRI data. Such a representation requires fur-
ther processing to extract useful higher-level information,
such as pointing direction; however, it was shown to be
very effective to reliably locate and track the hand in
images.
Full DOF hand pose estimation systems rely extensively
on a priori information on the motion and shape of the
hand; therefore, the kinematic model is augmented with
shape information to generate appearances of the hand in
arbitrary configurations, and hand pose or motion con-
straints to reduce the search space for pose estimation.
Although the same motion models could be assumed for
arbitrary users, the same assumption cannot hold true for
shape models. If precision is a requirement for the applica-
tion, these models need to go through a calibration proce-
dure to estimate user-specific measurements.
3.2. Modeling natural hand motion
Although active motion of the hand (i.e., motion with-
out external forces) is highly constrained, this is not
reflected in the kinematic model. One way to capture
natural hand motion constraints is to complement the
kinematic model with static constraints that reflect the
range of each parameter and dynamic constraints that
reflect the joint angle dependencies. Based on the studies
in biomechanics, certain closed-form constraints can be
derived [44,42,19]. An important constraint is the relation
$\theta_{\mathrm{DIP}} = \frac{2}{3}\,\theta_{\mathrm{PIP}}$ between the PIP and DIP angles that helps
decrease the dimension of the problem by 4. There exist
many other constraints that are too complex to be utilized
in a pose estimation algorithm. For example, the flexion
angle of an MCP joint has an effect on the abduction
capability of that joint and neighboring MCP joints.
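The effect of the 2/3 coupling between DIP and PIP angles on the problem dimension can be illustrated as follows. This is a sketch under assumptions: the function name and the per-finger parameter order (MCP abduction, MCP flexion, PIP flexion) are hypothetical, and only the four non-thumb fingers are considered.

```python
from typing import List, Sequence

DIP_PIP_RATIO = 2.0 / 3.0  # closed-form coupling: theta_DIP = (2/3) * theta_PIP


def expand_finger_angles(reduced: Sequence[Sequence[float]]) -> List[List[float]]:
    """Append the dependent DIP angle to each finger's free parameters.

    Each entry of `reduced` holds (mcp_abduction, mcp_flexion, pip_flexion)
    in radians; this ordering is an assumption made for the example.
    """
    full = []
    for mcp_abduction, mcp_flexion, pip_flexion in reduced:
        dip_flexion = DIP_PIP_RATIO * pip_flexion
        full.append([mcp_abduction, mcp_flexion, pip_flexion, dip_flexion])
    return full


# Usage: four fingers, 3 free angles each (12 values) expand to 16 angles,
# so the search runs over 4 fewer parameters.
reduced = [[0.0, 0.3, 0.9]] * 4
full = expand_finger_angles(reduced)
```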
The very intricate structure of the hand does not allow
expressing all the constraints in a closed form. Moreover,
the natural motion of the hand may follow more subtle
constraints which have nothing to do with structural limi-
tations [50]. These problems have motivated learning-based
approaches, which use ground truth data collected using
data gloves. The feasible configurations of the hand are
expected to lie on a lower dimensional manifold due to bio-
mechanics constraints. Lin et al. [50] applied PCA on a
large amount of joint angle data to construct a seven-
dimensional space. The data was approximated in the
reduced dimensional space as the union of linear manifolds.
It is also possible to use the data directly without
any further modeling as in [51] to guide the search in the
configuration space. Another way to use the glove data is
to generate synthetic hand images to build a template data-
base that models the appearance of the hand under all pos-
sible poses [52–55].
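As a rough illustration of how PCA exposes a lower dimensional structure in joint angle data, the sketch below estimates only the first principal direction by power iteration on the sample covariance matrix. It is a simplified stand-in, not the procedure of [50]: the function name and the two-angle toy data are assumptions, and a real system would keep several components and use a numerical library.

```python
import math
from typing import List, Sequence, Tuple


def first_principal_direction(
    samples: Sequence[Sequence[float]], iterations: int = 200
) -> Tuple[List[float], List[float]]:
    """Return (mean, unit vector of largest variance) for joint-angle samples."""
    n, d = len(samples), len(samples[0])
    mean = [sum(s[j] for s in samples) / n for j in range(d)]
    centered = [[s[j] - mean[j] for j in range(d)] for s in samples]
    # Sample covariance matrix (d x d).
    cov = [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]
    # Power iteration: repeated multiplication converges to the top eigenvector.
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # no variance in the data
        v = [x / norm for x in w]
    return mean, v


# Toy data (made up): PIP and DIP angles that co-vary with a 2/3 ratio,
# so the two-dimensional samples lie on a one-dimensional line.
angles = [(0.1 * k, (2.0 / 3.0) * 0.1 * k) for k in range(10)]
mean, direction = first_principal_direction(angles)
```

Projecting each centered sample onto `direction` then gives a one-parameter description of this toy data set.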
In addition to modeling the feasible hand configura-
tions, learning the dynamics of hand motion can help
tracking algorithms. Zhou et al. [56] presented an EDA
(eigen-dynamic analysis) method for modeling the non-lin-
ear hand dynamics. First, PCA was used to reduce the
dimension of the problem. Then hand motion was modeled
in the reduced space, while moving only one of the fingers,
using low order linear systems. The resulting five linear
models were combined to obtain a high order stochastic
linear dynamic system for arbitrary finger motion.
Thayananthan et al. [57] represented the configuration
space as a tree, which was constructed using hierarchical
clustering techniques or regular partitioning of the eigen-
space at multiple resolutions. Each node of the tree corre-
sponds to a cluster of natural hand configurations collected
using a data-glove. The tree structure enables fast hierar-
chical search through Bayesian Filtering. The dynamic
model of the system, which is assumed to be a first order
Markov process, was built by histogramming state transitions
between clusters using a large amount of training data.
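Histogramming state transitions to obtain a first order Markov model can be sketched as below. The function name and the cluster labels are hypothetical, and the input is assumed to be a single recorded sequence of cluster assignments; with several recordings, transitions should not be counted across sequence boundaries.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, Sequence


def transition_probabilities(
    labels: Sequence[Hashable],
) -> Dict[Hashable, Dict[Hashable, float]]:
    """Estimate P(next cluster | current cluster) from one label sequence."""
    counts = defaultdict(Counter)
    # Histogram of consecutive (current, next) cluster pairs.
    for current, nxt in zip(labels, labels[1:]):
        counts[current][nxt] += 1
    # Normalize each row of the histogram into a probability distribution.
    probabilities = {}
    for current, row in counts.items():
        total = sum(row.values())
        probabilities[current] = {nxt: n / total for nxt, n in row.items()}
    return probabilities


# Usage with made-up cluster labels for consecutive glove samples.
sequence = ["open", "open", "pinch", "fist", "open", "pinch"]
model = transition_probabilities(sequence)
```

Clusters that never appear as a current state (for example, one seen only as the final sample) get no row, so a tracker using such a table would need a fallback for unseen transitions.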
3.3. Modeling the shape of the hand
Hand shape has both articulated and elastic components;
however, for reasons of computational efficiency, very complex
shape models cannot be used for pose estimation. In many
studies, the hand model needs to be projected many times
on the input image(s) to obtain features
that can be matched against the observed features. Visibil-
ity calculations to handle occlusions add extra complexity
to the projection calculations. These problems have moti-
vated the use of rough shape models, composed of simple
56 A. Erol et al. / Computer Vision and Image Understanding 108 (2007) 52–73