Advances in Vision-Based Hand Recognition

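This document is a survey of research on vision-based hand recognition. It discusses the state of the art and the potential of vision-based hand pose estimation, and its applications in human-computer interaction (HCI). It was written by Ali Erol et al., who are affiliated with the Computer Vision Laboratory at the University of Nevada and the BioVis Laboratory at NASA Ames Research Center. The article was received in September 2005, accepted in October 2006, made available online on 19 January 2007, and communicated by Mathias Kolsch.

Vision-based hand recognition uses computer vision to recognize and interpret the shape and motion of the human hand. Its main value lies in providing natural human-computer interaction. Glove-based sensing can meet the demands of advanced input, but it has several drawbacks: gloves are inconvenient to wear, they can hinder natural interaction with the computing environment, and they require lengthy calibration and setup. As computer vision has matured, researchers have turned to non-contact solutions for more natural hand input.

The literature follows two main research directions. The first is gesture classification, which recognizes specific gestures and translates them into commands. The second is hand pose estimation, which determines the position and orientation of each joint of the hand. Combined with deep learning, image processing, and pattern recognition, these techniques can track and interpret hand motion in real time.

Gesture classification typically trains a model to recognize a predefined set of gestures, each standing for a particular command or operation. For example, a machine learning system can recognize gestures such as a "thumbs-up" or a "click" and map them to the corresponding computer commands. The challenge is to recognize complex gestures accurately and quickly while keeping the misrecognition rate low.

Hand pose estimation is harder, because the hand has many freely moving joints and the system must track the 3D position of each one. This calls for high-precision image analysis and sophisticated modeling techniques such as keypoint detection, depth image processing, and 3D reconstruction. Advances in deep neural networks, such as convolutional neural networks (CNNs) and recurrent neural networks (RNNs), have markedly improved the accuracy of hand pose estimation.

In human-computer interaction, vision-based hand recognition has a wide range of potential applications, including virtual reality (VR), augmented reality (AR), robot control, game control, and accessibility technology. In a VR environment, for example, users can manipulate virtual objects with natural hand movements, which increases immersion. In accessibility technology, interaction without physical contact can be more convenient for people with limited mobility.

Vision-based hand recognition still faces many challenges, including lighting changes, occlusion, self-occlusion of the hand, real-time performance, and robustness. Addressing them requires continued research and engineering effort. As hardware becomes faster and algorithms improve, vision-based hand recognition is expected to play a larger role and to make human-computer interaction more natural and effective.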

DOF spherical joint overcomes the restrictions [41,42]. Another solution is to have a twist around the bone axis as a linear function of abduction and flexion angles [43]. The angular DOF of fingers, which is often called the local configuration, and the six DOF of a frame attached to the wrist, which is often called the global configuration, form a configuration vector representing the pose of the hand.
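The split into a global and a local configuration can be sketched as a small data structure. This is a minimal illustration, not the representation used in any cited system: the class name `HandPose`, the field names, and the choice of 21 local angles (4 per finger plus 5 for the thumb, which gives 27 parameters in total) are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List

N_GLOBAL = 6   # wrist frame: 3 translation + 3 rotation parameters
N_LOCAL = 21   # assumed: 4 angles x 4 fingers + 5 angles for the thumb


@dataclass
class HandPose:
    """Hypothetical container for a hand pose (names are illustrative)."""
    global_config: List[float] = field(default_factory=lambda: [0.0] * N_GLOBAL)
    local_config: List[float] = field(default_factory=lambda: [0.0] * N_LOCAL)

    def to_vector(self) -> List[float]:
        """Concatenate global and local parts into one configuration vector."""
        if len(self.global_config) != N_GLOBAL or len(self.local_config) != N_LOCAL:
            raise ValueError("unexpected configuration size")
        return list(self.global_config) + list(self.local_config)

    @classmethod
    def from_vector(cls, vec: List[float]) -> "HandPose":
        """Split a flat configuration vector back into its two parts."""
        if len(vec) != N_GLOBAL + N_LOCAL:
            raise ValueError("configuration vector has the wrong length")
        return cls(list(vec[:N_GLOBAL]), list(vec[N_GLOBAL:]))
```

Keeping the two parts separate makes it easy for a tracker to update the wrist frame and the finger angles in different stages, while `to_vector` provides the flat form that search or filtering code usually expects.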

A 27 DOF model that was introduced in [44] and has been used in many studies is shown in Fig. 2b. The CMC joints are assumed to be fixed, which quite unrealistically models the palm as a rigid body. The fingers are modeled as planar serial kinematic chains attached to the palm at anchor points located at MCP joints. The planarity assumption does not hold in general. Standard robotics techniques provide efficient representations and fast algorithms for various calculations related to the kinematics or dynamics of the model. Adding an extra twist motion to MCP joints [45,46], introducing one flexion/extension DOF to CMC joints [47] or using a spherical joint for TM [42] are some examples of the variations of the kinematic model.
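To make the planar serial chain idea concrete, the following sketch computes forward kinematics for one finger inside its own plane. It is only an illustration under simplifying assumptions: the function name, the argument layout, and the bone lengths in the usage note are hypothetical, and MCP abduction (which would rotate the whole plane about the anchor point) is left out.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def finger_joint_positions(anchor: Point,
                           flexion_angles: Sequence[float],
                           bone_lengths: Sequence[float]) -> List[Point]:
    """Return 2D positions of the joints and the fingertip in the finger plane.

    anchor          -- position of the MCP joint (the anchor point on the palm)
    flexion_angles  -- flexion at MCP, PIP and DIP, in radians
    bone_lengths    -- lengths of the proximal, middle and distal segments
    """
    if len(flexion_angles) != len(bone_lengths):
        raise ValueError("need one flexion angle per bone segment")
    x, y = anchor
    heading = 0.0                 # direction of the current segment
    points = [(x, y)]
    for angle, length in zip(flexion_angles, bone_lengths):
        heading += angle          # angles accumulate along a serial chain
        x += length * math.cos(heading)
        y += length * math.sin(heading)
        points.append((x, y))
    return points
```

For example, `finger_joint_positions((0.0, 0.0), [0.0, 0.0, 0.0], [4.0, 2.5, 2.0])` describes a fully extended finger whose tip lies 8.5 units from the anchor along the x axis. Because each joint only adds to the running heading, the cost is linear in the number of segments, which is one reason such chains are cheap to evaluate many times during pose search.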

The kinematic hand model described above is the most natural choice for parameterizing the 3D hand state, but there exist a few exceptions that use other types of representations. Sudderth et al. [48] used independent rigid bodies for each component of the hand, leading to a highly redundant model. The kinematic relations between these rigid bodies were enforced using a prior model in their belief propagation network. Heap et al. [49] dropped the kinematic model and modeled the entire surface of the hand using PCA applied on MRI data. Such a representation requires further processing to extract useful higher-level information, such as pointing direction; however, it was shown to be very effective for reliably locating and tracking the hand in images.

Full DOF hand pose estimation systems rely extensively on a priori information on the motion and shape of the hand; therefore, the kinematic model is augmented with shape information to generate appearances of the hand in arbitrary configurations, and with hand pose or motion constraints to reduce the search space for pose estimation. Although the same motion models could be assumed for arbitrary users, the same assumption cannot hold true for shape models. If precision is a requirement for the application, these models need to go through a calibration procedure to estimate user-specific measurements.

3.2. Modeling natural hand motion

Although active motion of the hand (i.e., motion without external forces) is highly constrained, this is not reflected in the kinematic model. One way to capture natural hand motion constraints is to complement the kinematic model with static constraints that reflect the range of each parameter and dynamic constraints that reflect the joint angle dependencies. Based on the studies in biomechanics, certain closed-form constraints can be derived [44,42,19]. An important constraint is the relation $\theta_{\mathrm{DIP}} = \frac{2}{3}\,\theta_{\mathrm{PIP}}$ between the PIP and DIP angles that helps decrease the dimension of the problem by 4. There exist many other constraints that are too complex to be utilized in a pose estimation algorithm. For example, the flexion angle of an MCP joint has an effect on the abduction capability of that joint and neighboring MCP joints.
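The two kinds of closed-form constraint can be sketched in a few lines. This is a hedged illustration rather than any cited implementation: it assumes the commonly cited coupling in which the DIP flexion is two thirds of the PIP flexion, and the joint limits in `FLEXION_LIMITS` are hypothetical placeholder values, not measurements from the biomechanics literature.

```python
import math
from typing import Dict

# Hypothetical static limits in radians (placeholders, not measured values).
FLEXION_LIMITS = {
    "MCP": (0.0, math.radians(90.0)),
    "PIP": (0.0, math.radians(110.0)),
}

# Assumed dynamic constraint: DIP flexion = (2/3) * PIP flexion.
DIP_PIP_RATIO = 2.0 / 3.0


def clamp(value: float, low: float, high: float) -> float:
    """Static constraint: keep a parameter inside its allowed range."""
    return max(low, min(high, value))


def constrain_finger(mcp_flexion: float, pip_flexion: float) -> Dict[str, float]:
    """Return a feasible set of flexion angles for one finger.

    Only MCP and PIP are free parameters; DIP is derived from PIP, so each
    of the four fingers loses one degree of freedom.
    """
    mcp = clamp(mcp_flexion, *FLEXION_LIMITS["MCP"])
    pip = clamp(pip_flexion, *FLEXION_LIMITS["PIP"])
    return {"MCP": mcp, "PIP": pip, "DIP": DIP_PIP_RATIO * pip}
```

Because the DIP angle is computed rather than searched, applying the coupling to the four fingers removes four parameters from the configuration vector, which matches the reduction in dimension mentioned in the text.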

The very intricate structure of the hand does not allow expressing all the constraints in a closed form. Moreover, the natural motion of the hand may follow more subtle constraints which have nothing to do with structural limitations [50]. These problems have motivated learning-based approaches, which use ground truth data collected using data gloves. The feasible configurations of the hand are expected to lie on a lower dimensional manifold due to biomechanical constraints. Lin et al. [50] applied PCA on a large amount of joint angle data to construct a seven-dimensional space. The data was approximated in the reduced dimensional space as the union of linear manifolds. It is also possible to use the data directly without any further modeling as in [51] to guide the search in the configuration space. Another way to use the glove data is to generate synthetic hand images to build a template database that models the appearance of the hand under all possible poses [52–55].
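The idea of projecting joint angle data onto a low-dimensional subspace can be illustrated with a small, dependency-free PCA. This is a sketch only: it uses power iteration with deflation purely to avoid external libraries (a practical system would more likely call a linear algebra package), all function names are hypothetical, and the number of components `k` is a parameter rather than the seven dimensions reported in [50].

```python
import math
import random
from typing import List, Tuple

Matrix = List[List[float]]


def mean_center(data: Matrix) -> Tuple[List[float], Matrix]:
    """Subtract the per-dimension mean from every sample."""
    n, d = len(data), len(data[0])
    mean = [sum(row[j] for row in data) / n for j in range(d)]
    return mean, [[row[j] - mean[j] for j in range(d)] for row in data]


def covariance(centered: Matrix) -> Matrix:
    """Sample covariance matrix of mean-centered data."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def top_components(cov: Matrix, k: int, iters: int = 500, seed: int = 0) -> Matrix:
    """Approximate the k leading eigenvectors by power iteration and deflation."""
    rng = random.Random(seed)
    d = len(cov)
    cov = [row[:] for row in cov]          # work on a copy
    components = []
    for _ in range(k):
        v = [rng.uniform(-1.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:               # no variance left in this direction
                break
            v = [x / norm for x in w]
        eigval = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d)) for i in range(d))
        components.append(v)
        for i in range(d):                 # deflate: remove the found direction
            for j in range(d):
                cov[i][j] -= eigval * v[i] * v[j]
    return components


def project(sample: List[float], mean: List[float], components: Matrix) -> List[float]:
    """Coordinates of a sample in the reduced space."""
    return [sum(c[j] * (sample[j] - mean[j]) for j in range(len(mean)))
            for c in components]


def reconstruct(coords: List[float], mean: List[float], components: Matrix) -> List[float]:
    """Map reduced coordinates back to the original joint angle space."""
    return [mean[j] + sum(a * c[j] for a, c in zip(coords, components))
            for j in range(len(mean))]
```

A typical use would be to compute `mean` and `components` once from recorded glove data, then restrict pose search to the reduced coordinates and call `reconstruct` whenever full joint angles are needed.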

In addition to modeling the feasible hand configurations, learning the dynamics of hand motion can help tracking algorithms. Zhou et al. [56] presented an EDA (eigen-dynamic analysis) method for modeling the non-linear hand dynamics. First, PCA was used to reduce the dimension of the problem. Then hand motion was modeled in the reduced space, while moving only one of the fingers, using low order linear systems. The resulting five linear models were combined to obtain a high order stochastic linear dynamic system for arbitrary finger motion.

Thayananthan et al. [57] represented the configuration space as a tree, which was constructed using hierarchical clustering techniques or regular partitioning of the eigenspace at multiple resolutions. Each node of the tree corresponds to a cluster of natural hand configurations collected using a data glove. The tree structure enables fast hierarchical search through Bayesian filtering. The dynamic model of the system, which is assumed to be a first order Markov process, was built by histogramming state transitions between clusters using a large amount of training data.
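Building a first order Markov model by histogramming transitions can be sketched as follows. This is a minimal illustration and not the procedure of [57]: it assumes the training data has already been reduced to a sequence of cluster labels, one per frame, and the function name is hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, Sequence


def estimate_transitions(cluster_sequence: Sequence[Hashable]
                         ) -> Dict[Hashable, Dict[Hashable, float]]:
    """Estimate P(next cluster | current cluster) from a label sequence.

    Each consecutive pair of labels is counted once, and every row of
    counts is then normalized so that it sums to one.
    """
    counts = defaultdict(Counter)
    for current, nxt in zip(cluster_sequence, cluster_sequence[1:]):
        counts[current][nxt] += 1

    probabilities = {}
    for current, row in counts.items():
        total = sum(row.values())
        probabilities[current] = {nxt: c / total for nxt, c in row.items()}
    return probabilities
```

For the label sequence `[0, 0, 1, 0, 1, 1, 0]`, cluster 0 is followed once by 0 and twice by 1, so the estimated probability of moving from 0 to 1 is 2/3. Transitions that never occur in the training data are simply absent from the result, so a tracker using this table would need to decide how to treat unseen transitions, for example with a small floor probability.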

3.3. Modeling the shape of the hand

Hand shape has both articulated and elastic components; however, computational efficiency reasons do not allow the use of very complex shape models for pose estimation. In many studies, the hand model needs to be projected many times on the input image(s) to obtain features that can be matched against the observed features. Visibility calculations to handle occlusions add extra complexity to the projection calculations. These problems have motivated the use of rough shape models, composed of simple

56 A. Erol et al. / Computer Vision and Image Understanding 108 (2007) 52–73
