James Steven Supančič III et al.
Fig. 3 Our new test data challenges methods with clutter (a), object manipulation (b), low resolution (c), and various viewpoints (d). We collected data in diverse environments (8 offices, 4 homes, 4 public spaces, 2 vehicles, and 2 outdoor locations) using time-of-flight (Intel/Creative Gesture Camera) and structured-light (ASUS Xtion Pro) depth cameras. Ten subjects (3 female and 7 male) were given prompts to perform natural interactions with objects in the environment, as well as to display 24 random and 24 canonical poses.
This is important because any practical application must handle diverse subjects, scenes, and clutter.
3 Training Data
Here we discuss various approaches for generating training data (see Table 2). Real annotated training data has long been the gold standard for supervised learning. However, the generally accepted wisdom (for hand pose estimation) is that the space of poses is too large to annotate manually. This motivates approaches that leverage synthetically generated training data, discussed further below.
Real data + manual annotation: Arguably, the space of hand poses exceeds what can be sampled with real data. Our experiments identify a second problem: perhaps surprisingly, human annotators often disagree on pose annotations. For example, in our testset, human annotators disagree on 20% of pose annotations (considering a 20mm threshold), as plotted in Fig. 19. These disagreements arise from limitations in the raw sensor data, due either to poor resolution or to occlusions. We found that low resolution consistently corresponds to annotation ambiguities across test sets (see Sec. 5.2 for further discussion and examples). These ambiguities are often mitigated by placing the hand close to the camera (Xu and Cheng 2013, Tang et al. 2014, Qian et al. 2014, Oberweger et al. 2016). As an illustrative example, we evaluate the ICL training set (Tang et al. 2014).

Dataset        Generation             Viewpoint    Views  Size        Subj.
ICL [59]       Real + manual annot.   3rd Pers.    1      331,000     10
NYU [63]       Real + auto annot.     3rd Pers.    3      72,757      1
HandNet [69]   Real + auto annot.     3rd Pers.    1      12,773      10
UCI-EGO [44]   Synthetic              Egocentric   1      10,000      1
libhand [67]   Synthetic              Generic      1      25,000,000  1

Table 2 Training data sets: We broadly categorize training datasets by the method used to generate the data and annotations: real data + manual annotations, real data + automatic annotations, or synthetic data (with automatic annotations). Most existing datasets are viewpoint-specific (tuned for 3rd-person or egocentric recognition) and limited in size to tens of thousands of examples. NYU is unique in that it is a multiview dataset collected with multiple cameras, while ICL contains shape variation due to multiple (10) subjects. To explore the effect of training data, we use the public libhand animation package to generate a massive training set of 25 million examples.
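The disagreement criterion above can be made concrete: two annotations of the same frame count as a disagreement when any corresponding joint pair lies farther apart than the threshold. A minimal sketch (function and array names are our own, not from any released codebase):

```python
import numpy as np

def disagreement_rate(annot_a, annot_b, threshold_mm=20.0):
    """Fraction of frames on which two annotators disagree.

    annot_a, annot_b: arrays of shape (num_frames, num_joints, 3)
    holding 3D joint positions in millimeters from two annotators.
    A frame counts as a disagreement if any joint differs by more
    than threshold_mm.
    """
    # Per-joint Euclidean distances, shape (num_frames, num_joints).
    dists = np.linalg.norm(annot_a - annot_b, axis=-1)
    # A frame disagrees if its worst joint exceeds the threshold.
    frame_disagrees = dists.max(axis=-1) > threshold_mm
    return frame_disagrees.mean()
```

Under this convention, a single badly placed fingertip marks the whole frame as ambiguous, which matches the per-pose (rather than per-joint) disagreement rate quoted above.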
Real data + automatic annotation: Data gloves directly obtain automatic pose annotations for real data (Xu and Cheng 2013). However, they require painstaking per-user calibration. Magnetic markers can partially alleviate calibration difficulties (Wetzler et al. 2015) but still distort the hand shape that is observed in the depth map. When evaluating depth-only systems, colored markers can provide ground truth through the RGB channel (Sharp et al. 2015). Alternatively, one could use a "passive" motion capture system. We evaluate the larger NYU training set (Tompson et al. 2014), which annotates real data by fitting (offline) a skinned 3D hand model to high-quality 3D measurements. Finally, integrating model fitting with tracking lets one leverage a small set of annotated reference frames to annotate an entire video (Oberweger et al. 2016).
Quasi-synthetic data: Augmenting real data with geometric computer graphics models provides an attractive solution. For example, one can apply geometric transformations (e.g., rotations) to both real data and its annotations (Tang et al. 2014). If multiple depth cameras are used to collect real data (that is then registered to a model), one can synthesize a larger set of varied viewpoints (Sridhar et al. 2015, Tompson et al. 2014). Finally, mimicking the noise and artifacts of real data is often important when using synthetic data. Domain transfer methods (D. Tang and Kim 2013) learn the relationship between a small real dataset and a large synthetic one.
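The transformation-based augmentation described above hinges on applying the same rigid motion to the observed geometry and to its labels. A minimal sketch, assuming camera-space 3D points recovered from a depth map (names and the single-axis setup are illustrative, not the cited authors' code):

```python
import numpy as np

def rotate_example(points, joints, angle_rad):
    """Apply one in-plane rotation to a depth point cloud and its labels.

    points: (N, 3) camera-space 3D points from a depth map (mm).
    joints: (J, 3) annotated 3D joint positions (mm).
    Rotating both by the same matrix keeps the annotation valid, so
    one labeled frame yields many augmented training examples.
    """
    c, s = np.cos(angle_rad), np.sin(angle_rad)
    # Rotation about the camera's optical (z) axis.
    R = np.array([[c, -s, 0.0],
                  [s,  c, 0.0],
                  [0.0, 0.0, 1.0]])
    return points @ R.T, joints @ R.T
```

In-plane rotations are the safe case: they expose no previously occluded surface, so the synthesized depth map remains physically plausible, whereas out-of-plane rotations generally require a registered model to fill in unseen geometry.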
Synthetic data: Another option is to use data rendered by a computer graphics system. Graphical synthesis