A Survey of Depth-Camera-Driven 3D Hand Pose Estimation: Recent Advances and Key Techniques


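This is a survey article titled "A Survey on 3D Hand Pose Estimation: Cameras, Methods, and Datasets", published in Pattern Recognition in 2019. Since consumer-grade depth cameras appeared in 2010, 3D hand pose estimation has attracted growing attention. Despite marked progress in recent years, a comprehensive overview of the latest developments has been lacking. The authors are Rui Li, Z. Liu, and Jianrong Tan; Li and Tan are researchers at the State Key Laboratory of CAD&CG at Zhejiang University. The article aims to fill this gap by systematically reviewing the key elements of the field.

First, the authors propose a markerless approach for evaluating the tracking accuracy of depth cameras with the aid of a numerical control linear motion guide. Existing evaluation approaches mainly characterize static properties, that is, how well a camera measures fixed spatial positions, whereas this approach targets tracking accuracy.

Depth cameras are a key tool for 3D hand pose estimation: they capture depth information to determine the position and orientation of the hand in 3D space. The article discusses the main depth sensing technologies, including structured light, time-of-flight (ToF), and stereo vision, and analyzes the strengths and limitations of each.

On the methods side, the authors cover a range of algorithms and techniques, including but not limited to: template-matching methods that exploit the geometric properties of the depth map; deep learning methods, in particular convolutional neural networks (CNNs) applied to stereo matching and keypoint detection; and real-time tracking strategies that use depth optical flow and motion models. These methods estimate the hand pose by optimizing an objective function such as joint angle prediction error, Euclidean distance error, or a more complex energy function.

The paper also stresses the importance of public benchmark datasets and lists widely used 3D hand pose estimation datasets, such as the MS COCO hand keypoint dataset, MPII Hands, HandNet, and FingerNet. These datasets are essential for evaluating algorithm performance and for comparing the accuracy and robustness of different methods, and they give later research a standardized evaluation platform.

Overall, the article is a detailed guide to the current state and trends of 3D hand pose estimation, covering the hardware, the technical strategies, and the evaluation standards. It is a useful reference for researchers, engineers, and developers who want to apply hand tracking in human-computer interaction.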

R. Li, Z. Liu and J. Tan / Pattern Recognition 93 (2019) 251–272

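Table 1. Popular commercial depth cameras.

| Camera | Model | Release date | Discontinued | Depth technology | Range | Max depth FPS |
| --- | --- | --- | --- | --- | --- | --- |
| Microsoft Kinect | 1st generation | 2010 | Yes | Structured light | 0.5–4.5 m | 30 |
| Microsoft Kinect | 2nd generation | 2014 | Yes | ToF | 0.5–4.5 m | 30 |
| ASUS Xtion | PRO LIVE | 2012 | Yes | Structured light | 0.8–3.5 m | 60 |
| ASUS Xtion | 2 | 2017 | Yes | Structured light | 0.8–3.5 m | 30 |
| Leap Motion (updated on December 20, 2018) | | 2013 | No | Dual IR stereo vision | 0.03–0.6 m | 200 |
| Intel RealSense | F200 | 2014 | Yes | Structured light | 0.2–1.2 m | 60 |
| Intel RealSense | R200 | 2015 | No | Structured light | 0.5–3.5 m | 60 |
| Intel RealSense | LR200 | 2016 | Yes | Structured light | 0.5–3.5 m | 60 |
| Intel RealSense | SR300 | 2016 | No | Structured light | 0.3–2 m | 30 |
| Intel RealSense | ZR300 | 2017 | Yes | Structured light | 0.5–3.5 m | 60 |
| Intel RealSense | D415 | 2018 | No | Structured light | 0.16–10 m | 90 |
| Intel RealSense | D435 | 2018 | No | Structured light | 0.11–10 m | 90 |
| SoftKinetic | DS311 | 2011 | Yes | ToF | 0.15–4.5 m | 60 |
| SoftKinetic | DS325 | 2012 | Yes | ToF | 0.15–1 m | 60 |
| SoftKinetic | DS525 | 2013 | Yes | ToF | 0.15–1 m | 60 |
| SoftKinetic | DS536A | 2015 | Yes | ToF | 0.1–5 m | 60 |
| SoftKinetic | DS541A | 2016 | Yes | ToF | 0.1–5 m | 60 |
| Creative Interactive Gesture | | 2012 | Yes | ToF | 0.15–1 m | 60 |
| Structure Sensor (updated on July 24, 2018) | | 2013 | No | Structured light | 0.4–3.5 m | 60 |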

map that encodes the difference in horizontal coordinates of the corresponding image points. The values in the disparity map are inversely proportional to the scene depth at the corresponding pixel location. Due to the sensitivity to illumination and texture, this type of depth camera is not popular in hand pose estimation.
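The inverse relation between disparity and depth can be sketched in a few lines. This is a minimal illustration, assuming a rectified pinhole stereo pair, for which depth is commonly written as Z = f · B / d (focal length f in pixels, baseline B in metres, disparity d in pixels). The function name, parameter names, and the rig values are hypothetical and do not come from the paper.

```python
def disparity_to_depth(disparity_px, focal_length_px, baseline_m):
    """Convert a stereo disparity (pixels) to depth (metres) via Z = f * B / d.

    Assumes a rectified pinhole stereo pair; all names here are illustrative.
    """
    if disparity_px <= 0:
        # Zero or negative disparity means no valid match (or a point at infinity).
        return float("inf")
    return focal_length_px * baseline_m / disparity_px


# Hypothetical rig: 600 px focal length, 5 cm baseline.
for d in (10.0, 20.0, 40.0):
    z = disparity_to_depth(d, focal_length_px=600.0, baseline_m=0.05)
    print(f"disparity {d:5.1f} px -> depth {z:.3f} m")
```

Doubling the disparity halves the estimated depth, which is why a fixed matching error in pixels costs more depth accuracy for far objects than for near ones.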

It is difficult to say which type of camera works best for hand pose estimation, because the performance is also influenced by environmental factors and application scenarios. Sridhar et al. [56] validated the effectiveness of their method with the Creative Interactive Gesture, Intel RealSense, and Primesense Carmine. In [57], Sridhar et al. published a benchmark dataset with the Creative Interactive Gesture and Kinect v1. Melax et al. [58] and Supancic et al. [30] used the ASUS Xtion and Creative Interactive Gesture.

There is no strict universal rule for selecting the most appropriate camera. The selection mainly depends on the nature of the problem. From Table 1, we can see that the Intel RealSense series are suitable for mid-range and long-range applications, the Leap Motion is suitable for short-range applications, and the Structure Sensor is suitable for mobile applications. Cameras like the Microsoft Kinect, ASUS Xtion, SoftKinetic, and Creative Interactive Gesture have been discontinued. From the point of view of long-term maintenance and update, these cameras are less attractive compared to the other depth cameras.
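The selection reasoning above can be sketched as a simple filter over a few rows of Table 1. This is only an illustration of the idea: the `DepthCamera` type, the `candidates` helper, and the choice of rows are assumptions made here, and a real selection would also weigh frame rate, environment, and form factor.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DepthCamera:
    name: str
    min_range_m: float
    max_range_m: float
    discontinued: bool


# A few rows transcribed from Table 1 (range and discontinued columns only).
CAMERAS = [
    DepthCamera("Microsoft Kinect (2nd generation)", 0.5, 4.5, True),
    DepthCamera("Leap Motion", 0.03, 0.6, False),
    DepthCamera("Intel RealSense SR300", 0.3, 2.0, False),
    DepthCamera("Intel RealSense D435", 0.11, 10.0, False),
    DepthCamera("Structure Sensor", 0.4, 3.5, False),
]


def candidates(near_m, far_m, cameras=CAMERAS):
    """Return names of non-discontinued cameras whose range covers [near_m, far_m]."""
    return [
        c.name
        for c in cameras
        if not c.discontinued
        and c.min_range_m <= near_m
        and far_m <= c.max_range_m
    ]


print(candidates(0.05, 0.5))  # short-range, desktop-style interaction
print(candidates(0.5, 3.0))   # mid-range, room-scale interaction
```

With these rows, the short-range query keeps only the Leap Motion, while the mid-range query keeps the RealSense D435 and the Structure Sensor, which matches the qualitative reading of Table 1.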

2.2. Existing evaluation approaches

There has been extensive work on evaluating the performance of a depth camera in medical fields. Harkel et al. [59] tested the accuracy of the RealSense in a cohort of patients with a unilateral facial palsy. House et al. [60] evaluated the RealSense for image-guided interventions and applications in vertebral level localization. Yeung et al. [61] evaluated the performance of the Kinect v1 when it was used as a clinical assessment tool for total body center of mass sway measurement. Noonan et al. [62] evaluated the Kinect v1 for motion tracking of a head phantom with a head CT. Ferche et al. [63] utilized the Leap Motion and RealSense to assist the rehabilitation of patients with a disability in upper limbs by providing them with augmented feedback presented in a dedicated virtual environment.

Numerous evaluation approaches can be found in other fields. Cree et al. [64] analyzed the precision of the SoftKinetic for range imaging. Jakus et al. [65] assessed the consistency and accuracy of the Leap Motion. Fankhauser et al. [66] analyzed the depth data quality of the Kinect v2 for mobile robot navigation under both overcast and direct-sunlight conditions. Carfagni et al. [67] studied the metrological and critical characterization of the RealSense when it was used as a 3D scanner. Yang et al. [68] obtained an accuracy distribution of the Kinect v2 through a cone model. Lachat et al. [69] provided an assessment and calibration method of the Kinect v2 toward a potential use of close-range 3D modeling. Corti et al. [70] presented a metrological characterization of the Kinect v2 by taking into account measuring conditions and environmental parameters. Breuer et al. [71] provided an analysis of measurement noise, accuracy, and error sources of the Kinect v2.

Some researchers focus on a comparison of different depth cameras. Zennaro et al. [72] compared the performance of the Kinect v1 and Kinect v2 in order to explain the results achieved by switching the depth sensing technology. Gonzalez-Jorge et al. [73] presented an accuracy and precision test of the Kinect v1 and Kinect v2 using a standard artifact based on five spheres and seven cubes. Wasenmuller et al. [74] investigated the accuracy and precision of the Kinect v1 and Kinect v2 in the context of 3D reconstruction, SLAM, and visual odometry. Boehm et al. [75] studied structured light cameras with respect to their repeatability and accuracy. Langmann [76] presented a depth camera assessment, including the Kinect v1, ZESS MultiCam, PMDTec 3k-S, SoftKinetic, and PMDTec CamCube 41k.

Comparisons between depth cameras and other devices can also be found. Lima et al. [77] used the RealSense as an eye gaze tracker to estimate a user’s gaze location, and compared it with a specialized device, the Tobii EyeX. Seixas et al. [78] designed an experiment to study the performance of the Leap Motion in 2D pointing tasks and compared the Leap Motion to a mouse and touchpad. The experimental results indicated that the Leap Motion worked poorly.

To obtain the accuracy of a depth camera, one must know the ground-truth results that serve as a reference. Various high-precision measuring devices have been introduced in the existing approaches, e.g., the Vicon motion capture system [61], AGPtek Handheld Digital Laser Point Distance Meter [68], tape measure [66], Polaris optical tracker [62], FARO Focus terrestrial laser scanner [69], coordinate measurement machine [67,73], clinical 3dMD system [59], NextEngine scanner [72], and Qualisys motion capture system [65,79].
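As a rough sketch of how a ground-truth reference is used, the snippet below compares repeated depth readings of a fixed target against a reference distance. It assumes the common convention that accuracy is closeness to the reference (reported here as mean error and RMSE) and precision is the spread of repeated readings (sample standard deviation). The function name and the sample readings are hypothetical and are not taken from any of the cited studies.

```python
import math
import statistics


def accuracy_and_precision(measurements_mm, ground_truth_mm):
    """Return (mean error, RMSE, sample standard deviation), all in millimetres.

    Mean error and RMSE describe accuracy relative to the reference;
    the standard deviation describes precision (repeatability).
    """
    errors = [m - ground_truth_mm for m in measurements_mm]
    mean_error = statistics.fmean(errors)
    rmse = math.sqrt(statistics.fmean([e * e for e in errors]))
    spread = statistics.stdev(measurements_mm)
    return mean_error, rmse, spread


# Hypothetical readings of a target placed 500 mm from the camera,
# with the 500 mm reference supplied by a higher-precision device.
readings = [502.0, 498.0, 503.0, 501.0, 501.0]
mean_error, rmse, spread = accuracy_and_precision(readings, ground_truth_mm=500.0)
print(f"mean error {mean_error:.2f} mm, RMSE {rmse:.2f} mm, std {spread:.2f} mm")
```

A camera can be precise but inaccurate (tight readings with a constant offset), which is why both kinds of figures are useful when comparing devices.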

The emphasis of the existing approaches is to evaluate the performance of a camera measuring fixed spatial positions. Such evaluations essentially characterize the static property. In contrast,
