
Real-Time Human Pose Recognition in Parts from Single Depth Images
Jamie Shotton, Andrew Fitzgibbon, Mat Cook, Toby Sharp, Mark Finocchio, Richard Moore, Alex Kipman, Andrew Blake
Microsoft Research Cambridge & Xbox Incubation
Abstract
We propose a new method to quickly and accurately predict 3D positions of body joints from a single depth image, using no temporal information. We take an object recognition approach, designing an intermediate body parts representation that maps the difficult pose estimation problem into a simpler per-pixel classification problem. Our large and highly varied training dataset allows the classifier to estimate body parts invariant to pose, body shape, clothing, etc. Finally, we generate confidence-scored 3D proposals of several body joints by reprojecting the classification result and finding local modes.
The system runs at 200 frames per second on consumer hardware. Our evaluation shows high accuracy on both synthetic and real test sets, and investigates the effect of several training parameters. We achieve state-of-the-art accuracy in our comparison with related work and demonstrate improved generalization over exact whole-skeleton nearest neighbor matching.
1. Introduction
Robust interactive human body tracking has applications including gaming, human-computer interaction, security, telepresence, and even health-care. The task has recently been greatly simplified by the introduction of real-time depth cameras [16, 19, 44, 37, 28, 13]. However, even the best existing systems still exhibit limitations. In particular, until the launch of Kinect [21], none ran at interactive rates on consumer hardware while handling a full range of human body shapes and sizes undergoing general body motions. Some systems achieve high speeds by tracking from frame to frame but struggle to re-initialize quickly and so are not robust. In this paper, we focus on pose recognition in parts: detecting from a single depth image a small set of 3D position candidates for each skeletal joint. Our focus on per-frame initialization and recovery is designed to complement any appropriate tracking algorithm [7, 39, 16, 42, 13] that might further incorporate temporal and kinematic coherence. The algorithm presented here forms a core component of the Kinect gaming platform [21].
Figure 1. Overview. From a single input depth image, a per-pixel body part distribution is inferred (colors indicate the most likely part label at each pixel, and correspond to the colors of the joint proposals). Local modes of this signal are estimated to give high-quality proposals for the 3D locations of body joints, even for multiple users. (Panels, left to right: depth image, body parts, 3D joint proposals.)
Illustrated in Fig. 1 and inspired by recent object recognition work that divides objects into parts (e.g. [12, 43]), our approach is driven by two key design goals: computational efficiency and robustness. A single input depth image is segmented into a dense probabilistic body part labeling, with the parts defined to be spatially localized near skeletal joints of interest. Reprojecting the inferred parts into world space, we localize spatial modes of each part distribution and thus generate (possibly several) confidence-weighted proposals for the 3D locations of each skeletal joint.
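As a concrete illustration of the reprojection and mode-finding just described, the sketch below back-projects a depth image into 3D camera space under a pinhole model and runs a weighted mean-shift iteration to find a single mode for one body part. This is a minimal sketch, not the paper's implementation: the intrinsics fx, fy, cx, cy, the Gaussian kernel, and the bandwidth value are illustrative assumptions.

import numpy as np

def reproject(depth, fx, fy, cx, cy):
    # Back-project a depth image (meters) into 3D camera space under a
    # simple pinhole model; fx, fy, cx, cy are hypothetical intrinsics.
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    return np.stack([(u - cx) * depth / fx,
                     (v - cy) * depth / fy,
                     depth], axis=-1)  # (h, w, 3) camera-space points

def weighted_mean_shift(points, weights, bandwidth=0.1, iters=30):
    # Gaussian-kernel mean shift over the weighted 3D points of one body
    # part. Seeding from several start points would recover multiple
    # modes, e.g. when multiple users are in view.
    mode = points[np.argmax(weights)].copy()  # start at the strongest pixel
    for _ in range(iters):
        k = weights * np.exp(-np.sum((points - mode) ** 2, axis=1)
                             / (2 * bandwidth ** 2))
        mode = (k[:, None] * points).sum(axis=0) / k.sum()
    return mode, k.sum()  # proposal position and a natural confidence score

In use, the pixels labeled as a given part would be reprojected and passed to weighted_mean_shift with their inferred part probabilities as weights; the summed kernel weight then serves as the proposal's confidence.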
We treat the segmentation into body parts as a per-pixel classification task (no pairwise terms or CRF have proved necessary). Evaluating each pixel separately avoids a combinatorial search over the different body joints, although within a single part there are of course still dramatic differences in the contextual appearance. For training data, we generate realistic synthetic depth images of humans of many shapes and sizes in highly varied poses sampled from a large motion capture database. We train a deep randomized decision forest classifier which avoids overfitting by using hundreds of thousands of training images. Simple, discriminative depth comparison image features yield 3D translation invariance while maintaining high computational efficiency. For further speed, the classifier can be run in parallel on each pixel on a GPU [34]. Finally, spatial modes of the inferred per-pixel distributions are computed using mean shift [10], resulting in the 3D joint proposals.
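To make the depth comparison features and forest traversal concrete, here is a minimal sketch of one plausible formulation, in which pixel offsets are normalized by the depth at the reference pixel so that the feature's world-space footprint stays roughly constant with distance to the camera. The BIG_DEPTH background constant, the offset convention, and the dictionary encoding of tree nodes are our assumptions; the paper's precise definitions appear in later sections.

import numpy as np

BIG_DEPTH = 1e6  # hypothetical large depth for background / off-image probes

def depth_feature(depth, x, u, v):
    # Difference of two probe depths at offsets u and v from pixel x
    # (row, col). Dividing the offsets by depth(x) yields approximate
    # invariance to the subject's distance from the camera.
    def probe(offset):
        p = np.round(np.asarray(x) + np.asarray(offset) / depth[tuple(x)]).astype(int)
        h, w = depth.shape
        return depth[p[0], p[1]] if 0 <= p[0] < h and 0 <= p[1] < w else BIG_DEPTH
    return probe(u) - probe(v)

def classify_pixel(depth, x, tree):
    # Walk one decision tree: internal nodes hold two offsets and a
    # threshold, leaves hold a distribution over body parts. A forest
    # would average the leaf distributions reached in each of its trees.
    node = tree
    while 'parts' not in node:
        f = depth_feature(depth, x, node['u'], node['v'])
        node = node['left'] if f < node['t'] else node['right']
    return node['parts']

Because each pixel is classified independently of its neighbors, this traversal parallelizes trivially, which is what allows the classifier to be run per-pixel on a GPU.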
An optimized implementation of our algorithm runs in under 5 ms per frame (200 frames per second) on the Xbox 360 GPU, at least one order of magnitude faster than existing approaches. It works frame-by-frame across dramatically differing body shapes and sizes, and the learned discriminative approach naturally handles self-occlusions and poses cropped by the image frame.