Deep Learning for Hand Gesture Recognition on Skeletal Data
Guillaume Devineau (1), Wang Xi (2), Fabien Moutarde (1) and Jie Yang (2)
(1) MINES ParisTech, PSL Research University, Center for Robotics, 60 Bd St Michel 75006 Paris, France
(2) Shanghai Jiao Tong University, School of Electronic Information and Electrical Engineering, Shanghai, China
Abstract— In this paper, we introduce a new 3D hand gesture
recognition approach based on a deep learning model.
We introduce a new Convolutional Neural Network (CNN)
where sequences of hand-skeletal joints’ positions are processed
by parallel convolutions; we then investigate the performance
of this model on hand gesture sequence classification tasks. Our
model only uses hand-skeletal data and no depth image.
Experimental results show that our approach achieves a
state-of-the-art performance on a challenging dataset (DHG
dataset from the SHREC 2017 3D Shape Retrieval Contest),
when compared to other published approaches. Our model
achieves a 91.28% classification accuracy for the 14 gesture
classes case and an 84.35% classification accuracy for the 28
gesture classes case.
I. INTRODUCTION
Touch and gesture are two natural ways for a user to
interact with one’s environment. While touch necessarily
involves a physical contact (e.g. to write a message on
a phone, to grab a physical object, or to swipe touch-sensitive
textiles), gestures allow remote interactions (e.g.
to interact with a smart screen, or with virtual-reality and
augmented-reality objects). As such, gesture-based
human-computer interfaces can ease the use of digital computing
[27] in situations where it would previously have been difficult
or even impossible because of practical constraints like
interacting with everyday life physical objects (e.g. lights,
mirrors, doorknobs, notebooks, ...) or like using computers
in settings where the person has to focus entirely on a task
(e.g. while driving a car, cooking or doing surgery).
Gesture can convey semantic meaning, as well as con-
textual information such as personality, emotion or attitude.
For instance, research shows that speech and gesture share
the same communication system [2] and that one’s gestures
are directly linked to one’s memory [18]. Among gestures,
hand gestures distinguish themselves from two other types of
gestures [25]: body gestures and head gestures. We chose to
work on hand gestures since they can carry more information
more easily than the two other types of gestures. One
preferred way to infer the intent of a gesture is to use a
taxonomy of gestures and to classify the unknown gesture
into one of the existing categories based on the gesture data,
in a similar way to what is done in computer vision for
instance. The classification can either be obtained in realtime
at each time step or at the end of the gesture, depending on
the processing power and the application needs.
In this paper we propose a convolutional neural network
architecture relying on intra- and inter- parallel processing
[Fig. 1 here: hand skeleton with joint types labeled Tip, Articulation (a), Articulation (b), Base, Palm and Wrist.]
Fig. 1. Hand skeleton returned by the Intel RealSense camera. Each dot
represents one of the n = 22 joints of the skeleton.
of sequences of positions (of hand-skeletal joints) to classify
complete hand gestures. Where most existing deep learn-
ing approaches to gesture recognition use RGB-D image
sequences to classify gestures [49], our neural network only
uses hand (3D) skeletal data sequences which are quicker to
process than image sequences.
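The parallel-processing idea can be sketched as follows: each univariate channel (one coordinate of one joint) is convolved independently with its own temporal kernel, so all channels are processed in parallel. The NumPy sketch below is purely illustrative; the function name `temporal_conv`, the channel count (3 coordinates for each of the 22 joints) and the kernel size 7 are our assumptions here, not the actual architecture, which is detailed in Section IV.

```python
import numpy as np

def temporal_conv(sequence, kernels):
    """Convolve each univariate channel of `sequence` with its own
    1D temporal kernel, independently (in parallel) across channels.

    sequence: array of shape (channels, T) -- e.g. 22 joints x 3 coords = 66
    kernels:  array of shape (channels, K) -- one kernel per channel
    returns:  array of shape (channels, T), 'same' padding
    """
    return np.stack([np.convolve(ch, k, mode="same")
                     for ch, k in zip(sequence, kernels)])

# Toy example: 66 channels (22 joints x 3 coordinates), 100 time steps.
seq = np.random.randn(66, 100)
kernels = np.random.randn(66, 7)   # kernel size 7 is an arbitrary choice
out = temporal_conv(seq, kernels)
print(out.shape)  # (66, 100): same temporal length, one output per channel
```

A real CNN would learn the kernels by backpropagation and stack several such layers with non-linearities; this sketch only shows the channel-parallel structure of the temporal convolutions.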
The rest of this paper is structured as follows. We first
review common recognition methods in Section II. We then
present the DHG dataset we used to evaluate our network in
Section III. We detail our approach in Section IV in terms
of motivations, architecture and results. Finally, we conclude
in Section VI and discuss how our model can be improved
and integrated into a realtime interactive system.
II. DEFINITION & RELATED WORK
We define a 3D skeletal data sequence $s$ as a vector
$$s = (p_1 \cdots p_n)^T$$
whose components $p_i$ are multivariate time sequences. Each
component $p_i = (p_i(t))_{t \in \mathbb{R}}$ represents a multivariate
sequence with three components (univariate sequences)
$$p_i = (x^{(i)}, y^{(i)}, z^{(i)})$$
that altogether represent a time sequence of the positions
$p_i(t)$ of the $i$-th skeletal joint $j_i$. Every skeletal joint $j_i$
represents a distinct and precise articulation or part of one's
hand in the physical world. An illustration of a 3D hand
skeleton is proposed in Figure 1.
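In practice, a sampled sequence $s$ can be stored as a dense array indexed by joint, coordinate and time step. The sketch below assumes $n = 22$ joints (as in Fig. 1) and $T$ sampled frames; this particular array layout is our choice for illustration, not one prescribed by the paper.

```python
import numpy as np

n_joints, T = 22, 100        # 22 skeletal joints, T sampled frames

# s[i] holds the multivariate sequence p_i = (x_i(t), y_i(t), z_i(t));
# s[i, 0] is the univariate sequence x_i, s[i, 1] is y_i, s[i, 2] is z_i.
s = np.zeros((n_joints, 3, T))

# Example: read the 3D position p_5(t) of joint j_5 at frame t = 10.
p_5_t10 = s[5, :, 10]        # a 3-vector (x, y, z)
print(s.shape, p_5_t10.shape)  # (22, 3, 100) (3,)
```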
In the following subsections, we present a short review of
some approaches to gesture recognition. Typical approaches
to hand gesture recognition begin with the extraction of
spatial and temporal features from raw data. The features
are later classified by a Machine Learning algorithm. The

978-1-5386-2335-0/18/$31.00 © 2018 IEEE
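The generic two-stage pipeline described above (handcrafted spatio-temporal features followed by a classifier) can be sketched as follows. The displacement-statistics features and the nearest-centroid classifier are illustrative choices of ours, not any specific method reviewed here.

```python
import numpy as np

def extract_features(seq):
    """Handcrafted spatio-temporal features from a skeletal sequence.
    seq: array (n_joints, 3, T). Returns a fixed-size vector: the mean
    position and the mean absolute frame-to-frame displacement per joint."""
    mean_pos = seq.mean(axis=2)                         # (n_joints, 3)
    motion = np.abs(np.diff(seq, axis=2)).mean(axis=2)  # (n_joints, 3)
    return np.concatenate([mean_pos.ravel(), motion.ravel()])

def nearest_centroid_fit(features, labels):
    """Compute one centroid per gesture class in feature space."""
    classes = np.unique(labels)
    return classes, np.stack([features[labels == c].mean(axis=0)
                              for c in classes])

def nearest_centroid_predict(x, classes, centroids):
    """Assign x to the class whose centroid is closest."""
    return classes[np.argmin(np.linalg.norm(centroids - x, axis=1))]

# Toy data: two synthetic gesture classes differing in amount of motion.
rng = np.random.default_rng(0)
slow = [rng.normal(0, 0.01, (22, 3, 50)) for _ in range(5)]  # class 0
fast = [rng.normal(0, 1.0, (22, 3, 50)) for _ in range(5)]   # class 1
X = np.stack([extract_features(s) for s in slow + fast])
y = np.array([0] * 5 + [1] * 5)
classes, centroids = nearest_centroid_fit(X, y)
pred = nearest_centroid_predict(extract_features(fast[0]), classes, centroids)
print(pred)  # -> 1
```

Deep learning approaches such as the one proposed in this paper replace the handcrafted feature-extraction stage with layers learned end-to-end from the raw skeletal sequences.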