Skeleton Based Action Recognition: A Hierarchical Recurrent Neural Network Approach

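"Hierarchical Recurrent Neural Network for Skeleton Based Action Recognition"

In computer vision and artificial intelligence, action recognition is an important task that analyzes video or sensor data to understand human behavior. Traditional methods typically rely on hand-crafted features to capture the spatial structure and temporal dynamics of the human skeleton, and recognize actions with carefully designed classifiers. Their performance is often limited by the complexity and accuracy of the feature extraction.

This paper proposes a hierarchical recurrent neural network (HRNN) for skeleton based action recognition. Recurrent neural networks (RNNs) are widely used because they can model long-term contextual information in sequence data. In the HRNN, the authors take the physical structure of the human body into account and divide the skeleton into five parts, namely the four limbs (two arms and two legs) and the trunk. Each part is fed to its own subnet, which learns the motion patterns of that part. As the number of layers increases, the representations extracted by the subnets are hierarchically fused to form the inputs of higher layers. This fusion strategy helps capture action detail at several levels, from local joint motion to coordinated whole-body motion. Because the HRNN is trained end to end, it learns representative features automatically and reduces the dependence on manual feature engineering.

A major challenge for RNNs is the vanishing or exploding gradient problem, which is especially pronounced on long sequences. To address it, the HRNN may use a gating mechanism such as Long Short-Term Memory (LSTM) or the Gated Recurrent Unit (GRU), which controls the flow of information and mitigates the gradient problem. This lets the model capture the temporal dependencies of action sequences while keeping its complexity moderate.

In the experiments, the authors compare the HRNN with five other deep RNN architectures derived from it and with several other methods on three publicly available datasets, and report state-of-the-art performance with high computational efficiency. Overall, the paper offers a new perspective on skeleton based action recognition: by hierarchically modeling the motion of different body parts, it strengthens the model's ability to understand and recognize action sequences, and it illustrates the potential of deep learning for complex sequence tasks.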

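The part-based input described in the summary can be illustrated with a minimal sketch. It assumes a skeleton sequence stored as a `(T, J, 3)` array of 3D joint coordinates. The 20-joint layout, the index groups in `PARTS`, and the function name `split_skeleton` are hypothetical and chosen only for illustration; the part names follow the "four limbs and a trunk" description in the paper's introduction.

```python
import numpy as np

# Hypothetical joint layout for a 20-joint skeleton. These index groups are
# illustrative assumptions, not the layout of any particular dataset.
PARTS = {
    "left_arm":  [4, 5, 6, 7],
    "right_arm": [8, 9, 10, 11],
    "trunk":     [0, 1, 2, 3],
    "left_leg":  [12, 13, 14, 15],
    "right_leg": [16, 17, 18, 19],
}

def split_skeleton(seq):
    """Split a (T, J, 3) joint sequence into per-part (T, 3 * n_joints) arrays."""
    seq = np.asarray(seq, dtype=float)
    n_frames = seq.shape[0]
    # Select each part's joints and flatten their xyz coordinates per frame.
    return {name: seq[:, idx, :].reshape(n_frames, -1)
            for name, idx in PARTS.items()}

# Usage: a random 30-frame sequence stands in for real skeleton data.
seq = np.random.default_rng(0).normal(size=(30, 20, 3))
for name, x in split_skeleton(seq).items():
    print(name, x.shape)  # each part has shape (30, 12)
```

Each returned array could then serve as the input sequence of one subnet.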
Hierarchical Recurrent Neural Network for Skeleton Based Action Recognition

Yong Du, Wei Wang, Liang Wang

Center for Research on Intelligent Perception and Computing, CRIPAC

Nat’l Lab of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences

{yong.du, wangwei, wangliang}@nlpr.ia.ac.cn

Abstract

Human actions can be represented by the trajectories of skeleton joints. Traditional methods generally model the spatial structure and temporal dynamics of human skeleton with hand-crafted features and recognize human actions by well-designed classifiers. In this paper, considering that recurrent neural network (RNN) can model the long-term contextual information of temporal sequences well, we propose an end-to-end hierarchical RNN for skeleton based action recognition. Instead of taking the whole skeleton as the input, we divide the human skeleton into five parts according to human physical structure, and then separately feed them to five subnets. As the number of layers increases, the representations extracted by the subnets are hierarchically fused to be the inputs of higher layers. The final representations of the skeleton sequences are fed into a single-layer perceptron, and the temporally accumulated output of the perceptron is the final decision. We compare with five other deep RNN architectures derived from our model to verify the effectiveness of the proposed network, and also compare with several other methods on three publicly available datasets. Experimental results demonstrate that our model achieves the state-of-the-art performance with high computational efficiency.
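The last step of the abstract, where the perceptron output is accumulated over time to reach the final decision, can be sketched as follows. This is one plausible reading, not the authors' implementation: it assumes the per-frame outputs are summed before a softmax is applied. The names `classify_sequence`, `W` and `b` are hypothetical, and the random inputs stand in for learned features and weights.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D score vector."""
    z = z - np.max(z)
    e = np.exp(z)
    return e / e.sum()

def classify_sequence(features, W, b):
    """Classify a sequence from its per-frame representations.

    features: (T, D) final representation at each frame.
    W, b:     (D, C) weights and (C,) bias of a single-layer perceptron
              (hypothetical parameters, random in the usage below).
    """
    per_frame = features @ W + b          # (T, C) perceptron output per frame
    accumulated = per_frame.sum(axis=0)   # (C,) accumulate over time (assumed: sum)
    probs = softmax(accumulated)
    return int(np.argmax(probs)), probs

# Usage with random stand-in data: 40 frames, 16 features, 5 action classes.
rng = np.random.default_rng(0)
label, probs = classify_sequence(rng.normal(size=(40, 16)),
                                 rng.normal(size=(16, 5)),
                                 np.zeros(5))
print(label, probs.round(3))
```

Accumulating before the decision means every frame contributes to the class scores, rather than only the last time step.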

1. Introduction

As an important branch of computer vision, action recognition has a wide range of applications, e.g., intelligent video surveillance, robot vision, human-computer interaction, game control, and so on [15, 36]. Traditional studies of action recognition mainly focus on recognizing actions from videos recorded by 2D cameras. In reality, however, human actions are generally represented and recognized in 3D space. The human body can be regarded as an articulated system of rigid bones and hinged joints, which are further combined into four limbs and a trunk [31]. Human actions are composed of the motions of these limbs and the trunk, which are represented by the movements of human skeleton joints in 3D space [37]. Currently, reliable joint coordinates can be obtained from cost-effective depth sensors using real-time skeleton estimation algorithms [27, 28]. Effective approaches should therefore be investigated for skeleton based action recognition.

[Figure 1 diagram labels: BRNN; Layer1 to Layer9; Fully Connected Layer; Softmax Layer]

Figure 1: An illustrative sketch of the proposed hierarchical recurrent neural network. The whole skeleton is divided into five parts, which are fed into five bidirectional recurrent neural networks (BRNNs). As the number of layers increases, the representations extracted by the subnets are hierarchically fused to be the inputs of higher layers. A fully connected layer and a softmax layer are performed on the final representation to classify the actions.
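The structure described in the Figure 1 caption, with five part-wise BRNNs whose outputs are fused layer by layer, can be sketched roughly as below. This is a forward pass only, under stated assumptions: plain tanh units, untrained random weights, fusion by concatenation, and a grouping schedule (trunk with each limb, then upper and lower body, then whole body) that is assumed for illustration rather than taken from the text above. All function names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_rnn(d_in, d_h):
    """Random parameters for one direction of a tanh RNN (untrained, illustrative)."""
    return (rng.normal(0, 0.1, (d_in, d_h)),
            rng.normal(0, 0.1, (d_h, d_h)),
            np.zeros(d_h))

def run_rnn(x, params):
    """Run a tanh RNN over x of shape (T, d_in); returns hidden states (T, d_h)."""
    W_x, W_h, b = params
    h = np.zeros(W_h.shape[0])
    states = []
    for x_t in x:
        h = np.tanh(x_t @ W_x + h @ W_h + b)
        states.append(h)
    return np.stack(states)

def brnn(x, d_h):
    """Bidirectional RNN: forward pass plus a backward pass over reversed time."""
    fwd = run_rnn(x, init_rnn(x.shape[1], d_h))
    bwd = run_rnn(x[::-1], init_rnn(x.shape[1], d_h))[::-1]
    return np.concatenate([fwd, bwd], axis=1)      # (T, 2 * d_h)

def fuse(*reps):
    """Fusion layer: concatenate representations along the feature axis."""
    return np.concatenate(reps, axis=1)

def hierarchical_forward(parts, d_h=8):
    # First BRNN layer: one subnet per body part.
    h = {name: brnn(x, d_h) for name, x in parts.items()}
    # Assumed grouping: fuse the trunk with each of the four limbs.
    limbs = ["left_arm", "right_arm", "left_leg", "right_leg"]
    h = {limb: brnn(fuse(h[limb], h["trunk"]), d_h) for limb in limbs}
    # Assumed grouping: upper body and lower body.
    upper = brnn(fuse(h["left_arm"], h["right_arm"]), d_h)
    lower = brnn(fuse(h["left_leg"], h["right_leg"]), d_h)
    # Whole body representation, one vector per frame.
    return brnn(fuse(upper, lower), d_h)

# Usage: random stand-in inputs, 30 frames and 12 features per part.
names = ["left_arm", "right_arm", "trunk", "left_leg", "right_leg"]
parts = {name: rng.normal(size=(30, 12)) for name in names}
print(hierarchical_forward(parts).shape)  # (30, 16)
```

In a full model the per-frame output would be passed to the fully connected and softmax layers named in the caption, and all weights would be learned jointly.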

Human skeleton based action recognition is generally considered a time series problem [5, 17], in which the characteristics of body postures and their dynamics over time are extracted to represent a human action. Most existing skeleton based action recognition methods explicitly model the temporal dynamics of skeleton joints by using Temporal Pyramids (TPs) [19, 31, 33] and Hidden Markov Models (HMMs) [20, 34, 35]. TP methods are generally restricted by the width of the time windows and can only utilize limited contextual information. As for HMMs, it is very difficult to obtain temporally aligned sequences and the corresponding emission distributions. Recently, recurrent neural networks (RNNs) with Long Short-Term Memory (LSTM) [8, 10] neurons have been used for action recognition [1, 11, 16]. All of this work uses just a single-layer RNN as a sequence classifier without part-based
