Optimized Sparsified Graph Regression: Efficient Skeleton Representation for Action Recognition (2019)


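The 2019 paper "Optimized Skeleton-based Action Recognition via Sparsified Graph Regression" was written by Xiang Gao, Wei Hu, Jiaxiang Tang, Jiaying Liu and Zongming Guo of the Institute of Computer Science and Technology, Peking University. With the spread of depth sensors, dynamic human skeletons have attracted wide attention as a robust modality for action recognition. Earlier methods rely mainly on recurrent neural networks (RNN) or convolutional neural networks (CNN), which have limited expressive power for irregular skeleton joints. Graph convolutional networks (GCN) were proposed to handle irregularly structured data, but constructing the underlying graph remains a challenge. The paper represents skeletons naturally as graphs and proposes a graph regression based GCN (GR-GCN) that aims to capture the spatio-temporal variation in the data. Because the graph representation is crucial to graph convolution, the authors first propose graph regression, which statistically learns the underlying graph from multiple observations. The method focuses on spatio-temporal modeling and enforces sparsity on the learned graph, which reduces redundant connections and yields a more efficient representation.

GR-GCN accounts for implicit relations between joints: by learning the edge weights between vertices, it captures dependencies among joints in an action sequence that go beyond the physical bone connections. The sparsified graph lets the model concentrate on the joints most relevant to an action, which is intended to limit overfitting and improve generalization across actions. Integrating the temporal and spatial dimensions into the graph convolution also helps the model follow the dynamics of an action sequence. Overall, the paper offers an optimized framework for skeleton-based action recognition that addresses the graph construction problem and shows how statistical learning combined with sparsification can improve performance. The work is relevant to practical uses of skeleton data such as intelligent surveillance, motion analysis and human-computer interaction.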

Optimized Skeleton-based Action Recognition via Sparsified Graph Regression

Xiang Gao, Wei Hu, Jiaxiang Tang, Jiaying Liu, Zongming Guo

Institute of Computer Science and Technology, Peking University, China

{gyshgx868, forhuwei, hawkey1999, liujiaying, guozongming}@pku.edu.cn

Abstract—With the prevalence of accessible depth sensors, dynamic human body skeletons have attracted much attention as a robust modality for action recognition. Previous methods model skeletons based on RNN or CNN, which has limited expressive power for irregular skeleton joints. While graph convolutional networks (GCN) have been proposed to address irregular graph-structured data, the fundamental graph construction remains challenging. In this paper, we represent skeletons naturally on graphs, and propose a graph regression based GCN (GR-GCN) for skeleton-based action recognition, aiming to capture the spatio-temporal variation in the data. As the graph representation is crucial to graph convolution, we first propose graph regression to statistically learn the underlying graph from multiple observations. In particular, we provide spatio-temporal modeling of skeletons and pose an optimization problem on the graph structure over consecutive frames, which enforces the sparsity of the underlying graph for efficient representation. The optimized graph not only connects each joint to its neighboring joints in the same frame strongly or weakly, but also links with relevant joints in the previous and subsequent frames. We then feed the optimized graph into the GCN along with the coordinates of the skeleton sequence for feature learning, where we deploy high-order and fast Chebyshev approximation of spectral graph convolution. Further, we provide analysis of the variation characterization by the Chebyshev approximation. Experimental results validate the effectiveness of the proposed graph regression and show that the proposed GR-GCN achieves the state-of-the-art performance on the widely used NTU RGB+D, UT-Kinect and SYSU 3D datasets.

Index Terms—Graph regression, graph convolutional networks, spatio-temporal graph modeling, skeleton-based action recognition
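The abstract mentions a high-order Chebyshev approximation of spectral graph convolution. As a rough illustration only, the sketch below implements the generic K-th order Chebyshev filter on a graph signal with NumPy. It is not the authors' code: the function names, the weight layout `theta[k]` (one input-by-output matrix per polynomial order) and the choice of the symmetric normalized Laplacian are assumptions made for this example.

```python
import numpy as np


def normalized_laplacian(adj):
    """Symmetric normalized Laplacian L = I - D^{-1/2} A D^{-1/2}."""
    adj = np.asarray(adj, dtype=float)
    deg = adj.sum(axis=1)
    inv_sqrt = np.zeros_like(deg)
    nonzero = deg > 0
    inv_sqrt[nonzero] = deg[nonzero] ** -0.5  # isolated vertices keep 0
    return np.eye(adj.shape[0]) - inv_sqrt[:, None] * adj * inv_sqrt[None, :]


def chebyshev_graph_conv(adj, x, theta):
    """Apply sum_k T_k(L_scaled) @ x @ theta[k] for k = 0..K-1.

    adj:   (N, N) symmetric weighted adjacency matrix
    x:     (N, C_in) vertex features, e.g. 3D joint coordinates
    theta: (K, C_in, C_out) filter weights (hypothetical layout)
    """
    lap = normalized_laplacian(adj)
    lam_max = np.linalg.eigvalsh(lap).max()
    # Rescale the spectrum to [-1, 1], the domain of Chebyshev polynomials.
    lap_scaled = 2.0 * lap / lam_max - np.eye(lap.shape[0])

    t_prev = x                      # T_0(L) x = x
    out = t_prev @ theta[0]
    if len(theta) > 1:
        t_curr = lap_scaled @ x     # T_1(L) x = L x
        out = out + t_curr @ theta[1]
        for k in range(2, len(theta)):
            # Recurrence: T_k = 2 L T_{k-1} - T_{k-2}
            t_prev, t_curr = t_curr, 2.0 * lap_scaled @ t_curr - t_prev
            out = out + t_curr @ theta[k]
    return out


# Usage on a toy 3-vertex path graph with 2 features per vertex.
toy_adj = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
toy_x = np.arange(6, dtype=float).reshape(3, 2)
toy_theta = np.stack([np.eye(2)] * 3)   # K = 3, identity weights
toy_out = chebyshev_graph_conv(toy_adj, toy_x, toy_theta)
```

The recurrence avoids an explicit eigendecomposition of the filter: each extra order costs one more multiplication by the (sparse, in practice) Laplacian, and a K-th order filter mixes information from vertices up to K-1 hops away.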

I. INTRODUCTION

Action recognition is an active research direction in computer vision, with widespread applications in video surveillance, human computer interaction, robot vision, autonomous driving and so on. Among the multiple modalities [1]–[5] that are able to recognize human action, such as appearance, depth and body skeletons [6], [7], the skeleton-based sequences are springing up in recent years, due to the prevalence of affordable depth sensors (e.g., Kinect) and effective pose estimation algorithms [8]. Skeletons convey compact 3D position information of the major body joints, which are robust to variations of viewpoints, body scales and motion speeds [9]. Hence, skeleton-based action recognition has attracted more and more attention [10]–[16].

Different from modalities defined on regular grids such as images or videos, dynamic human skeletons are non-Euclidean geometric data, which consist of a series of human joint coordinates. This poses challenges in capturing both the intra-frame features and temporal dependencies. Recent methods learn these features via deep models like recurrent neural networks (RNN) [6], [7], [17]–[23] and convolutional neural networks (CNN) [21], [24]–[27]. Nevertheless, the topology in skeletons is not fully exploited in the grid-shaped representation of RNN and CNN.

[Figure 1 diagram: Input Skeleton Sequence → Graph Regression (GR) → Sparsified Spatio-Temporal Graph → GCN → Output Classification Score (example label: Hand Waving)]

Fig. 1. The pipeline of the proposed GR-GCN for skeleton-based action recognition. Given a sequence of human body joints, we first learn a common sparsified spatio-temporal graph over each frame, its previous frame and the subsequent one via graph regression. This leads to a spatio-temporal graph with strong and physical edges (black solid lines), strong and non-physical edges (red dashed lines) and weak edges (green dashed ones) for variation modeling. We then feed the sparsified spatio-temporal graph into a graph convolutional network (GCN) along with the 3D coordinates of joints for variation learning, which leads to the output classification scores.
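To make the spatio-temporal graph of Fig. 1 more concrete, the following sketch builds a hand-crafted weighted adjacency matrix over three consecutive frames. It only illustrates the data structure: in the paper the graph is learned by graph regression, whereas here the 5-joint toy skeleton, the bone list, the edge weights and the rule "link each joint to the same joint in the adjacent frame" are all assumptions chosen for the example.

```python
import numpy as np

# Hypothetical 5-joint toy skeleton (not the layout of any dataset).
JOINTS = ["head", "torso", "left_hand", "right_hand", "hip"]
BONES = [(0, 1), (1, 2), (1, 3), (1, 4)]  # physical connections


def spatio_temporal_adjacency(num_joints, bones, frames=3,
                              physical_w=1.0, temporal_w=0.5):
    """Weighted adjacency over `frames` consecutive frames.

    Vertex index = frame * num_joints + joint.
    physical_w and temporal_w are illustrative weights, not learned values.
    """
    n = num_joints * frames
    adj = np.zeros((n, n))
    for t in range(frames):
        offset = t * num_joints
        # Intra-frame edges along the bones of the skeleton.
        for i, j in bones:
            adj[offset + i, offset + j] = physical_w
            adj[offset + j, offset + i] = physical_w
        # Inter-frame edges: each joint to itself in the next frame.
        if t + 1 < frames:
            for i in range(num_joints):
                a, b = offset + i, offset + num_joints + i
                adj[a, b] = temporal_w
                adj[b, a] = temporal_w
    return adj


st_adj = spatio_temporal_adjacency(len(JOINTS), BONES)
```

A matrix of this form, together with the stacked joint coordinates of the three frames as vertex features, is the kind of input a GCN layer consumes. A learned graph would additionally contain non-physical edges and would set many weak entries to zero.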

A natural way to represent skeletons is graph, where each joint is treated as a vertex in the graph, and the relationship among the joints is interpreted by edges with weights. As unordered graphs cannot be fed into RNN or CNN directly, graph convolutional networks (GCN) have been proposed to deal with data defined on irregular graphs for a variety of applications [28]–[31]. Yan et al. [32] and Li et al. [33] are the first to propose graph-based skeleton representation, which is then fed into the GCN to automatically learn the spatial and temporal patterns from data. Tang et al. [34] propose a

arXiv:1811.12013v2 [cs.CV] 15 Apr 2019
