the lack of a depth channel in the source images/videos. With the innovation of 3D data acquisition technology, RGB-D data has become popular in recent years, which makes it possible to infer the motion sequence of a skeletal joint in 3D space. For example, Shotton et al. [9] proposed an algorithm for obtaining human skeletons in real time with a depth sensor. Wang et al. [10] also proposed an efficient and robust human pose estimation algorithm for RGB videos. Significant advances have been made in human action recognition based on RGB and RGB-D data [11, 12]. With the increasing availability of skeleton acquisition tools, research on human action recognition using skeleton data has generated growing interest.
In this paper, we simultaneously consider the spatial and temporal changes of the human skeleton and propose a more powerful learning model to capture skeleton variability in both the spatial and temporal dimensions. Most existing methods lack the ability to extract spatiotemporal feature representations; in such methods, it is often difficult to extract a single feature representation that can be used to recognize all action classes. Designing a model with greater learning ability for spatiotemporal feature representations is therefore a key problem in human action recognition. Previous methods for recognizing human actions are mainly based on convolutional neural networks (CNNs) [13–15], recurrent neural networks (RNNs) [16–19], or graph convolutional networks (GCNs) [20–23]. Typically, these methods consider only a single feature representation of the human body. In recent years, temporal convolutional networks (TCNs) [24, 25] have shown outstanding ability in processing time-sequence data, and extensive experiments have shown that TCNs are superior to RNNs such as Long Short-Term Memory networks (LSTMs). Based on TCNs, designing a multi-channel network model that learns multiple feature representations simultaneously can improve the accuracy of human action recognition. We consider two important feature representations in the new network: the movements of each skeletal joint between two adjacent action frames and the relative positions of the constituent joints in a single skeletal frame (see the sketches following the contribution list below). The main contributions of our work include the following.
• We propose a novel method that leverages both the inter-frame vector feature representation between adjacent frames and the intra-frame vector feature representation within a single frame. Experiments show that these two vector feature representations mutually promote the recognition of many action classes.
• We redesign residual blocks for TCNs and propose the two-stream temporal convolutional networks (TS-TCNs), which integrate multiple feature representations to bring a notable improvement in recognition performance.
• We perform a comprehensive experimental validation on four well-known datasets: NTU RGB+D [11], NTU RGB+D 120 [26], Northwestern-UCLA [27], and UTKinect-Action [28]. Our results show that the proposed two-stream network achieves superior performance compared with most previous methods.
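To make the two vector feature representations concrete, the following minimal sketch computes them for a skeleton sequence stored as a (T, J, 3) array (T frames, J joints, 3D coordinates). The function names are illustrative, and the choice of a single reference joint for the intra-frame vectors is our assumption; the exact formulation used in TS-TCNs is given later in the paper.

```python
import numpy as np

def inter_frame_vectors(skel):
    # skel: (T, J, 3) array of T frames, J joints, 3D coordinates.
    # Returns (T - 1, J, 3): the movement of every joint between
    # two adjacent action frames.
    return skel[1:] - skel[:-1]

def intra_frame_vectors(skel, ref=0):
    # Relative position of every joint within a single frame,
    # here taken with respect to a reference joint (index `ref`,
    # an illustrative assumption). Returns (T, J, 3).
    return skel - skel[:, ref:ref + 1, :]

# Example on a random 64-frame, 25-joint sequence (NTU RGB+D layout).
seq = np.random.randn(64, 25, 3).astype(np.float32)
motion = inter_frame_vectors(seq)    # shape (63, 25, 3)
relative = intra_frame_vectors(seq)  # shape (64, 25, 3)
```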
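For background on the second contribution, the sketch below outlines a generic TCN residual block in the style of Bai et al. [24]: dilated 1D convolutions applied along the temporal axis, wrapped in a skip connection. It only illustrates the general structure that our work builds on; the redesigned residual blocks used in TS-TCNs are presented later in the paper.

```python
import torch
import torch.nn as nn

class TemporalBlock(nn.Module):
    # Generic TCN residual block: two dilated temporal convolutions
    # plus a skip connection (a sketch, not the TS-TCN block itself).
    def __init__(self, in_ch, out_ch, kernel_size=3, dilation=1):
        super().__init__()
        pad = (kernel_size - 1) * dilation // 2  # preserve temporal length
        self.net = nn.Sequential(
            nn.Conv1d(in_ch, out_ch, kernel_size, padding=pad, dilation=dilation),
            nn.BatchNorm1d(out_ch),
            nn.ReLU(),
            nn.Conv1d(out_ch, out_ch, kernel_size, padding=pad, dilation=dilation),
            nn.BatchNorm1d(out_ch),
        )
        # 1x1 convolution matches channel counts on the residual path.
        self.skip = nn.Conv1d(in_ch, out_ch, 1) if in_ch != out_ch else nn.Identity()
        self.relu = nn.ReLU()

    def forward(self, x):  # x: (batch, channels, frames)
        return self.relu(self.net(x) + self.skip(x))
```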
2 Related Work
In this section, we review relevant literature on human action recognition. First, we present methods for extracting the dynamics feature representation of human actions. We then describe network-based models that process skeleton sequences for human action recognition.
2.1 Dynamics Representation
The human action recognition task consists in identifying human body behaviors from sequence data such as images, videos, and skeletons. The main contents of action behaviors include gestures, actions in daily life, interactions, and group activities. Early research on human action recognition focused on still images and videos [5, 12]. RGB data is rich in color, shape, and texture features, and initial methods for action recognition mainly used the color and texture information in 2D images. However, various factors, such as background clutter and human body occlusion, make this identification task complicated. Liu et al. [29] proposed a deep-learning-based method that uses depth sequences and the corresponding skeleton joint information. Since depth images lack information such as color and texture, related work based on depth maps is limited. Wang et al. [30] proposed a method that coordinates RGB and depth features during training for action recognition. Skeleton data, which has obvious advantages over RGB and depth data, contains 3D information on the joint points of the human body and thus provides higher-level geometric features. Wang et al. [31] developed an action ensemble model that characterizes the conjunctive structure of 3D human actions by capturing the correlations of the joints. Zhang et al. [32] introduced a related geometric feature on joints and