Human Activity Recognition based on 3D Mesh MoSIFT Feature Descriptor
Yue Ming
School of Electronic Engineering
Beijing University of Posts and Telecommunications
Beijing 100876, P.R. China
Email: myname35875235@126.com
Abstract—The era of Big Data imposes increasingly high demands on information processing. The rapid development of 3D digital capturing devices is pushing traditional behavior analysis toward fine motion recognition, such as hand and gesture motions. In this paper, a complete framework for 3D human activity recognition is presented for the behavior analysis of hands and gestures. First, an improved graph cuts method is introduced for hand segmentation and tracking. Then, combining 3D geometric characteristics with prior information about human behavior, the 3D Mesh MoSIFT feature descriptor is proposed to represent the discriminative properties of human activity. Simultaneous orthogonal matching pursuit (SOMP) is used to encode the visual codewords. Experiments based on an RGB-D video dataset and the ChaLearn gesture dataset show the improved accuracy of human activity recognition.
Keywords—Big Data; 3D digital capturing devices; 3D human activity recognition; hand segmentation and tracking; 3D Mesh MoSIFT feature descriptor
I. INTRODUCTION
Big data technologies describe new architectures for intelligent information processing. In recent years, growing interest in human activity analysis has prompted scholars to pay more attention to algorithm design. Pavan Turaga [1] provided a survey on real-time video analysis. Joshua Candamo [2] focused on understanding transit scenes and reviewed the related algorithms for human behavior analysis in such scenes. Technical progress and rapidly declining prices have led more and more researchers to extend their work to new capturing devices in order to obtain richer motion information. Omar Oreifej [3] introduced depth sequences for activity recognition. A. Jalal [4] applied their proposed feature descriptors to life logging in smart homes. Ross B. Girshick [5] proposed a general pose estimation framework based on depth data. The superior performance achieved on 3D data points to a promising direction for human activity recognition. However, with the rapid development of big data technology, the description of fine motions, such as those of hands and gestures, in massive network data presents huge challenges for deep data mining and research.
In this paper, we focus on the description of fine human motions. Through the extraction of consistently invariant features, a framework for 3D hand activity recognition is established. First, a novel method for hand segmentation and tracking is introduced into our framework: an effective dynamic model based on graph cuts is used for hand state prediction. Then, inspired by fusion technology for RGB and depth information, we combine RGB and depth videos for fine motion analysis, e.g., hand activity and gesture recognition. A novel feature representation, named 3D Mesh MoSIFT, is developed from the original 3D MoSIFT feature descriptor [10] for key point detection and activity description. To learn a discriminative model, all feature descriptors are clustered with k-means to generate a visual codebook. A sparse coding method called simultaneous orthogonal matching pursuit (SOMP) is used to represent each feature as a linear combination of codewords. Finally, a new input sample is recognized by a k-nearest neighbor (KNN) classifier. Experimental results on the ChaLearn gesture dataset and our RGB-D hand activity dataset show that the proposed framework for hand activity recognition provides better accuracy than other classical algorithms.
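For concreteness, the following Python sketch illustrates the coding and classification stage just described: a k-means visual codebook, a small simultaneous-OMP encoder, and a KNN classifier. This is a minimal illustration under stated assumptions, not the authors' implementation; the codebook size (512), sparsity level (5), max-pooling step, and variable names such as train_videos and train_labels are all assumptions for the sketch.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import KNeighborsClassifier

def somp(D, Y, n_nonzero):
    """Simultaneous OMP: approximate all columns of Y (d x M) using the
    same few atoms (columns) of the dictionary D (d x K)."""
    residual = Y.copy()
    support = []
    X = np.zeros((D.shape[1], Y.shape[1]))
    for _ in range(n_nonzero):
        # pick the atom most correlated with the residuals of all signals jointly
        corr = np.abs(D.T @ residual).sum(axis=1)
        corr[support] = 0.0                       # never reselect an atom
        support.append(int(np.argmax(corr)))
        coef, *_ = np.linalg.lstsq(D[:, support], Y, rcond=None)
        residual = Y - D[:, support] @ coef
    X[support, :] = coef
    return X

# Visual codebook from all training descriptors (rows = descriptors);
# the codebook size 512 is an assumed parameter, not taken from the paper.
kmeans = KMeans(n_clusters=512, n_init=10).fit(train_descriptors)
codebook = kmeans.cluster_centers_                # shape (512, descriptor_dim)

def encode(descriptors, codebook, n_nonzero=5):
    """Jointly encode one video's descriptors over the codebook and
    max-pool the absolute sparse codes into one fixed-length vector."""
    X = somp(codebook.T, descriptors.T, n_nonzero)   # (512, M) codes
    return np.abs(X).max(axis=1)                     # assumed pooling choice

# train_videos / test_videos: hypothetical lists of per-video descriptor arrays
train_feats = np.stack([encode(v, codebook) for v in train_videos])
test_feats = np.stack([encode(v, codebook) for v in test_videos])
clf = KNeighborsClassifier(n_neighbors=1).fit(train_feats, train_labels)
predictions = clf.predict(test_feats)
```

Joint (simultaneous) atom selection forces the descriptors of one video to share a common support, which yields more consistent pooled features than encoding each descriptor independently.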
The paper is organized as follows. First, we discuss hand segmentation and tracking in Section 2. Then, we introduce the 3D Mesh MoSIFT feature descriptor in Section 3. The hand activity recognition framework based on SOMP is presented in Section 4. Experimental analysis is described in Section 5. Section 6 concludes the paper.
II. HAND SEGMENTATION AND TRACKING
A Kinect camera is used to simultaneously collect RGB and depth videos of different kinds of human hand activities. The first step is hand segmentation based on the RGB videos. A simple segmentation of objects from the background can be obtained by minimizing the following energy with respect to the labeling function λ:
\varepsilon(\lambda) = \varepsilon_D(\lambda) + \varepsilon_S(\lambda) \qquad (1)
where the data term \varepsilon_D evaluates the likelihood p_n(i) of a pixel i belonging to an object n:
\varepsilon_D(\lambda) = -\sum_{i \in I} \sum_{n=0}^{N} \ln\big(p_n(i)\big)\,\delta(\lambda, n) \qquad (2)
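As an illustration of Eq. (2), the following sketch evaluates the data term for a given labeling from per-pixel likelihood maps. It is a minimal sketch under assumed array shapes; in practice the full energy of Eq. (1), including the smoothness term \varepsilon_S, is minimized with a max-flow/min-cut solver (libraries such as PyMaxflow provide one).

```python
import numpy as np

def data_term(likelihoods, labeling, eps=1e-12):
    """Data term of Eq. (2): sum of negative log-likelihoods that each
    pixel carries its assigned label.

    likelihoods : (N+1, H, W) array; likelihoods[n] holds p_n(i) for
                  label n (0 = background, 1..N = hands/objects).
    labeling    : (H, W) integer array, the label lambda of each pixel.
    """
    rows, cols = np.indices(labeling.shape)
    p = likelihoods[labeling, rows, cols]   # p_{lambda(i)}(i) per pixel
    return -np.log(p + eps).sum()           # eps guards against log(0)
```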