KSII TRANSACTIONS ON INTERNET AND INFORMATION SYSTEMS VOL. 8, NO. 2, Feb. 2014 487
Copyright ⓒ 2014 KSII
both RGB and depth channels? Secondly, since RGB and depth images represent one scene in
different modalities, they are complementary to each other, and fusing both for discriminative
feature representation and model construction will benefit human action recognition.
In fact, across different research domains, the fusion of multi-modality or multi-view features
has attracted the attention of many researchers. For example, in web image search [18-20],
video semantic annotation or tagging [21-24], 3D object retrieval [25-28], target tracking [29]
and multi-view object classification [30-34], authors have discussed the importance of fusing
multi-modality or multi-view features, and experiments have also shown that such fusion is very
helpful for the tasks in these domains. Thus, we will first assess the performance when these
descriptors in the RGB and depth channels are combined.
Further, with features from multiple modalities, we also propose a collaborative multi-task
learning method based on transfer learning for human action recognition, in order to assess the
importance of fusing multi-modality features.
In addition, regarding algorithm evaluation, most of the above algorithms are assessed with
only one kind of classification model, which is not adequate. For example, after extracting
different kinds of features, the researchers in [3-6,17] all adopt SVM models to recognize
human actions; approximate string matching [9] and a graph model [10] are employed to identify
human motion; and in Bobick and Davis [1], similarity matching schemes were employed. What is
worse, most current methods are highly dependent on the dataset, so their generalization
ability is severely constrained. To address this problem, some authors have proposed model-free
methods for human action recognition via sparse representation. For example, the authors of
[35-42] extracted different kinds of features for each action and then applied a sparse
representation based classification algorithm directly, without any modification. SRC [41] was
first proposed for face recognition: a testing sample is reconstructed and represented over all
the training samples; then an indicator function is designed for each class to select that
class's coefficients, and the minimum class-wise representation error is adopted to classify
the testing sample. Similar to
SRC, the philosophy of the method proposed in [35-40] is to decompose each video sample
containing one kind of human action into a sparse linear combination of several video samples
containing multiple kinds of human actions, and it has achieved good performance.
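The SRC-style decision rule described above can be sketched as follows. This is an illustrative sketch rather than the authors' implementation: a greedy matching pursuit stands in for the ℓ1 sparse-coding solver, and the dictionary columns, labels, and sparsity level `k` are hypothetical.

```python
import numpy as np

def sparse_code(D, y, k=5):
    # Greedy matching-pursuit stand-in for the l1 sparse coding step.
    x = np.zeros(D.shape[1])
    support = []
    residual = y.copy()
    for _ in range(k):
        j = int(np.argmax(np.abs(D.T @ residual)))
        if j not in support:
            support.append(j)
        # Refit coefficients on the current support by least squares.
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        x[:] = 0.0
        x[support] = coef
        residual = y - D @ x
    return x

def src_classify(D, labels, y, k=5):
    # Represent y over all training samples, then keep only each class's
    # coefficients and pick the class with minimum reconstruction error.
    x = sparse_code(D, y, k)
    classes = np.unique(labels)
    errors = [np.linalg.norm(y - D @ np.where(labels == c, x, 0.0))
              for c in classes]
    return classes[int(np.argmin(errors))]
```

Each dictionary column is one training sample (e.g. a vectorized action descriptor); the `np.where` mask plays the role of the per-class indicator function described above.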
The reason for this success is that each point's neighborhood structure is fully utilized,
which supplies better similarity measures between the testing data and all the training
samples. After that, Zhang et al. [41] discussed the roles of the ℓ1-norm and the ℓ2-norm
respectively, and concluded that the sparsity in SRC was not so important, while collaborative
representation played a much more important role. Thus, what will happen when these descriptors
are assessed by model-free methods as well as by traditional classification algorithms that are
constrained by and dependent on the dataset?
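The contrast with SRC can be illustrated by a minimal collaborative-representation classifier in the spirit of [41], where the ℓ1 sparse-coding step is replaced by a closed-form ℓ2-regularized solution; the regularization weight `lam` is a hypothetical choice, and the original rule also normalizes each residual by the coefficient norm, which this sketch omits for brevity.

```python
import numpy as np

def crc_classify(D, labels, y, lam=1e-2):
    # Closed-form l2-regularized ("collaborative") coding:
    #   x = (D^T D + lam * I)^{-1} D^T y
    n = D.shape[1]
    x = np.linalg.solve(D.T @ D + lam * np.eye(n), D.T @ y)
    classes = np.unique(labels)
    errors = []
    for c in classes:
        delta = np.where(labels == c, x, 0.0)  # keep only class-c coefficients
        errors.append(np.linalg.norm(y - D @ delta))
    return classes[int(np.argmin(errors))]
```

Unlike the iterative ℓ1 solver in SRC, this coding step is a single linear solve, which is why the collaborative representation discussed in [41] is attractive in practice.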
3. Motion History Image for RGB and Depth Modalities
To represent human motion, the human silhouettes in each frame first need to be accumulated
and encoded. Thus, we construct human motion maps for the RGB and depth channels respectively;
the details are given as follows.
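This accumulation can be sketched with the standard MHI update from [1], in which a pixel that moves in the current frame is set to the duration τ and all other pixels decay toward zero; the frame-differencing threshold used here to obtain the motion mask is a hypothetical choice, not the paper's segmentation method.

```python
import numpy as np

def build_mhi(frames, tau=30.0, thresh=15.0):
    """Accumulate a motion history image over a list of grayscale frames.

    Update rule per pixel:
      H(x, y) = tau             where motion is detected in the current frame,
      H(x, y) = max(H - 1, 0)   elsewhere (older motion fades out).
    """
    mhi = np.zeros(frames[0].shape, dtype=float)
    for prev, cur in zip(frames[:-1], frames[1:]):
        # Hypothetical motion mask: simple frame differencing + threshold.
        moving = np.abs(cur.astype(float) - prev.astype(float)) > thresh
        mhi = np.where(moving, tau, np.maximum(mhi - 1.0, 0.0))
    return mhi
```

Larger values mark more recent motion, so the resulting map encodes both where and when the silhouette moved; the same update applies to the RGB and depth streams.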
3.1 MHI for RGB Modality
To describe human motion, the motion history image (MHI) [1], in which moving human
silhouettes are accumulated and encoded, has been widely employed and has achieved good
performance. However, Bobick and Davis [1] first detected or segmented targets in RGB