Human Action Recognition Using
Labeled Latent Dirichlet Allocation Model
Jiahui YANG, Changhong CHEN*, Zongliang GAN, Xiuchang ZHU
Jiangsu Province's Key Lab of Image Processing and Image Communications
Nanjing University of Posts and Telecommunications,
Nanjing 210003, China
*Corresponding author: chenchh@njupt.edu.cn
Abstract—Recognition of human actions has long been an active area in computer vision, and action recognition techniques have been applied in many fields such as smart surveillance, motion analysis, and virtual reality. In this paper, we propose a new action recognition method that represents human actions as a bag of spatio-temporal words extracted from input video sequences and uses the L-LDA (labeled Latent Dirichlet Allocation) model as the classifier. L-LDA is a supervised model extended from the unsupervised LDA. It adds a label layer on top of LDA to mark the category of each training video sequence, so L-LDA can automatically assign the latent topic variables in the model to specific action categories. Moreover, this property allows the model parameters to be estimated more reasonably, accurately, and quickly. We test our method on the KTH and Weizmann human action datasets, and the experimental results show that L-LDA outperforms both its unsupervised counterpart LDA and SVMs (support vector machines).
Keywords—action recognition; interest points detection; topic
model; labeled Latent Dirichlet Allocation model
I. INTRODUCTION
Action recognition aims to represent and track human actions using computer techniques, and then to infer and categorize those actions in combination with other information such as the background and surrounding environment [1]. The key techniques in this field include extracting representative visual features from video sequences, choosing an appropriate feature descriptor, and designing a classification model with good performance [2]. Accordingly, action recognition can be divided into two tasks: (1) feature extraction and representation at the bottom level; (2) model learning and action categorization at the top level. The flowchart of a general action recognition approach is shown in Fig. 1.
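As a concrete illustration of the bottom level of this pipeline, local descriptors can be quantized against a visual vocabulary and each video represented as a word histogram that the top-level classifier then consumes. The sketch below is our own illustrative code with random stand-in descriptors and a simple k-means; it is not the detector or the L-LDA model used in this paper.

```python
import numpy as np

def build_vocabulary(descriptors, k, iters=20, seed=0):
    """Cluster local descriptors with plain k-means to form a visual vocabulary."""
    rng = np.random.default_rng(seed)
    centers = descriptors[rng.choice(len(descriptors), k, replace=False)]
    for _ in range(iters):
        # Assign each descriptor to its nearest center, then recompute centers.
        d = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = descriptors[labels == j].mean(0)
    return centers

def to_histogram(descriptors, centers):
    """Quantize descriptors into a normalized bag-of-words histogram."""
    d = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    h = np.bincount(d.argmin(1), minlength=len(centers)).astype(float)
    return h / h.sum()

# Stand-in local features for one video; a real system would extract these
# from detected spatio-temporal interest points.
video_descriptors = np.random.default_rng(1).normal(size=(200, 16))
vocab = build_vocabulary(video_descriptors, k=8)
hist = to_histogram(video_descriptors, vocab)
print(hist.shape)  # (8,)
```

The top-level task then treats each such histogram as a "document" of visual words, which is exactly the representation topic models and SVMs operate on.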
At present, the low-level features used in the first task mainly include contours, optical flow, motion trajectories, spatio-temporal interest points, and so on. Methods using contour features are simple and easy to implement, but many of them depend on the boundary information of the contour, which is easily affected by changes in the background [3]. Optical flow can detect and track the actor without any prior knowledge, but it is sensitive to video noise and changes in illumination intensity, and its computation is complex and costly [4]. Motion trajectories can be used to analyze the details of human motion, but estimating the positions of key human joints and tracking them through subsequent frames is still hard to do reliably [5]. Recently, action recognition methods based on spatio-temporal interest points have been widely used because of their advantages: the actors can be accurately located, and the points capture the main information of the action without requiring the actor to be tracked. Many interest point detectors have been proposed. The Harris corner detector, originally used in image processing, was extended to the space-time domain, but it detects only a limited number of interest points because its response function is not sensitive to changes along the temporal dimension [6]. To address this, a three-dimensional linear filter detector was proposed that combines 2D Gaussian filters along the spatial dimensions with a pair of 1D Gabor filters along the temporal dimension; it can detect a sufficient number of interest points [7]. The idea of the Hessian matrix was also used to detect spatio-temporal interest points based on scale-invariant detection, which yields dense interest points [8].
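The separable response function of the detector in [7] can be sketched as follows: spatial Gaussian smoothing followed by a quadrature pair of temporal Gabor filters, with interest points taken at local maxima of the response. This is a simplified illustration of that scheme using SciPy; the parameter values and the frequency coupling `omega = 4 / tau` are taken as in common descriptions of [7], and should be treated as placeholders rather than the exact settings of this paper.

```python
import numpy as np
from scipy.ndimage import convolve1d, gaussian_filter

def cuboid_response(video, sigma=2.0, tau=1.5, omega=None):
    """Response of a separable 3-D linear filter detector in the style of [7]:
    2-D Gaussian smoothing in space, a quadrature pair of 1-D Gabor filters
    in time. Spatio-temporal interest points are local maxima of R."""
    if omega is None:
        omega = 4.0 / tau  # temporal frequency tied to the temporal scale
    # Spatial smoothing frame by frame (axis 0 is t; axes 1, 2 are y, x).
    smoothed = gaussian_filter(video, sigma=(0, sigma, sigma))
    # Quadrature pair of 1-D Gabor kernels along the temporal axis.
    t = np.arange(-int(3 * tau), int(3 * tau) + 1)
    envelope = np.exp(-t**2 / (2 * tau**2))
    h_even = -np.cos(2 * np.pi * t * omega) * envelope
    h_odd = -np.sin(2 * np.pi * t * omega) * envelope
    even = convolve1d(smoothed, h_even, axis=0, mode="nearest")
    odd = convolve1d(smoothed, h_odd, axis=0, mode="nearest")
    return even**2 + odd**2

# Random stand-in clip of 30 frames at 32x32; a real clip would come
# from the input video sequence.
video = np.random.default_rng(0).normal(size=(30, 32, 32))
R = cuboid_response(video)
print(R.shape)  # (30, 32, 32)
```

Because the response sums two squared quadrature outputs, it is non-negative everywhere and peaks at regions with strong periodic temporal variation, which is what makes this detector more responsive to motion than the space-time Harris detector.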
For modeling and categorization at the top level, there has been growing attention to using latent topic models such as PLSA (probabilistic latent semantic analysis) [9] and LDA (Latent Dirichlet Allocation) [10] as classification models. Topic models were first introduced and applied in domains such as information retrieval and text analysis. When these models are used to represent video sequences, more emphasis is placed on the coherence of content rather than on mere spatial neighborhood relations. Generally, the latent topic models applied to action recognition are unsupervised; in other words, they do not label the categories of the training samples, but only take the number of categories as input and automatically learn the probability distributions of the visual words and the latent topics. Savarese et al. [11] extract local spatio-temporal interest points as low-level features and apply PLSA to learn and generate a semantic description of each action. In [12], LDA was used to model human activities in real scenes. Although unsupervised topic models applied to action recognition have made much progress, they still have some weaknesses. For example, because the models are unsupervised, each discovered cluster can only be named, by resorting to the ground-truth labels, after the most common action class within that cluster. That is to say, we can't automatically