A Hierarchical Action Recognition Model Based on Latent Dirichlet Allocation

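"This research paper proposes a hierarchical model based on latent Dirichlet allocation (h-LDA) for action recognition. The model consists of an appearance group and a motion group, and learns the spatial temporal patterns (STPs) of human actions by introducing a hierarchy of two-layer topics within each group. The basic idea is that the two layers of topics model the global STPs and the local STPs of the actions, respectively. Each group generates discrete words from two complementary kinds of features. Each topic learned in the two groups describes a particular aspect of the actions, and the mid-level topics describe the local STPs by incorporating geometric structure."

The paper examines how a hierarchical model can improve action recognition performance. Drawing on the recent success of hierarchical representations, it proposes a new hierarchical model based on latent Dirichlet allocation (LDA). LDA is a statistical modeling method usually used for topic modeling; here it is extended to the action recognition task. The core of the model is its two-group structure, consisting of an appearance group and a motion group. The appearance group focuses on the visual appearance of an action, while the motion group focuses on its dynamics. Each group is further divided into two layers of topics, one devoted to global features and the other to local features. This design lets the model capture the complex spatial temporal patterns of actions, including both static and dynamic information.

In each group, discrete words are generated from two complementary feature types. This may involve visual features such as color and texture for the appearance vocabulary, and dynamic features such as optical flow and joint motion for the motion vocabulary. These words describe each action along several dimensions, so that the model can account for more aspects of the action.

Learning the mid-level topics is essential for describing the local STPs. These topics capture small variations and local patterns of an action, such as a specific joint motion or a particular arrangement of body parts. By combining the mid-level topics with the global topics, the model can analyze the whole action sequence and thereby improve recognition accuracy.

The paper shows how hierarchical structure and topic modeling can improve the performance of action recognition systems. The approach helps in understanding and parsing complex action sequences, and may also give future artificial intelligence systems stronger visual understanding, particularly in video analysis and surveillance. With this model, the researchers aim to make action recognition more accurate.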

A Hierarchical Model Based on Latent Dirichlet Allocation for Action Recognition

Shuang Yang, Chunfeng Yuan, Weiming Hu

National Laboratory of Pattern Recognition,

Institute of Automation, CAS, China

Email: {syang,cfyuan,wmhu}@nlpr.ia.ac.cn

Xinmiao Ding

Shandong Institute of

Business and Technology

Email: dingxinmiao@126.com

Abstract: Inspired by the recent success of hierarchical representation, we propose a new hierarchical variant of latent Dirichlet allocation (h-LDA) for action recognition. The model consists of an appearance group and a motion group, and we introduce a new hierarchical structure including two-layer topics in each group to learn the spatial temporal patterns (STPs) of human actions. The basic idea is that the two-layer topics are used to model the global STPs and the local STPs of the actions respectively. Two groups of discrete words are generated from two complementary kinds of features for each group. Each topic learned in these two groups is used to describe a particular aspect of the actions. Specifically, the mid-level topics are learned to describe the local STPs by including the geometric structure information in the lower-level words. The top-level topics are learned from the mid-level topics and are the mixture distribution of the local STPs, which makes the top-level topics appropriate to represent the global STPs. In addition, we give the learning and inference process by Gibbs sampling with reasonable assumptions. Finally, each sample is discriminatively represented as the probabilistic distribution over the global STPs learned by the proposed h-LDA. Experimental results on two datasets demonstrate the effectiveness of our approach for action recognition.
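The abstract states that learning and inference use Gibbs sampling, but this excerpt gives no update equations. As a point of reference only, the following is a minimal sketch of collapsed Gibbs sampling for plain single-layer LDA, not the authors' h-LDA sampler. The function name `lda_gibbs`, the symmetric hyperparameters `alpha` and `beta`, and their default values are assumptions made for illustration.

```python
import random


def lda_gibbs(docs, n_topics, vocab_size, alpha=0.1, beta=0.01,
              n_iters=200, seed=0):
    """Collapsed Gibbs sampling for standard single-layer LDA (NOT h-LDA).

    docs: list of documents, each a list of integer word ids in
          [0, vocab_size).
    Returns (theta, phi): per-document topic distributions and
    per-topic word distributions, estimated from the final counts.
    """
    rng = random.Random(seed)
    doc_topic = [[0] * n_topics for _ in docs]                # n_{d,k}
    topic_word = [[0] * vocab_size for _ in range(n_topics)]  # n_{k,w}
    topic_total = [0] * n_topics                              # n_k
    assign = []                                               # z_{d,i}

    # Random initial topic assignment for every word token.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assign.append(z_d)

    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the current token from the counts.
                k = assign[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Unnormalized conditional p(z = t | all other assignments).
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(n_topics)
                ]
                # Resample the topic and restore the counts.
                k = rng.choices(range(n_topics), weights=weights)[0]
                assign[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1

    theta = [
        [(c + alpha) / (len(doc) + n_topics * alpha) for c in counts]
        for counts, doc in zip(doc_topic, docs)
    ]
    phi = [
        [(c + beta) / (topic_total[k] + vocab_size * beta)
         for c in topic_word[k]]
        for k in range(n_topics)
    ]
    return theta, phi
```

Each row `theta[d]` is a probability distribution over topics for document `d`. A vector of this kind is what a topic-model representation would hand to a downstream classifier; how h-LDA builds its two-layer counterpart is not specified in this excerpt.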

I. INTRODUCTION

In recent years, a significant amount of effort has been devoted to automatic recognition of human actions in videos. However, there still exist many difficulties in the appropriate representation of different actions, which makes action recognition a challenging problem. In this paper, we propose a new hierarchical model based on latent Dirichlet allocation (LDA) to learn the spatial temporal patterns (STPs) for action representation. Experimental results show that our approach, combined with a random forest classifier, is effective for action recognition.

A. Related Work

Recently, learning representations from a hierarchical structure has attracted considerable interest in action recognition. Song et al. [1] propose a hierarchical sequence summarization approach that learns multiple layers of discriminative feature representations at different temporal granularities. Wang et al. [2] construct a hierarchical representation of local feature descriptors by combining the local features and their contexts. Niebles and Fei-Fei [3] propose a hierarchical model that combines spatial and spatial-temporal features to represent each frame as a mixture of constellations.

All the methods above show that the representation using a hierarchical structure is powerful for action recognition, yet there are still certain weaknesses in these methods, such as using only one kind of feature [4] or requiring manual annotation. Furthermore, they are all based on discriminative models, which are devised for the specific task and do not provide a generic characterization.

Among the various generative models, topic models have been applied widely to computer vision tasks such as scene categorization [5], object recognition [6], and action recognition [7]. Topic models, such as probabilistic latent semantic indexing (pLSI) [8] and latent Dirichlet allocation (LDA) [9], were first proposed in the text domain to learn the latent semantic topics in text documents. In recent years, they have been introduced frequently into the field of computer vision. In [5], Fei-Fei et al. build a variant of LDA which considers an image as a document and an image patch as a word to discover the intermediate themes for natural scene categorization. In [7], Wang et al. take the class label to be the latent topic and a frame in a sequence to be a word to build the supervised LDA model (s-LDA) for action recognition. In [10], Wang et al. present spatial LDA by adding a Gaussian distribution over the words assigned to the same document to learn the semantic representation of images for object recognition.
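The "image as a document, image patch as a word" idea can be illustrated with a minimal sketch. It assumes a codebook of cluster centers has already been learned (for example by k-means, which is an assumption here; the excerpt does not say how the words are built) and maps each feature descriptor to the index of its nearest center. The helper names `nearest_codeword`, `to_document`, and `word_histogram` are hypothetical.

```python
def nearest_codeword(descriptor, codebook):
    """Return the index of the codebook center closest to the descriptor
    under squared Euclidean distance."""
    best_index, best_dist = 0, float("inf")
    for index, center in enumerate(codebook):
        dist = sum((a - b) ** 2 for a, b in zip(descriptor, center))
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index


def to_document(descriptors, codebook):
    """Quantize every descriptor of one image or video into a discrete
    word id, giving the 'document' a topic model consumes."""
    return [nearest_codeword(d, codebook) for d in descriptors]


def word_histogram(document, vocab_size):
    """Count how often each word id occurs in a document."""
    counts = [0] * vocab_size
    for word in document:
        counts[word] += 1
    return counts


# Toy 2-D example: three codewords, four patch descriptors.
codebook = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
patches = [(0.1, -0.2), (0.9, 1.2), (4.0, 6.0), (1.1, 0.8)]
document = to_document(patches, codebook)
histogram = word_histogram(document, len(codebook))
```

In a two-group model like the one proposed in this paper, one such vocabulary would be built per feature type, so appearance and motion descriptors would each be quantized against their own codebook.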

Most of these methods build the model with only one layer of topics. In spite of their simplicity, they work well for the specific task. However, a structure with only one layer of topics leads to limited generalizability. Moreover, most previous topic-model-based methods build their model from only one type of observation, which is efficient but may not be enough for complex actions.
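To make "one layer of topics" concrete, the generative process of standard smoothed LDA can be written out. This is the textbook single-layer model, not the h-LDA of this paper, and the symbols ($\alpha$, $\beta$, $\theta_d$, $\phi_k$, $z_{dn}$, $w_{dn}$) are notation assumed here rather than taken from the excerpt. With $K$ topics, $D$ documents, and $N_d$ words in document $d$:

```latex
\[
\begin{aligned}
\phi_k &\sim \mathrm{Dirichlet}(\beta), & k &= 1,\dots,K,\\
\theta_d &\sim \mathrm{Dirichlet}(\alpha), & d &= 1,\dots,D,\\
z_{dn} \mid \theta_d &\sim \mathrm{Categorical}(\theta_d), & n &= 1,\dots,N_d,\\
w_{dn} \mid z_{dn}, \phi &\sim \mathrm{Categorical}(\phi_{z_{dn}}), & n &= 1,\dots,N_d.
\end{aligned}
\]
\[
p(w, z, \theta, \phi \mid \alpha, \beta)
= \prod_{k=1}^{K} p(\phi_k \mid \beta)
  \prod_{d=1}^{D} \Bigl[\, p(\theta_d \mid \alpha)
  \prod_{n=1}^{N_d} p(z_{dn} \mid \theta_d)\, p(w_{dn} \mid \phi_{z_{dn}}) \Bigr].
\]
```

Here every word depends on a single latent topic variable $z_{dn}$. A two-layer model adds a second level of latent topics on top of this one; its exact form for h-LDA is not given in this excerpt.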

B. Our Approach

To address the above limitations, we propose a novel hierarchical variant of LDA, named h-LDA, which combines two groups of two-layer topics to learn the spatial temporal patterns (STPs) of human actions. Specifically, the two-layer topics are introduced to learn the global STPs and the local STPs of the actions in the corresponding group, respectively. The low-level words in the two groups are generated individually from

2014 22nd International Conference on Pattern Recognition

DOI 10.1109/ICPR.2014.451


