Action recognition using linear dynamic systems
Haoran Wang a,b, Chunfeng Yuan b, Guan Luo b, Weiming Hu b,*, Changyin Sun a

a School of Automation, Southeast University, Nanjing, China
b National Laboratory of Pattern Recognition, Institute of Automation, CAS, Beijing, China
Article info

Article history:
Received 20 April 2012
Received in revised form 26 November 2012
Accepted 1 December 2012
Available online 12 December 2012
Keywords:
Linear dynamic system
Kernel principal angle
Multiclass spectral clustering
Supervised codebook pruning
Action recognition
Abstract
In this paper, we propose a novel approach based on Linear Dynamic Systems (LDSs) for action
recognition. Our main contributions are two-fold. First, we introduce LDSs to action recognition. LDSs
describe the dynamic texture which exhibits certain stationarity properties in time. They are adopted to
model the spatiotemporal patches which are extracted from the video sequence, because the
spatiotemporal patch is more analogous to a linear time invariant system than the video sequence.
Notably, LDSs do not live in a Euclidean space, so we adopt the kernel principal angle to measure the
similarity between LDSs, and then the multiclass spectral clustering is used to generate the codebook
for the bag of features representation. Second, we propose a supervised codebook pruning method to
preserve the discriminative visual words and suppress the noise in each action class. The visual words
which maximize the inter-class distance and minimize the intra-class distance are selected for
classification. Our approach achieves state-of-the-art performance on three benchmark datasets.
In particular, the experiments on the challenging UCF Sports and Feature Films datasets demonstrate the
effectiveness of the proposed approach in realistic complex scenarios.
© 2012 Elsevier Ltd. All rights reserved.
1. Introduction
Automatic recognition of human actions in videos is useful for
surveillance, content-based summarization, and human–computer
interaction applications. Yet, it is still a challenging problem. In
recent years, a large number of researchers have addressed this
problem as evidenced by several survey papers [1–4].
Action representation is important for action recognition.
There are appearance-based representations [5,40], shape-based
representations [6,41], optical-flow-based representations [7,42],
volume-based representations [8,43] and interest-point-based
representations [9,44]. Among them, methods using local interest
point features together with the bag of visual words model are
highly popular, due to their simple implementation and good
performance. Bag of visual words approaches are robust to
noise, occlusion and geometric variation, and do not require
reliable tracking on a particular subject. Despite recent develop-
ments, the representation of local regions in videos is still an open
field of research.
Dynamic textures are sequences of images of moving scenes that
exhibit certain stationarity properties in time, such as sea-waves,
smoke, foliage, whirlwind etc. They capture the dynamic informa-
tion in the motion of objects. Doretto et al. [10] show that dynamic
textures can be modeled using a LDS. Tools from system identifica-
tion are borrowed to capture the essence of dynamic textures. Once
learned, the LDS model has predictive power and can be used for
extrapolating dynamic textures with negligible computational cost.
Traditionally, LDSs are used to describe the dynamic textures of whole
video sequences [11,12]. But a video sequence is usually not a linear time
invariant system, due in part to its long time span and complex
changes. Compared with a full video sequence, a spatiotemporal patch
is far more analogous to a linear time invariant system. Moreover, an LDS
captures more dynamic information, which is important for the
representation of moving scenes, than traditional local features.
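To make the modeling step concrete, the following is a minimal NumPy sketch of the suboptimal SVD-based LDS identification popularized by Doretto et al. [10], applied here to a spatiotemporal patch rather than a whole sequence. The function name `fit_lds` and the data layout (one vectorized frame per column) are illustrative choices, not the authors' exact implementation.

```python
import numpy as np

def fit_lds(Y, n):
    """Suboptimal LDS identification (sketch, after Doretto et al. [10]).

    Y : (p, tau) matrix whose tau columns are vectorized frames of a
        spatiotemporal patch; n : state-space dimension.
    Returns the observation matrix C (p, n) and transition matrix A (n, n)
    of the model  x_{t+1} = A x_t + v_t,  y_t = C x_t + w_t.
    """
    # Rank-n SVD of the frame matrix: Y ≈ C X with C orthonormal.
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    C = U[:, :n]                        # observation matrix
    X = np.diag(s[:n]) @ Vt[:n, :]      # state sequence, one column per frame
    # Least-squares fit of the state transition: X[:, 1:] ≈ A X[:, :-1].
    A = X[:, 1:] @ np.linalg.pinv(X[:, :-1])
    return C, A
```

The pair (A, C) then serves as the descriptor of the patch; the orthonormality of C is what later makes subspace-angle comparisons between models well defined.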
Several categorization algorithms have been proposed based
on the LDS parameters, which live in a non-Euclidean space.
Among these methods, Vishwanathan et al. [13] use Binet–
Cauchy kernels to compare the parameters of two LDSs. Chan
and Vasconcelos [14] use both the KL divergence and the Martin
distance [12,15] as a metric between dynamic systems. Woolfe
and Fitzgibbon [16] use the family of Chernoff distances, and the
distances between cepstrum coefficients are adopted as the
metrics between LDSs. These methods usually define a distance
measurement between the model parameters of two dynamic
systems. Once such a metric has been defined, classifiers such as
nearest neighbors or support vector machines can be used to
categorize a query video sequence based on the training data.
However, all the above approaches perform supervised classification
and are therefore not suitable for codebook generation in the bag
of words representation.
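As a concrete illustration of one such metric, the sketch below computes a Martin-style distance [12,15] between two LDSs from the principal angles between their extended observability subspaces. The truncation depth `m` and the function names are assumptions for this sketch; the true Martin distance is defined over infinite observability subspaces.

```python
import numpy as np

def martin_distance(A1, C1, A2, C2, m=10):
    """Approximate Martin distance between two LDSs (sketch).

    Stacks finite extended observability matrices O = [C; CA; ...; C A^(m-1)],
    takes the principal angles theta_i between their column spaces, and
    returns  -2 * sum_i log cos(theta_i)  =  -log prod_i cos^2(theta_i).
    """
    def observability(A, C, m):
        blocks, M = [], C
        for _ in range(m):
            blocks.append(M)
            M = M @ A
        return np.vstack(blocks)

    # Orthonormal bases for the two observability subspaces.
    Q1, _ = np.linalg.qr(observability(A1, C1, m))
    Q2, _ = np.linalg.qr(observability(A2, C2, m))
    # Cosines of the principal angles are the singular values of Q1^T Q2.
    cosines = np.linalg.svd(Q1.T @ Q2, compute_uv=False)
    cosines = np.clip(cosines, 1e-12, 1.0)
    return -2.0 * np.sum(np.log(cosines))
```

A distance of this form between every pair of patch models is exactly the kind of pairwise similarity a spectral clustering step can consume to build a codebook, which is the gap the supervised methods above leave open.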
http://dx.doi.org/10.1016/j.patcog.2012.12.001
* Corresponding author. Tel.: +86 13910900826.
E-mail address: wmhu@nlpr.ia.ac.cn (W. Hu).
Pattern Recognition 46 (2013) 1710–1718