2.1. Hand-designed features for action recognition
Extracting local features for action recognition usually consists of two steps: detecting features with a detector and representing them with a descriptor. Both the detector and the descriptor are usually extended from the 2D image domain to the 3D video domain, inspired by their success in object recognition. Laptev and Lindeberg [25] extend the Harris detector [15] and propose the Harris3D detector. Dollár et al. [11] propose the Cuboid detector, which first computes the responses of temporal Gabor filters and then locates interest points at the maximal responses within a local range (see the sketch below). Inspired by the success of the Hessian saliency measure in blob detection in images, Willems et al. [53] propose the Hessian detector as a spatio-temporal extension of this measure. In addition to these sparse interest point detectors, dense sampling can also be regarded as a special detector that extracts video patches at regular positions and scales.
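To make the Cuboid detector concrete, the following is a minimal NumPy/SciPy sketch of its response function: spatial Gaussian smoothing followed by a quadrature pair of temporal 1D Gabor filters, with interest points taken as local maxima of the response. The parameter values and the thresholding heuristic are illustrative assumptions, not the settings of [11].

```python
import numpy as np
from scipy.ndimage import gaussian_filter, convolve1d, maximum_filter

def cuboid_response(video, sigma=2.0, tau=4.0):
    """Cuboid detector response: spatial Gaussian smoothing followed by
    a quadrature pair of temporal 1D Gabor filters.  video: (T, H, W)."""
    smoothed = gaussian_filter(video.astype(np.float64),
                               sigma=(0.0, sigma, sigma))  # smooth space only
    t = np.arange(-2 * int(tau), 2 * int(tau) + 1)
    omega = 4.0 / tau                                      # coupling used in [11]
    h_ev = -np.cos(2 * np.pi * t * omega) * np.exp(-t**2 / tau**2)
    h_od = -np.sin(2 * np.pi * t * omega) * np.exp(-t**2 / tau**2)
    r_ev = convolve1d(smoothed, h_ev, axis=0)              # filter along time
    r_od = convolve1d(smoothed, h_od, axis=0)
    return r_ev**2 + r_od**2

def detect_interest_points(video, nbhd=5):
    """Interest points as local maxima of the response within an
    nbhd x nbhd x nbhd neighbourhood; the threshold is our own heuristic."""
    r = cuboid_response(video)
    is_peak = r == maximum_filter(r, size=nbhd)
    return np.argwhere(is_peak & (r > r.mean() + 3.0 * r.std()))
```

Dense sampling, by contrast, needs no response function: patches are simply taken on a regular grid (a sampling routine of this kind is sketched in Section 3).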
After detecting an interest point, a feature descriptor is used to extract local features around it. Dollár et al. [11] propose the Cuboid descriptor. To characterize local motion and appearance, Laptev et al. [26] compute histograms of spatial gradients and optical flow accumulated in space–time neighborhoods of the detected interest points (a sketch of the gradient-histogram step is given below). Kläser et al. [24] propose the HOG3D descriptor, which is based on histograms of 3D gradient orientations and can therefore be seen as an extension of the popular SIFT descriptor [33]. Similarly, Willems et al. [53] extend the SURF descriptor [4] to extract features from video.
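As an illustration, the histogram-of-gradients half of such a descriptor can be sketched as follows; the cell-grid layout and the optical-flow (HOF) half of [26] are omitted, and the bin count is an arbitrary choice.

```python
import numpy as np

def hog_patch_descriptor(patch, n_bins=8):
    """Magnitude-weighted histogram of spatial gradient orientations
    accumulated over a space-time cuboid.  patch: (T, H, W)."""
    gy, gx = np.gradient(patch.astype(np.float64), axis=(1, 2))
    mag = np.hypot(gx, gy)                                  # gradient magnitude
    ang = np.mod(np.arctan2(gy, gx), 2 * np.pi)             # orientation in [0, 2*pi)
    bins = (ang / (2 * np.pi) * n_bins).astype(int) % n_bins
    hist = np.bincount(bins.ravel(), weights=mag.ravel(), minlength=n_bins)
    return hist / (np.linalg.norm(hist) + 1e-8)             # L2-normalise
```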
2.2. Learning based features for action recognition
Recently, feature learning methods have been introduced into action recognition. Taylor et al. [46] propose a convolutional gated restricted Boltzmann machine (GRBM) for learning spatio-temporal features, which can be considered an extension of convolutional RBMs from 2D images to 3D videos. Because the objective function is intractable, sampling is required; the resulting computational cost is high, and training the model on the Hollywood2 dataset takes 2–3 days. This drawback limits the method's applicability to large-scale problems. Ji et al. [22] extend convolutional neural networks from the 2D spatial domain to the 3D spatio-temporal domain to learn features for action recognition. Their method extracts features from both the spatial and temporal dimensions by performing 3D convolutions, thereby capturing the motion information encoded in multiple adjacent frames (sketched below). Similar to Taylor et al. [46], Le et al. [29] propose a hierarchical invariant spatio-temporal feature learning framework based on independent subspace analysis (ISA). Their model consists of two layers. The first layer vectorizes the sampled 3D patches into column vectors and learns a set of ICA basis functions from them. The second layer uses subspace ICA to encode the responses of the first layer (the subspace pooling is sketched below). Finally, the responses of the two layers are combined into the final feature vector using the bag-of-words model [27], and the resulting feature vectors are fed into an SVM classifier to perform classification.
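A minimal sketch of a single 3D convolution in the spirit of Ji et al. [22] is given below; the tanh squashing and the bias are stand-ins for the full network, which also contains subsampling layers and multiple channels.

```python
import numpy as np
from scipy.signal import fftconvolve

def conv3d_feature_map(clip, kernel, bias=0.0):
    """One 3D convolution: the kernel slides over time, height and
    width, so each output value pools information from several
    adjacent frames.  clip: (T, H, W); kernel: (kt, kh, kw).
    Note: fftconvolve flips the kernel (true convolution), whereas
    CNN libraries typically implement cross-correlation."""
    return np.tanh(fftconvolve(clip, kernel, mode='valid') + bias)
```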
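Likewise, the first-layer subspace pooling of Le et al. [29] can be sketched as below; learning the filter matrix W (the ISA training itself), the second layer, and the bag-of-words/SVM stage are omitted, and the subspace size is illustrative.

```python
import numpy as np

def isa_layer1_responses(patches, W, subspace_size=2):
    """patches: (n_patches, patch_dim) vectorised 3D patches;
    W: (n_filters, patch_dim) learned filters, with n_filters a
    multiple of subspace_size.  Filters are pooled in fixed groups
    with a square-root-of-sum-of-squares nonlinearity."""
    proj = patches @ W.T                                    # linear filtering
    grouped = proj.reshape(len(patches), -1, subspace_size)
    return np.sqrt((grouped ** 2).sum(axis=2))              # subspace pooling
```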
3. Proposed method
The proposed action recognition framework based on overcomplete ICA is illustrated in Fig. 1. At the training stage, after densely sampling a set of 3D patches from the training videos of each class (sketched below), a set of overcomplete ICA basis functions is learned.
Fig. 1. A conceptual diagram of the proposed action recognition framework.
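The dense sampling step of this training stage can be sketched as follows; the patch and stride sizes are placeholders rather than the settings used in the experiments.

```python
import numpy as np

def sample_3d_patches(video, patch=(5, 16, 16), stride=(5, 16, 16)):
    """Extract 3D patches on a regular spatio-temporal grid and
    vectorise them as columns, ready for learning overcomplete ICA
    basis functions.  video: (T, H, W)."""
    T, H, W = video.shape
    pt, ph, pw = patch
    st, sh, sw = stride
    cols = [video[t:t + pt, y:y + ph, x:x + pw].ravel()
            for t in range(0, T - pt + 1, st)
            for y in range(0, H - ph + 1, sh)
            for x in range(0, W - pw + 1, sw)]
    return np.asarray(cols, dtype=np.float64).T   # (patch_dim, n_patches)
```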