Recurrent Modeling of Interaction Context for Collective Activity Recognition
Minsi Wang, Bingbing Ni, Xiaokang Yang
Shanghai Jiao Tong University
mswang1994@gmail.com, {nibingbing,xkyang}@sjtu.edu.cn
Abstract
Modeling high order interactional context, e.g., group interaction, lies at the heart of collective/group activity recognition. However, most previous activity recognition methods do not offer a flexible and scalable scheme for this high order context modeling problem. To address this fundamental bottleneck explicitly, we propose a recurrent interactional context modeling scheme based on an LSTM network. By exploiting the information propagation/aggregation capability of the LSTM, the proposed scheme unifies interactional feature modeling for single-person dynamics, intra-group interactions (i.e., among persons within a group), and inter-group interactions (i.e., group to group). The proposed high order context modeling scheme produces more discriminative/descriptive interactional features. It flexibly handles a varying number of input instances (e.g., different numbers of persons in a group, or different numbers of groups) and scales linearly with the order of the context being modeled. Extensive experiments on two benchmark collective/group activity datasets demonstrate the effectiveness of the proposed method.
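To make the aggregation idea concrete, the following is a minimal sketch of how an LSTM-based aggregator can consume a varying number of per-instance features by pooling them into a fixed-size representation at each timestep before recurrence. This is an illustration of the general mechanism, not the authors' released code; the max-pooling choice, feature dimensions, and module names are our assumptions.

```python
import torch
import torch.nn as nn

class ContextAggregator(nn.Module):
    """Pools a variable number of instance features into a fixed-size
    vector per timestep, then tracks its temporal evolution with an
    LSTM. A sketch only; dimensions and pooling are assumptions."""

    def __init__(self, feat_dim=256, hidden_dim=256):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden_dim, batch_first=True)

    def forward(self, instance_feats):
        # instance_feats: list over T timesteps; element t has shape
        # (N_t, feat_dim), where N_t may differ from step to step.
        pooled = torch.stack(
            [f.max(dim=0).values for f in instance_feats]  # (T, feat_dim)
        )
        out, _ = self.lstm(pooled.unsqueeze(0))  # (1, T, hidden_dim)
        return out.squeeze(0)                    # per-step context feature

# Usage: 3 persons at t=0, 5 persons at t=1 -- the pooled LSTM input
# has the same size at every step regardless of the instance count.
agg = ContextAggregator()
feats = [torch.randn(3, 256), torch.randn(5, 256)]
context = agg(feats)  # shape (2, 256)
```

Because the pooling step collapses any number of instances to one vector, adding a person (or a group) changes only the pooling cost, which is where the linear scalability claim comes from.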
1. Introduction
Analyzing collective activities of groups provides useful information for several real-world applications, including social role understanding and social event prediction. The main challenge of collective activity recognition is modeling the interactional context among persons, because the number of persons involved in an interaction varies. Moreover, in most cases a collective activity comprises several interacting sub-groups, and modeling this group to group interaction is even more challenging.
Figure 1. Overview of the proposed framework: a hierarchical recurrent interactional context modeling framework in which stacked LSTM layers capture person-level, group-level, and scene-level context over the input image sequence (frames t-2, t-1, t), modeling both intra-group and inter-group interaction context.

Previous methods for activity recognition mainly focus on modeling unary features, e.g., single-person appearance or dynamics [21, 26], and person to person interactions (i.e., pairwise features) [22]. However, such contextual information modeling schemes are not sufficient for collective activity recognition, because different collective activity categories may share the same type of unary or pairwise features (e.g., "standing alone" occurs in both queueing and discussion, and "facing the same direction" occurs in both walking and crossing). In other words, besides modeling the intra-group interaction (i.e., interaction among the persons within a group), effectively describing the group to group interaction is even more important; low order contextual features alone do not provide sufficient cues to recognize these activities. To address this fundamental problem, most previous methods attempt to encode the high order relationships among persons in the scene by inferring latent graphical structures [9, 8]. However, applying these approaches to collective activity recognition is often infeasible, because inference and learning incur a high computational cost even for tree-structured models, and it is very difficult to generalize graphical-model-based methods to higher order interactional context. Ni et al. [24] proposed a causality analysis framework that encodes unary, pairwise, and group interaction features. However, this method only models human trajectory level information, which is insufficient to recognize finer-grained actions, e.g., those that can only be distinguished by human appearance or local body part dynamics.
A fundamental question thus arises: how can we systematically encode the high order human interactional context, i.e., both the intra-group and the inter-group interactions, within a unified framework?
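The hierarchy in Figure 1 suggests one answer. The sketch below stacks three LSTM levels: per-person LSTMs, a group-level LSTM fed by features pooled within each group, and a scene-level LSTM fed by features pooled over groups. It is a minimal illustration of that structure, not the paper's implementation; the module names, dimensions, max-pooling choice, and number of activity classes are all our assumptions.

```python
import torch
import torch.nn as nn

class HierarchicalContextLSTM(nn.Module):
    """Three-level recurrent context model sketched after Figure 1:
    person level -> group level -> scene level. Names/dims assumed."""

    def __init__(self, feat_dim=256, hidden=256, num_classes=5):
        super().__init__()
        self.person_lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.group_lstm = nn.LSTM(hidden, hidden, batch_first=True)
        self.scene_lstm = nn.LSTM(hidden, hidden, batch_first=True)
        self.classifier = nn.Linear(hidden, num_classes)

    def forward(self, groups):
        # groups: list of groups; each group is a tensor of shape
        # (num_persons, T, feat_dim). Group sizes and the number of
        # groups may vary freely between inputs.
        group_feats = []
        for persons in groups:
            h, _ = self.person_lstm(persons)       # (N, T, hidden)
            pooled = h.max(dim=0).values           # intra-group pool: (T, hidden)
            g, _ = self.group_lstm(pooled.unsqueeze(0))
            group_feats.append(g.squeeze(0))       # (T, hidden)
        scene_in = torch.stack(group_feats).max(dim=0).values  # inter-group pool
        s, _ = self.scene_lstm(scene_in.unsqueeze(0))           # (1, T, hidden)
        return self.classifier(s[:, -1])           # activity logits

# Two groups with 3 and 4 persons, tracked over T=10 frames:
model = HierarchicalContextLSTM()
logits = model([torch.randn(3, 10, 256), torch.randn(4, 10, 256)])
print(logits.shape)  # torch.Size([1, 5])
```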