The Thirty-Third AAAI Conference on Artificial Intelligence (AAAI-19)
Recurrent Attention Model for Pedestrian Attribute Recognition
Xin Zhao,1∗ Liufang Sang,1 Guiguang Ding,1 Jungong Han,2 Na Di,1 Chenggang Yan3
1Beijing National Research Center for Information Science and Technology (BNRist),
School of Software, Tsinghua University, Beijing 100084, China
2School of Computing & Communications, Lancaster University, UK
3Institute of Information and Control, Hangzhou Dianzi University, Hangzhou, China
zhaoxin19@gmail.com, slf12thuss@163.com, dinggg@tsinghua.edu.cn,
jungong.han@northumbria.ac.uk, dn15@mails.tsinghua.edu.cn, cgyan@hdu.edu.cn
Abstract
Pedestrian attribute recognition aims to predict attribute labels
of pedestrians from surveillance images, which is a very challenging
task for computer vision due to poor imaging quality and small
training datasets. It is observed that many semantic pedestrian
attributes to be recognised tend to show spatial locality and
semantic correlations by which they can be grouped, while previous
works mostly ignore this phenomenon. Inspired by the Recurrent
Neural Network (RNN)’s strong capability of learning context
correlations and the Attention Model’s capability of highlighting
regions of interest on a feature map, this paper proposes end-to-end
Recurrent Convolutional (RC) and Recurrent Attention (RA) models,
which are complementary to each other. The RC model mines the
correlations among different attribute groups with a convolutional
LSTM unit, while the RA model takes advantage of intra-group spatial
locality and inter-group attention correlation to improve the
performance of pedestrian attribute recognition. Our RA method
combines Recurrent Learning and the Attention Model to highlight
spatial positions on the feature map and to mine the attention
correlations among different attribute groups, yielding more precise
attention. Extensive empirical evidence shows that our recurrent
model frameworks achieve state-of-the-art results on standard
pedestrian attribute datasets, i.e., PETA and RAP.
Introduction
Pedestrian attributes, e.g., age, haircut, and footwear, are hu-
manly searchable semantic descriptions and can be used as
soft biometrics in visual surveillance applications such
as person re-identification (Layne, Hospedales, and Gong
2012; Liu et al. 2012; Peng et al. 2016), face verification
(Kumar et al. 2009), and human identification (Reid, Nixon,
and Stevenage 2014). Attributes are robust against view-
point changes and diverse viewing conditions compared to
low-level visual features. While attribute recognition has
been profitably tackled from a face recognition perspective,
very few works focus on the whole human body.
∗This research was supported by the National Key R&D Pro-
gram of China (2018YFC0806900) and the National Natural Sci-
ence Foundation of China (No. 61571269). Corresponding author:
Guiguang Ding.
Copyright © 2019, Association for the Advancement of Artificial
Intelligence (www.aaai.org). All rights reserved.
It is inherently challenging to recognise pedestrian at-
tributes from real-world surveillance images, owing to
poor imaging quality and small training datasets. High-
quality imagery and large-scale training data are not
available for pedestrian attributes. For example, the two
largest pedestrian attribute benchmark datasets, PETA
(Deng et al. 2014) and RAP (Li et al. 2016a), contain only
9,500 and 33,268 training images, respectively. Besides,
recognising pedestrian attributes has to cope with poor
image quality, imbalanced labels and complex appearance
variations in surveillance scenes.
Attribute recognition methods include hand-crafted fea-
ture methods, CNN methods and CNN-RNN methods. Early
attribute recognition methods mainly rely on hand-crafted
features such as colour and texture (Layne, Hospedales, and
Gong 2012; Liu et al. 2012; Jaha and Nixon 2014). Re-
cently, deep learning based attribute models have been pro-
posed owing to their capacity to learn more expressive repre-
sentations (Li, Chen, and Huang 2015; Fabbri, Calderara,
and Cucchiara 2017; Liu et al. 2017b), which significantly
improve the performance of pedestrian attribute recogni-
tion. For example, the DeepMAR method (Li, Chen, and
Huang 2015) utilizes prior knowledge of the object topology
for attribute recognition and designs a weighted sigmoid
cross-entropy loss to deal with the data imbalance problem
whilst training the attribute recognition model. Multi-
directional attention modules are applied in an Inception-
based deep model named HydraPlus Network (Liu et al.
2017b) to take visual attention into consideration. CNN-RNN
methods have proved successful in multi-label classification
tasks, where they mine the dependencies among labels (Li et
al. 2017; Liu et al. 2017a). A recurrent encoder-decoder
framework has been introduced into the pedestrian attribute
recognition task (Wang et al. 2017b), which aims to discover
the interdependency and correlation among attributes with a
Long Short-Term Memory (LSTM) model.
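The weighted sigmoid cross-entropy idea mentioned above can be sketched as follows. This is a minimal illustrative variant, not the exact DeepMAR formulation: the exponential weighting by each attribute's positive ratio follows the common practice of penalising mistakes on the rarer label state more heavily, and all names and the `sigma` parameter here are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def weighted_sigmoid_ce(logits, labels, pos_ratio, sigma=1.0):
    """Weighted sigmoid cross-entropy for imbalanced multi-label data.

    logits:    (N, A) raw scores, one column per attribute
    labels:    (N, A) binary ground truth
    pos_ratio: (A,)  fraction of positive samples per attribute
    """
    p = sigmoid(logits)
    eps = 1e-7
    # Rare positives get large weights, frequent positives small ones;
    # the negative-state weight mirrors this.
    w_pos = np.exp((1.0 - pos_ratio) / sigma ** 2)
    w_neg = np.exp(pos_ratio / sigma ** 2)
    w = labels * w_pos + (1 - labels) * w_neg
    ce = -(labels * np.log(p + eps) + (1 - labels) * np.log(1.0 - p + eps))
    return np.mean(w * ce)
```

With a positive ratio of 0.1, missing a rare positive costs roughly e^0.9 times the unweighted loss, while a false positive costs only about e^0.1 times it, which counteracts the label imbalance during training.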
Pedestrian attributes often show semantic or visual spatial
correlations by which they can be grouped. For ex-
ample, BaldHead and LongHair cannot occur on the same
person, yet both relate to the head-shoulders region; they
can therefore be placed in the same group and recognised
together with a specific attention on the head-shoulders
region. Existing methods try to mine the correlations of
attributes separately but ignore the spatial neighborhood
relationship and the semantic similarity of a group