Pedestrian Attribute Recognition Based on Hierarchical Multi-task Learning and Relationship Attention


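This page summarizes the paper "Pedestrian Attribute Recognition via Hierarchical Multi-task Learning and Relationship Attention", which addresses Pedestrian Attribute Recognition (PAR), an important task in surveillance video analysis. The authors propose a novel end-to-end hierarchical deep learning approach to PAR. The approach introduces semantic segmentation into PAR and formulates the problem as multi-task learning, which brings pixel-level supervision into feature learning.

First, a hierarchical multi-task network combines the PAR task with an image segmentation task. Because the two tasks share the lower feature-extraction layers, each can benefit the other, which is intended to improve generalization and the handling of complex scenes. The hierarchical structure lets the network process information at different scales and levels of abstraction, which helps it capture pedestrian features and their associated attributes more accurately.

Second, a relationship attention mechanism is introduced so that the model can attend to relations among different attributes, such as those describing clothing and accessories. For example, whether a person wears a hat may be related to the color of their top or to their hairstyle, and this kind of contextual information helps attribute recognition.

In addition, multi-task learning makes fuller use of the available annotations, since each task provides its own training signal. This helps reduce overfitting and improves robustness across different attribute categories.

In summary, the work combines hierarchical multi-task learning with relationship attention and reports very competitive results on the RAP and PETA databases. The authors are Lian Gao, Di Huang, Yuanfang Guo, and Yunhong Wang, from the Beijing Advanced Innovation Center for Big Data and Brain Computing and the School of Computer Science and Engineering, Beihang University.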
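The multi-task formulation summarized above can be illustrated with a small sketch. This is not the authors' implementation: it assumes a sigmoid binary cross-entropy loss for the attributes, a softmax cross-entropy loss for the per-pixel segmentation labels, and a hypothetical weighting factor `seg_weight` that balances the two terms. All function names are illustrative.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def attribute_bce(logits, labels):
    """Mean binary cross-entropy over attributes (multi-label setting)."""
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for z, y in zip(logits, labels):
        p = sigmoid(z)
        total += -(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps))
    return total / len(logits)

def segmentation_ce(pixel_logits, pixel_labels):
    """Mean softmax cross-entropy over pixels.

    pixel_logits: one list of class scores per pixel.
    pixel_labels: one class index per pixel.
    """
    total = 0.0
    for scores, cls in zip(pixel_logits, pixel_labels):
        m = max(scores)  # subtract the max for numerical stability
        log_sum = m + math.log(sum(math.exp(s - m) for s in scores))
        total += log_sum - scores[cls]
    return total / len(pixel_logits)

def multitask_loss(attr_logits, attr_labels, pixel_logits, pixel_labels,
                   seg_weight=0.5):
    """Attribute loss plus a weighted segmentation loss.

    seg_weight is a hypothetical hyperparameter, not a value from the paper.
    """
    return (attribute_bce(attr_logits, attr_labels)
            + seg_weight * segmentation_ce(pixel_logits, pixel_labels))

if __name__ == "__main__":
    # Two attributes and two pixels with two segmentation classes.
    loss = multitask_loss([2.0, -1.0], [1, 0],
                          [[1.5, 0.2], [0.1, 2.0]], [0, 1])
    print(f"combined loss: {loss:.4f}")
```

In a real network both terms would be computed on outputs of branches that share the same backbone, so gradients from the pixel-level segmentation term also shape the features used for attribute prediction.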

Pedestrian Aribute Recognition via Hierarchical Multi-task

Learning and Relationship Aention

Lian Gao
Beijing Advanced Innovation Center for Big Data and Brain Computing, School of Computer Science and Engineering, Beihang University
Beijing, China
gaolian@buaa.edu.cn

Di Huang∗
Beijing Advanced Innovation Center for Big Data and Brain Computing, School of Computer Science and Engineering, Beihang University
Beijing, China
dhuang@buaa.edu.cn

Yuanfang Guo
School of Computer Science and Engineering, Beihang University
Beijing, China
andyguo@buaa.edu.cn

Yunhong Wang
Beijing Advanced Innovation Center for Big Data and Brain Computing, School of Computer Science and Engineering, Beihang University
Beijing, China
yhwang@buaa.edu.cn

ABSTRACT

Pedestrian Attribute Recognition (PAR) is an important task in surveillance video analysis. In this paper, we propose a novel end-to-end hierarchical deep learning approach to PAR. The proposed network introduces semantic segmentation into PAR and formulates it as a multi-task learning problem, which brings in pixel-level supervision in feature learning for attribute localization. According to the spatial properties of local and global attributes, we present a two stage learning mechanism to decouple coarse attribute localization and fine attribute recognition into successive phases within a single model, which strengthens feature learning. Besides, we design an attribute relationship attention module to efficiently capture and emphasize the latent relations among different attributes, further enhancing the discriminative power of the feature. Extensive experiments are conducted and very competitive results are reached on the RAP and PETA databases, indicating the effectiveness and superiority of the proposed approach.
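To make the idea of attention over attribute relations concrete, the following minimal sketch applies generic scaled dot-product self-attention to a set of per-attribute feature vectors and adds the result back as a residual. It is an assumption-laden illustration, not the paper's attention module: the abstract does not specify the module's design, and the function name `relation_attention` and the toy features are hypothetical.

```python
import math

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def relation_attention(attr_feats):
    """Refine each attribute feature using its similarity to all others.

    attr_feats: list with one feature vector (list of floats) per attribute.
    Returns (refined_feats, weights), where weights[i][j] is how strongly
    attribute i attends to attribute j.
    """
    d = len(attr_feats[0])
    scale = math.sqrt(d)
    refined, all_weights = [], []
    for q in attr_feats:
        # Similarity of this attribute to every attribute, itself included.
        weights = softmax([dot(q, k) / scale for k in attr_feats])
        # Weighted mix of all attribute features.
        context = [sum(w * v[j] for w, v in zip(weights, attr_feats))
                   for j in range(d)]
        # Residual connection keeps the original feature.
        refined.append([a + c for a, c in zip(q, context)])
        all_weights.append(weights)
    return refined, all_weights

if __name__ == "__main__":
    # Three toy attribute features; the first two are identical.
    feats = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
    refined, weights = relation_attention(feats)
    for row in weights:
        print([round(w, 3) for w in row])
```

In this toy input the first two attributes have identical features, so each attends to the other more strongly than to the third. A learned module would normally project the features with trainable weights before computing similarities.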

CCS CONCEPTS

• Computing methodologies → Object recognition.

KEYWORDS

pedestrian attribute recognition, deep learning, multi-task learning and visual attention

∗ indicates the corresponding author.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

MM ’19, October 21–25, 2019, Nice, France

ACM ISBN 978-1-4503-6889-6/19/10...$15.00

https://doi.org/10.1145/3343031.3351003

ACM Reference Format:

Lian Gao, Di Huang, Yuanfang Guo, and Yunhong Wang. 2019. Pedestrian Attribute Recognition via Hierarchical Multi-task Learning and Relationship Attention. In Proceedings of the 27th ACM International Conference on Multimedia (MM ’19), October 21–25, 2019, Nice, France. ACM, New York, NY, USA, 9 pages. https://doi.org/10.1145/3343031.3351003

1 INTRODUCTION

Nowadays, video surveillance systems have been widely employed with different security demands in various public and private facilities and places, including squares, malls, railway stations, airports, residential buildings, libraries, etc. Pedestrians are major targets in surveillance videos, and automatic pedestrian analysis is important to many applications, such as key person indexing, criminal trajectory tracking, and abnormal behavior detection, where Pedestrian Attribute Recognition (PAR) plays a fundamental role. PAR aims to predict intrinsic characteristics (e.g. “gender”, “age”) as well as appearance properties (e.g. “clothes style”, “accessory”) of persons and has received increasing attention in recent years.

PAR is a challenging task with a number of intractable problems. On the one hand, it has to handle the well-known issues in the field of computer vision, including changes in ambient illumination, camera viewpoint, video resolution, person gesture, and external occlusion. On the other hand, to satisfy diverse requirements, the number of attributes concerned becomes larger and larger. The attributes convey rich semantic information at different levels. In general, local attributes (e.g. “hair style” and “accessory”) are related to low-level or mid-level appearance features of certain regions, while global attributes (e.g. “gender”) require a holistic representation with special areas highlighted (e.g. face, hair, and torso), probably corresponding to some local attributes. This complexity of attribute relationships makes PAR even more difficult. Figure 1 shows some examples of pedestrians and typical attributes.

Early studies on PAR follow the detection pipeline, which first extracts handcrafted features of candidate regions and then feeds them into classifiers for prediction, and demonstrate promising results [ ]. Unfortunately, they can only handle a single attribute or very few similar attributes, as the features used are ad-hoc and not easy to be

Session 3B: Attention & Saliency
