Deep Person Re-Identification: The Deep-Person Model Driven by LSTM and Triplet Loss

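This page summarizes the paper "Deep-Person: Learning Discriminative Deep Features for Person Re-Identification", published in Pattern Recognition in 2020. The authors, Xiang Bai and colleagues from the School of Electronic Information and Communications at Huazhong University of Science and Technology, address person re-identification (Person Re-ID): matching the same person across non-overlapping cameras despite inaccurate bounding-box detection, background clutter, and occlusion.

The central idea of Deep-Person is to treat a pedestrian as a sequence of body parts running from head to foot and to model that sequence with a Long Short-Term Memory (LSTM) network. An LSTM is a recurrent neural network suited to sequential data, so each part feature is computed with spatial context from the other parts rather than in isolation. This makes the local features more discriminative and better aligned to the full person.

The model adds two complementary design choices. First, it combines global and local features, so the representation covers both the whole body and its individual parts. Second, it trains an identification task and a ranking task in the same network. The identification branch uses a softmax classifier over person identities to learn a discriminative embedding. The ranking branch uses a triplet loss, which pushes the distance between an anchor and a sample of the same identity below the distance between that anchor and a sample of a different identity, so that features of the same person cluster together and features of different people separate. Together these form the three-branch framework described in the paper's abstract.

The whole network is trained end to end. According to the abstract, Deep-Person outperforms state-of-the-art methods by a large margin on three datasets: Market-1501, CUHK03, and DukeMTMC-reID.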
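The joint objective described in the summary above (a softmax identification loss plus a triplet ranking loss) can be illustrated with a small worked example. This is only a sketch under stated assumptions: the margin of 0.3, the equal weighting of the two terms, the Euclidean distance, and all toy vectors are illustrative choices, not values taken from the paper.

```python
import math

def softmax_cross_entropy(logits, label):
    """Identification loss: negative log-probability of the true identity."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]

def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def triplet_loss(anchor, positive, negative, margin=0.3):
    """Ranking loss: hinge on d(anchor, positive) - d(anchor, negative) + margin.

    The margin value is an assumption made for this example.
    """
    gap = euclidean(anchor, positive) - euclidean(anchor, negative) + margin
    return max(0.0, gap)

# Toy 2-D embeddings (hypothetical values).
anchor = [0.0, 0.0]
positive = [0.3, 0.4]        # same identity, distance 0.5 from the anchor
easy_negative = [0.6, 0.8]   # different identity, distance 1.0
hard_negative = [0.3, 0.0]   # different identity, distance 0.3

# Toy classifier scores over three identities; the true identity is index 0.
logits = [2.0, 0.5, -1.0]

id_loss = softmax_cross_entropy(logits, label=0)
rank_loss = triplet_loss(anchor, positive, easy_negative)  # constraint already met
hard_loss = triplet_loss(anchor, positive, hard_negative)  # constraint violated

# Equal weighting of the two terms is an assumption, not the paper's setting.
total = id_loss + hard_loss
```

The easy negative already lies farther from the anchor than the positive by more than the margin, so its triplet term is zero. The hard negative lies closer than the positive and yields a positive loss, which is the kind of confusable sample the ranking branch is meant to penalize.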

Pattern Recognition 98 (2020) 107036

Deep-Person: Learning discriminative deep features for person Re-Identification

Xiang Bai, Mingkun Yang, Tengteng Huang, Zhiyong Dou, Rui Yu, Yongchao Xu∗

School of Electronic Information and Communications, Huazhong University of Science and Technology (HUST), Wuhan 430074, China

Article info

Article history:
Received 2 March 2018
Revised 14 July 2019
Accepted 3 September 2019
Available online 6 September 2019

Keywords: Person Re-ID, LSTM, Triplet loss, End-to-end

Abstract

Person re-identification (Re-ID) requires discriminative features focusing on the full person to cope with inaccurate person bounding box detection, background clutter, and occlusion. Many recent person Re-ID methods attempt to learn such features describing full person details via part-based feature representation. However, the spatial context between these parts is ignored because an independent extractor is applied to each separate part. In this paper, we propose to apply Long Short-Term Memory (LSTM) in an end-to-end way to model the pedestrian, seen as a sequence of body parts from head to foot. Integrating the contextual information strengthens the discriminative ability of local features, aligning them better to the full person. We also leverage the complementary information between local and global features. Furthermore, we integrate both an identification task and a ranking task in one network, where a discriminative embedding and a similarity measurement are learned concurrently. This results in a novel three-branch framework named Deep-Person, which learns highly discriminative features for person Re-ID. Experimental results demonstrate that Deep-Person outperforms the state-of-the-art methods by a large margin on three challenging datasets including Market-1501, CUHK03, and DukeMTMC-reID.
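To make the idea of modelling a pedestrian "as a sequence of body parts from head to foot" concrete, the following sketch pools a toy feature map row by row into part vectors, contrasts that with global average pooling, and passes the part sequence through a minimal LSTM. Everything here is a hypothetical illustration: the map size, the hidden size, the class name TinyLSTM, and the random untrained weights are assumptions, and the excerpt does not specify the paper's actual backbone, number of parts, or LSTM configuration.

```python
import math
import random

def row_pool(feature_map):
    """Average each row of an H x W x C map into one C-dim part vector (row 0 = head)."""
    channels = len(feature_map[0][0])
    return [[sum(cell[c] for cell in row) / len(row) for c in range(channels)]
            for row in feature_map]

def global_pool(feature_map):
    """Average over all positions: a single global vector with no part structure."""
    cells = [cell for row in feature_map for cell in row]
    return [sum(cell[c] for cell in cells) / len(cells) for c in range(len(cells[0]))]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

class TinyLSTM:
    """Single-layer, one-direction LSTM with random weights (illustration only)."""

    def __init__(self, input_size, hidden_size, seed=0):
        rng = random.Random(seed)
        n = input_size + hidden_size
        # One weight matrix and bias per gate: input, forget, output, candidate.
        self.W = {g: [[rng.uniform(-0.1, 0.1) for _ in range(n)]
                      for _ in range(hidden_size)] for g in "ifog"}
        self.b = {g: [0.0] * hidden_size for g in "ifog"}
        self.hidden_size = hidden_size

    def _gate(self, name, xh, activation):
        return [activation(sum(w * v for w, v in zip(row, xh)) + bias)
                for row, bias in zip(self.W[name], self.b[name])]

    def run(self, sequence):
        h = [0.0] * self.hidden_size
        c = [0.0] * self.hidden_size
        outputs = []
        for x in sequence:
            xh = x + h  # concatenate the current part with the previous hidden state
            i = self._gate("i", xh, sigmoid)
            f = self._gate("f", xh, sigmoid)
            o = self._gate("o", xh, sigmoid)
            g = self._gate("g", xh, math.tanh)
            c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
            h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
            outputs.append(h)  # this part's feature now carries context from parts above
        return outputs

# Toy feature map: H=4 rows (head to foot), W=2 columns, C=3 channels.
H, W, C = 4, 2, 3
feature_map = [[[float(r + w + ch) for ch in range(C)] for w in range(W)]
               for r in range(H)]

parts = row_pool(feature_map)              # sequence of 4 part vectors
global_feature = global_pool(feature_map)  # one vector for the whole image
context_parts = TinyLSTM(input_size=C, hidden_size=2).run(parts)
```

Each entry of context_parts depends on the current row and on all rows above it, which is the sense in which a recurrent model adds spatial context that independent per-part extractors lack. A bidirectional variant would also propagate context from foot to head; that choice is left open here.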

1. Introduction

Person re-identification (Re-ID) refers to the task of matching a specific person across multiple non-overlapping cameras. It has been receiving increasing attention in the computer vision community thanks to its various surveillance applications. Despite decades of study on the person Re-ID task, it is still very challenging due to inaccurate person bounding box detection and large variations in illumination, pose, background clutter, occlusion, and ambiguity in visual appearance. Discriminative features focusing mainly on the full person are essential to cope with these challenges in person Re-ID.

Most early works in person Re-ID focus either on discriminative hand-crafted feature representation or on a robust distance metric for similarity measurement. Benefiting from the development of deep learning and increasingly large-scale datasets [1–3], recent person Re-ID methods combine feature extraction and distance metric into an end-to-end deep convolutional neural network (CNN). Nevertheless, most recent CNN-based methods endeavor to either design a better feature representation or develop a more robust feature learning, but rarely both aspects together. Recently, some semi-supervised and unsupervised methods have been proposed to further promote this field [4,5], which achieve satisfactory performance with few or even no labels.

∗ Corresponding author.
E-mail addresses: xbai@hust.edu.cn (X. Bai), yangmingkun@hust.edu.cn (M. Yang), tengtenghuang@hust.edu.cn (T. Huang), zydou@hust.edu.cn (Z. Dou), yurui.thu@gmail.com (R. Yu), yongchaoxu@hust.edu.cn (Y. Xu).

The CNN-based methods focusing on better feature representations can be roughly divided into three categories: 1) Global full-body representation, which is adopted in many methods [3,6]. Global average pooling is widely used for such global feature extraction, which decreases the granularity of features, thus resulting in missing local details (see Fig. 1(a)); 2) Local body-part representation, which has been exploited in many works with various part partitions. A straightforward partition into predefined rigid body parts is used in many works [7–10]. This may make the learned feature focus on some person details. Yet, due to pose variations, imperfect pedestrian detectors, and occlusion, such a trivial partition fails to correctly learn features aligned to the full person, leading to part-based features that are far from robust. Some recent works endeavor to develop better body partitions with some sophisticated methods [11–13] or using extra pose annotation [14,15]. Although these part-based methods can enrich the generated feature so that it better describes some person details, they all ignore the contextual information between the body parts, still failing to align well to the full person and suffering from occlusion, blurring, and background noise. In [16], the authors propose to first convert the original person image into sequential LOMO and Color Names features, then rely on

https://doi.org/10.1016/j.patcog.2019.107036
