BI-DIRECTIONAL LONG SHORT-TERM MEMORY ARCHITECTURE FOR PERSON
RE-IDENTIFICATION WITH MODIFIED TRIPLET EMBEDDING
Weilin Zhong, Huilin Xiong, Zhen Yang, Tao Zhang
School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, China
Institute for Sensing and Navigation, Shanghai Jiao Tong University, China
ABSTRACT
Matching a specific person across non-overlapping cameras,
known as person re-identification, is an important yet
challenging task owing to the large intra-class variations in
pose, illumination, and occlusion among images of the same
person. Most existing body-part-based deep methods simply
concatenate the features or scores obtained from spatial parts
and ignore the complex spatial correlations between them. In
this paper, we present a bi-directional Long Short-Term
Memory (Bi-LSTM) architecture that processes the spatial
parts sequentially and allows messages to flow between
different parts in both directions. The spatial and contextual
visual information can therefore be modeled efficiently by the
bi-directional connections and the internal gating functions of
the LSTM. Furthermore, we propose a modified triplet loss
that learns more discriminative features for distinguishing
positive pairs from negative pairs. Experiments on the
CUHK01 and CUHK03 datasets demonstrate the
effectiveness of the proposed method.
Index Terms— bi-directional information flow, spatial
correlation, Long Short-Term Memory, modified triplet loss
1. INTRODUCTION
Person re-identification aims to identify a specific person
among a large number of images obtained across multiple
non-overlapping cameras. In recent years, it has drawn
increasing attention due to its important and broad
applications in visual surveillance. However, person re-
identification is also a challenging task because of the large
variations of images of the same person in pose,
illumination, and background occlusion.
Basically, person re-identification involves two aspects of
computation: i) extraction of discriminative features; ii)
similarity metric learning. Hand-crafted features [1-4] are
designed to be robust to illumination changes and variations
of appearance caused by different camera views. Metric
learning methods based on the Mahalanobis distance [5-10]
have been shown to be effective in matching person images,
separating positive pairs from negative pairs.
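As a brief illustration of the metric-learning side, the Mahalanobis family of methods scores a pair of feature vectors with a learned positive semi-definite matrix M; the sketch below only shows the distance computation itself (the matrix M would be learned by one of the methods in [5-10], which is outside this snippet):

```python
import numpy as np

def mahalanobis_dist(x, y, M):
    """Squared Mahalanobis distance: d_M(x, y) = (x - y)^T M (x - y)."""
    d = x - y
    return float(d @ M @ d)

# Toy example: with M = identity, the metric reduces to the
# squared Euclidean distance between the two feature vectors.
x = np.array([1.0, 2.0])
y = np.array([0.0, 0.0])
M = np.eye(2)
print(mahalanobis_dist(x, y, M))  # 5.0
```

A learned M re-weights and correlates feature dimensions so that images of the same person map closer together than images of different people.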
[Figure 1: two panels — (a) example images from Camera a and Camera b in CUHK03 [12]; (b) bi-directional information flow]
Fig. 1. (a) Images in the same row come from the same
person. Persons undergo large variations across non-
overlapping cameras, whereas they share similar appearance
within the same camera view; thus triplets of different
hardness levels, induced by camera view, should be treated
differently. (b) Spatial information can be passed either from
top to bottom (red arrow) or from bottom to top (green arrow)
to verify whether two images show the same person.
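The caption's observation that triplets differ in hardness suggests weighting the standard triplet loss per triplet. The paper's actual modification is not specified in this section, so the sketch below shows only the standard hinge-form triplet loss with a hypothetical per-triplet weight `w` (an illustrative assumption, not the authors' formulation):

```python
import numpy as np

def weighted_triplet_loss(a, p, n, margin=0.3, w=1.0):
    """Hinge triplet loss max(0, ||a-p||^2 - ||a-n||^2 + margin),
    scaled by a hypothetical per-triplet hardness weight w."""
    d_ap = np.sum((a - p) ** 2)  # anchor-positive distance
    d_an = np.sum((a - n) ** 2)  # anchor-negative distance
    return w * max(0.0, d_ap - d_an + margin)

a = np.array([0.0, 0.0])
p = np.array([0.1, 0.0])   # same identity
n = np.array([0.2, 0.1])   # different identity but visually close: a hard negative
print(weighted_triplet_loss(a, p, n, margin=0.3, w=2.0))  # 0.52
```

Up-weighting hard triplets (e.g., cross-camera negatives that look similar) makes the embedding focus on the cases the caption highlights.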
With the recent advances of deep learning methods in
various pattern recognition applications, researchers have also
developed new deep learning architectures [11-14] based on
Convolutional Neural Networks (CNNs) to handle the
person re-identification task, in which the feature
representation and metric are usually jointly learned.
However, most existing deep methods take the whole
image as input [13, 15, 17] and focus only on global
information. As a consequence, the performance of such
approaches may still suffer from factors such as illumination
variation and occlusion. Inspired by the success of the spatial
stripe representation in hand-crafted feature extraction [1,
4], several deep methods have been proposed that concentrate
on local regions or body parts [11, 16]. However, simply
concatenating features or scores obtained from body parts,
treating the different parts independently, does not work well
for person re-identification. Recently, Varior et al.
[13] proposed a siamese Long Short-Term Memory (S-LSTM)
architecture, aiming to enhance the discriminative capability
of feature representations such as the LOMO feature [1].
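To make the sequential-parts idea concrete, a Bi-LSTM over horizontal stripes can be sketched as follows. This is a minimal illustration, not the paper's architecture: the stripe features, dimensions, and randomly initialized weights are all placeholders, and a single shared LSTM cell is run forward and backward over the stripe sequence with the two hidden states concatenated per stripe:

```python
import numpy as np

def lstm_step(x, h, c, W, U, b):
    """One LSTM step; gates (input, forget, output, candidate) are
    stacked along the rows of W, U, and b."""
    z = W @ x + U @ h + b
    H = h.size
    i = 1.0 / (1.0 + np.exp(-z[:H]))        # input gate
    f = 1.0 / (1.0 + np.exp(-z[H:2*H]))     # forget gate
    o = 1.0 / (1.0 + np.exp(-z[2*H:3*H]))   # output gate
    g = np.tanh(z[3*H:])                    # candidate cell state
    c = f * c + i * g
    h = o * np.tanh(c)
    return h, c

def bilstm_over_stripes(stripes, W, U, b, H):
    """Run the LSTM over the stripe sequence top-to-bottom and
    bottom-to-top, then concatenate the two hidden states per stripe."""
    def run(seq):
        h, c = np.zeros(H), np.zeros(H)
        out = []
        for x in seq:
            h, c = lstm_step(x, h, c, W, U, b)
            out.append(h)
        return out
    fwd = run(stripes)
    bwd = run(stripes[::-1])[::-1]
    return [np.concatenate([f, bb]) for f, bb in zip(fwd, bwd)]

# Toy usage: 6 horizontal stripes, each represented by a 4-D feature
# (standing in for pooled CNN or LOMO stripe features).
rng = np.random.default_rng(0)
D, H, T = 4, 3, 6
W = rng.normal(size=(4 * H, D))
U = rng.normal(size=(4 * H, H))
b = np.zeros(4 * H)
stripes = [rng.normal(size=D) for _ in range(T)]
feats = bilstm_over_stripes(stripes, W, U, b, H)
print(len(feats), feats[0].shape)  # 6 (3,)*2 -> (6,)
```

Because each stripe's output depends on both the stripes above and below it, the representation captures spatial context in both directions, unlike simple concatenation of independently processed parts.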
Motivated by S-LSTM [13] and the latest deep methods
in person re-id [11, 12, 17], we present a bi-directional Long
978-1-5090-2175-8/17/$31.00 ©2017 IEEE ICIP 2017