BI-DIRECTIONAL LONG SHORT-TERM MEMORY ARCHITECTURE FOR PERSON
RE-IDENTIFICATION WITH MODIFIED TRIPLET EMBEDDING
Weilin Zhong, Huilin Xiong, Zhen Yang, Tao Zhang
School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, China
Institute for Sensing and Navigation, Shanghai Jiao Tong University, China
ABSTRACT
Matching a specific person across non-overlapping cameras,
known as person re-identification, is an important yet
challenging task owing to the large intra-class variations in
pose, illumination, and occlusion among images of the same
person. Most existing body-part-based deep methods simply
concatenate the features or scores obtained from spatial parts
and ignore the complex spatial correlations between them. In
this paper, we present a bi-directional Long Short-Term
Memory (Bi-LSTM) architecture that processes the spatial
parts sequentially and allows messages to flow between
different parts in both directions. The spatial and contextual
visual information can therefore be modeled efficiently by the
bi-directional connections and the internal gating functions of
the LSTM. Furthermore, we propose a modified triplet loss
that learns more discriminative features for distinguishing
positive pairs from negative pairs. Experiments on the
CUHK01 and CUHK03 datasets demonstrate the
effectiveness of the proposed method.
Index Terms— bi-directional information flow, spatial
correlation, Long Short-Term Memory, modified triplet loss
1. INTRODUCTION
Person re-identification aims to identify a specific person
among a large number of images obtained across multiple
non-overlapping cameras. In recent years, it has drawn
increasing attention due to its important and broad
applications in visual surveillance. However, person re-
identification is also a challenging task because of the large
variations of images of the same person in pose,
illumination, and background occlusion.
Basically, person re-identification involves two aspects of
computation: i) extraction of discriminative features; ii)
similarity metric learning. Hand-crafted features [1-4] are
designed to be robust to illumination changes and variations
of appearance caused by different camera views. Metric
learning methods based on the Mahalanobis distance [5-10]
have been shown to be effective in matching person images,
separating positive pairs from negative pairs.
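As a brief illustration of the metric-learning side, the Mahalanobis family of methods scores a pair of feature vectors with a learned positive semi-definite matrix M; the sketch below only shows the distance computation itself (the matrix M would be learned by one of the methods in [5-10], which is outside this snippet):

```python
import numpy as np

def mahalanobis_dist(x, y, M):
    """Squared Mahalanobis distance: d_M(x, y) = (x - y)^T M (x - y)."""
    d = x - y
    return float(d @ M @ d)

# Toy example: with M = identity, the metric reduces to the
# squared Euclidean distance between the two feature vectors.
x = np.array([1.0, 2.0])
y = np.array([0.0, 0.0])
M = np.eye(2)
print(mahalanobis_dist(x, y, M))  # 5.0
```

A learned M re-weights and correlates feature dimensions so that images of the same person map closer together than images of different people.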
[Figure 1: two panels — (a) example images from Camera a and Camera b in CUHK03 [12]; (b) bi-directional information flow]
Fig. 1. (a) Images in the same row come from the same
person. Persons undergo large variations across non-
overlapping cameras, whereas they share similar appearance
within the same camera view; thus triplets of different
hardness levels, induced by camera view, should be treated
differently. (b) Spatial information can be passed either from
top to bottom (red arrow) or from bottom to top (green arrow)
to verify whether two images show the same person.
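The caption's observation that triplets differ in hardness suggests weighting the standard triplet loss per triplet. The paper's actual modification is not specified in this section, so the sketch below shows only the standard hinge-form triplet loss with a hypothetical per-triplet weight `w` (an illustrative assumption, not the authors' formulation):

```python
import numpy as np

def weighted_triplet_loss(a, p, n, margin=0.3, w=1.0):
    """Hinge triplet loss max(0, ||a-p||^2 - ||a-n||^2 + margin),
    scaled by a hypothetical per-triplet hardness weight w."""
    d_ap = np.sum((a - p) ** 2)  # anchor-positive distance
    d_an = np.sum((a - n) ** 2)  # anchor-negative distance
    return w * max(0.0, d_ap - d_an + margin)

a = np.array([0.0, 0.0])
p = np.array([0.1, 0.0])   # same identity
n = np.array([0.2, 0.1])   # different identity but visually close: a hard negative
print(weighted_triplet_loss(a, p, n, margin=0.3, w=2.0))  # 0.52
```

Up-weighting hard triplets (e.g., cross-camera negatives that look similar) makes the embedding focus on the cases the caption highlights.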
With the recent advances of deep learning methods in
various pattern recognition applications, researchers have also
developed new deep learning architectures [11-14] based on
Convolutional Neural Networks (CNNs) to handle the
person re-identification task, in which the feature
representation and metric are usually jointly learned.
However, most existing deep methods take the whole
image as input [13, 15, 17] and focus only on global
information. As a consequence, the performance of such
approaches may still suffer from factors such as illumination
variation and occlusion. Inspired by the success of the spatial
stripe representation in hand-crafted feature extraction [1,
4], several deep methods have been proposed that concentrate
on local regions or body parts [11, 16]. However, simply
concatenating features or scores obtained from body parts,
treating the different parts independently, does not work well
for person re-identification. Recently, Varior et al.
[13] proposed a siamese Long Short-Term Memory (S-LSTM)
architecture, aiming to enhance the discriminative capability
of feature representations such as the LOMO feature [1].
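To make the sequential-parts idea concrete, a Bi-LSTM over horizontal stripes can be sketched as follows. This is a minimal illustration, not the paper's architecture: the stripe features, dimensions, and randomly initialized weights are all placeholders, and a single shared LSTM cell is run forward and backward over the stripe sequence with the two hidden states concatenated per stripe:

```python
import numpy as np

def lstm_step(x, h, c, W, U, b):
    """One LSTM step; gates (input, forget, output, candidate) are
    stacked along the rows of W, U, and b."""
    z = W @ x + U @ h + b
    H = h.size
    i = 1.0 / (1.0 + np.exp(-z[:H]))        # input gate
    f = 1.0 / (1.0 + np.exp(-z[H:2*H]))     # forget gate
    o = 1.0 / (1.0 + np.exp(-z[2*H:3*H]))   # output gate
    g = np.tanh(z[3*H:])                    # candidate cell state
    c = f * c + i * g
    h = o * np.tanh(c)
    return h, c

def bilstm_over_stripes(stripes, W, U, b, H):
    """Run the LSTM over the stripe sequence top-to-bottom and
    bottom-to-top, then concatenate the two hidden states per stripe."""
    def run(seq):
        h, c = np.zeros(H), np.zeros(H)
        out = []
        for x in seq:
            h, c = lstm_step(x, h, c, W, U, b)
            out.append(h)
        return out
    fwd = run(stripes)
    bwd = run(stripes[::-1])[::-1]
    return [np.concatenate([f, bb]) for f, bb in zip(fwd, bwd)]

# Toy usage: 6 horizontal stripes, each represented by a 4-D feature
# (standing in for pooled CNN or LOMO stripe features).
rng = np.random.default_rng(0)
D, H, T = 4, 3, 6
W = rng.normal(size=(4 * H, D))
U = rng.normal(size=(4 * H, H))
b = np.zeros(4 * H)
stripes = [rng.normal(size=D) for _ in range(T)]
feats = bilstm_over_stripes(stripes, W, U, b, H)
print(len(feats), feats[0].shape)  # 6 (3,)*2 -> (6,)
```

Because each stripe's output depends on both the stripes above and below it, the representation captures spatial context in both directions, unlike simple concatenation of independently processed parts.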
Motivated by S-LSTM [13] and the latest deep methods
in person re-id [11, 12, 17], we present a bi-directional Long
978-1-5090-2175-8/17/$31.00 ©2017 IEEE ICIP 2017