classification task. Compared with generative
trackers, discriminative trackers can make effective
use of surrounding information to distinguish a
target from the background easily.
In the past few years, the discriminative trackers
attempting to construct a classifier to distinguish a
target from the background have gradually become
the mainstream. Generally, these trackers operate in
two steps: feature representation and classifier
training or updating. Whether a tracker is generative
or discriminative, feature representation plays a
significant role. For discriminative trackers, the
features that capture the difference between the target
and the background should be robust and discriminative
with respect to variations in both the intrinsic and
extrinsic environments.
14
The classifier of a discriminative tracker
must be updated online continually to cope with the
intrinsic and extrinsic challenges.
15–17
Therefore, designing an
excellent feature representation scheme and construct-
ing a suitable online update mechanism for the clas-
sifier are crucial for discriminative trackers.
The features used for tracking tasks can be classi-
fied into three levels: primary features, intermediate or
handcrafted features, and advanced or deep features.
Primary features, for example, edges, contours, and
color information, are ubiquitous and widely used
in tracking tasks.
18–20
Intermediate or handcrafted
features, designed using a priori knowledge of a
specific target, have discriminative abilities that can
distinguish the target from the background and
have achieved great success in object detection and
recognition. For example, the histogram of oriented
gradients (HOG) was specifically designed to extract
the features of pedestrians and vehicles.
21,22
Among advanced
features, the most representative are convolutional
neural network (CNN) features, which are the outputs
of different layers of a pretrained CNN and have
shown strong generalization and transfer
ability.
23
Furthermore, the deep fea-
tures usually contain more semantic information
that is ideal for classification and recognition.
However, because of their high computational complexity,
deep features do not fully meet real-time requirements.
Currently, as one of the main development directions
of discriminative trackers, trackers in the
correlation filtering framework achieve very
competitive performance and have attracted a great
deal of attention.
24–26
The success of correlation filtering is
due to two advantages. First, correlation
filtering conveniently expands the number of
training samples for the filter update in each frame by
using cyclic sampling. Second, correlation
filtering is computationally efficient because it
converts correlation operations in the spatial
domain into element-wise operations in the frequency
domain by using the fast Fourier transform. These
two advantages make correlation filtering
particularly appropriate for tracking problems.
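The two advantages can be checked numerically. The following is a minimal sketch, assuming a one-dimensional real-valued sample and NumPy; the function names are illustrative and do not come from any of the cited trackers. It builds every cyclic shift of a base sample explicitly, computes the filter response to each shift in the spatial domain, and compares the result with a single element-wise product in the frequency domain.

```python
import numpy as np

def cyclic_shifts(x):
    """Stack every cyclic shift of a 1-D base sample x as the rows of a matrix."""
    return np.stack([np.roll(x, k) for k in range(len(x))])

def correlate_spatial(x, w):
    """Response of filter w to each cyclic shift of x, one dot product per shift."""
    return cyclic_shifts(x) @ w

def correlate_fft(x, w):
    """Same responses from one element-wise product in the frequency domain.

    Assumes x and w are real-valued and have the same length.
    """
    X = np.fft.fft(x)
    W = np.fft.fft(w)
    return np.real(np.fft.ifft(W * np.conj(X)))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal(8)   # base sample
    w = rng.standard_normal(8)   # filter
    print(np.allclose(correlate_spatial(x, w), correlate_fft(x, w)))
```

The spatial version needs one dot product per shifted sample, whereas the frequency-domain version needs only three transforms and one element-wise product, which is where the efficiency comes from.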
In 2010, Bolme et al. introduced the idea of
correlation filtering into tracking and proposed
the minimum output sum of squared error
(MOSSE) filter.
27
As a prototype of correlation filtering,
MOSSE can obtain a stable correlation filter
from only a single frame owing to
the first advantage of correlation filtering. In
addition, the processing speed of MOSSE can reach 669
frames per second (FPS), which significantly exceeds
that of the other trackers. In 2012, the circulant
structure with kernels (CSK) was introduced into the
correlation filtering framework.
28
In 2015, Henriques et al.
proposed the kernelized correlation filter (KCF),
24
which uses an intermediate feature
(Felzenszwalb’s histogram of oriented gradients
(FHOG)) in place of the primary feature (gray
feature) of CSK and achieves better performance. To
mitigate the boundary effect, Danelljan
et al. proposed the Spatially Regularized
Discriminative Correlation Filters (SRDCF), which
add a spatial regularization term to
the objective function used for classifier training.
29
As an improved version of SRDCF, DeepSRDCF
replaces the FHOG features with the output of
a single convolutional layer of a CNN.
30
In 2016,
Danelljan et al. proposed Continuous Convolution
Operators for Tracking (CCOT), which uses
multi-layer features to replace the single-layer
feature of DeepSRDCF.
31
In 2017, to simplify the
high-dimensional feature extraction of CCOT,
Danelljan et al. further proposed
Efficient Convolution Operators (ECO).
32
The development
of correlation filtering shows that feature
representation plays a significant role and is one
of the main directions of algorithm improvement.
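To make the single-frame property of MOSSE described above concrete, here is a minimal sketch of training and detection, assuming the commonly cited closed form in which the conjugate filter is the element-wise ratio of G·conj(F) to F·conj(F), with F the transformed training patch and G the transformed desired response. The Gaussian width `sigma`, the regularizer `eps`, and all function names are assumptions for illustration. The preprocessing and the running-average update used in practice are omitted.

```python
import numpy as np

def gaussian_response(shape, sigma=2.0):
    """Desired response: a 2-D Gaussian peaked at the patch centre (sigma is an assumption)."""
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    cy, cx = h // 2, w // 2
    return np.exp(-((ys - cy) ** 2 + (xs - cx) ** 2) / (2.0 * sigma ** 2))

def train_mosse(patch, target, eps=1e-3):
    """Closed-form conjugate filter from a single frame; eps avoids division by zero."""
    F = np.fft.fft2(patch)
    G = np.fft.fft2(target)
    return (G * np.conj(F)) / (F * np.conj(F) + eps)

def detect(H_conj, patch):
    """Correlate a new patch with the filter and return the (row, col) of the response peak."""
    response = np.real(np.fft.ifft2(np.fft.fft2(patch) * H_conj))
    peak = np.unravel_index(np.argmax(response), response.shape)
    return tuple(int(i) for i in peak)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    patch = rng.standard_normal((32, 32))          # stand-in for a grayscale target patch
    H_conj = train_mosse(patch, gaussian_response(patch.shape))
    moved = np.roll(patch, (3, -5), axis=(0, 1))   # simulate a cyclic target displacement
    print(detect(H_conj, patch), detect(H_conj, moved))
```

In this sketch the response peak sits at the patch centre for the training patch and moves by the same offset as the cyclically shifted patch, which is how the displacement of the target is read off between frames.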
In the past few years, a number of algorithms spe-
cific to TIR pedestrian tracking have been proposed.
Unlike pedestrian trackers for visible
images, which usually use color and texture features,
a TIR pedestrian tracker often uses intensity features,
since in most instances the temperature of a target is
higher than that of its background in TIR images. However,
when two similar targets cross each other,
intensity features derived from the temperature difference
may lead to tracking failure. Therefore, the fusion of
multiple features for tracking has received increasing
attention. Wang and Tang
33
propose a fea-
ture representation for tracking by combining inten-
sity and edge information, while Ko et al. use local
intensity distribution and texture features (oriented
center symmetric local binary patterns) to represent
the pedestrian.
34
Moreover, some classic features,
for example, FHOG, speeded-up robust features (SURF), and
region of interest histograms,
35
are frequently used in
TIR pedestrian tracking. In this paper, we use FHOG
and normalized intensity (the normalized grayscale value
of each pixel) to construct a fused feature representation for
TIR pedestrian tracking based on the response maps
of the correlation filter.
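As an illustration only, the sketch below shows one way a normalized intensity feature and a response-map fusion could be computed. The fusion rule is not specified in this section, so the peak-weighted average used here is a hypothetical stand-in rather than the method of this paper. The min-max normalization and both function names are likewise assumptions, and FHOG extraction is not shown.

```python
import numpy as np

def normalize_intensity(patch):
    """Min-max normalize grayscale values to [0, 1] (assumed form of normalized intensity)."""
    patch = np.asarray(patch, dtype=np.float64)
    span = patch.max() - patch.min()
    if span == 0:
        return np.zeros_like(patch)   # flat patch carries no intensity contrast
    return (patch - patch.min()) / span

def fuse_responses(resp_a, resp_b):
    """Hypothetical fusion: weight each response map by its own peak value."""
    wa, wb = float(resp_a.max()), float(resp_b.max())
    total = wa + wb
    if total <= 0:
        return 0.5 * (resp_a + resp_b)   # fall back to a plain average
    return (wa * resp_a + wb * resp_b) / total

if __name__ == "__main__":
    tir_patch = np.array([[10, 20], [30, 50]])
    print(normalize_intensity(tir_patch))
    # Placeholder response maps, e.g. one per feature channel group.
    resp_fhog = np.array([[0.0, 1.0], [0.0, 0.0]])
    resp_intensity = np.array([[0.0, 0.5], [0.5, 0.0]])
    fused = fuse_responses(resp_fhog, resp_intensity)
    print(np.unravel_index(np.argmax(fused), fused.shape))
```

Weighting by peak value lets the more confident response map dominate the fused map; other confidence measures could be substituted without changing the structure of the sketch.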