Fig. 1. Overview of the proposed tracking method.
minimization problem, the run time is determined by the product of the total number of PCG steps, accumulated over all iterations, and the cost of each PCG step. The total number of PCG iterations depends on the value of the regularization parameter λ. In the experiments with λ = 0.15, the total number of PCG steps is typically a few hundred. Within a PCG step, the most expensive operation is a matrix-vector product, which has O(d² + d × n) computational complexity, where d is the feature dimensionality and n is the number of templates.
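As a rough illustration (a sketch under stated assumptions, not the paper's implementation), the snippet below forms a dictionary of the kind used by ℓ1 trackers, with n target templates stacked next to d × d positive and negative trivial templates, and performs the matrix-vector products that dominate one PCG step; the sizes d and n and this dictionary layout are illustrative assumptions, shown as one way the O(d² + d × n) cost arises.

```python
import numpy as np

# Illustrative sizes (assumed, not from the paper).
d, n = 1024, 10                       # feature dimensionality, number of target templates

# Assumed dictionary layout: n target templates plus d x d positive and
# negative trivial templates, giving a d x (n + 2*d) matrix X.
T = np.random.randn(d, n)
X = np.hstack([T, np.eye(d), -np.eye(d)])

# The dominant work in one PCG step is a pair of matrix-vector products with X
# and its transpose, i.e. roughly d*(n + 2*d) = O(d^2 + d*n) multiply-adds each.
v = np.random.randn(n + 2 * d)
r = X @ v                             # d x (n + 2d) matrix times a length-(n + 2d) vector
g = X.T @ r                           # and the transpose product
```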
Motivated by the sparse signal recovery power of compressive sensing (CS), Li et al. [31] accelerated the ℓ1-norm minimization by reducing the feature dimensionality using a hash table or an RP that meets the restricted isometry property (RIP) [41]. Let P ∈ R^{d̃ × d} be the projection matrix; the coefficients α can then be computed by

α = arg min_α ‖Py − PXα‖₂² + λ‖α‖₁.    (5)
When we set d̃ ≪ d, the dimensionality of the ℓ1 minimization is significantly reduced, while the original high-dimensional y can still be fully recovered from the reduced Py.
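A minimal sketch of the reduced problem (5), assuming a dense Gaussian projection in place of the structurally random mapping and using scikit-learn's Lasso as a generic ℓ1-regularized least-squares solver; all sizes are illustrative.

```python
import numpy as np
from sklearn.linear_model import Lasso   # generic l1-regularized least-squares solver

# Illustrative sizes (assumed): original dimensionality d, reduced d_tilde << d,
# and n templates forming the dictionary X.
d, d_tilde, n = 4096, 64, 20
rng = np.random.default_rng(0)
X = rng.standard_normal((d, n))
y = X @ np.abs(rng.standard_normal(n)) + 0.01 * rng.standard_normal(d)

# Projection matrix P in R^{d_tilde x d}.  A dense Gaussian matrix satisfies the
# RIP with high probability; it stands in here for the structurally random mapping.
P = rng.standard_normal((d_tilde, d)) / np.sqrt(d_tilde)

# Reduced problem (5): alpha = argmin_alpha ||P y - P X alpha||_2^2 + lam*||alpha||_1.
# scikit-learn's Lasso minimizes (1/(2*m))*||b - A w||_2^2 + a*||w||_1 over m samples,
# so a = lam/(2*d_tilde) matches the objective up to that constant factor.
lam = 0.15
alpha = Lasso(alpha=lam / (2 * d_tilde), fit_intercept=False, max_iter=10000) \
            .fit(P @ X, P @ y).coef_
```

The only change relative to the original problem is that both y and X are multiplied by P before the solver is called, so the per-iteration cost scales with d̃ rather than d.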
To compute the coefficients shown in (2), we need the computationally expensive ℓ1-norm minimization. However, the particle weights defined in (3) are computed from a reconstruction error measured in the ℓ2-norm, which has the lower bound

‖y − Fα_F‖₂² ≥ ‖y − Fα̂_F‖₂²,  where  α̂_F = arg min_{α_F} ‖y − Fα_F‖₂².    (6)
Instead of reducing the computational complexity of the ℓ1-norm minimization, Mei et al. [30] proposed to reduce the number of ℓ1-norm minimizations by excluding unimportant particles using the reconstruction error bound computed via the fast ℓ2-norm minimization shown in (6).
The aforementioned methods employ sparse representation
to globally encode each target candidate through the target
templates. In the literature, there are also other kinds of methods [35], [42], which use local sparse representation
to model target appearance. These methods first construct a
dictionary from the local patches sampled from the training
images that contain the tracked target, and then use the
dictionary to encode local patches sampled from each target
candidate or template. The coding coefficients are used as
features to describe the appearance of the target candidate
or template. However, due to the locality of the sampling, these appearance
modeling methods have limited discriminative
ability. To overcome this disadvantage, Zhong et al. [43]
integrated both local and global sparse appearance models.
In [44] and [45], structural sparse appearance modeling
was proposed, which exploited the spatial layout of the
locally sampled patches to increase the discriminative ability.
In [46], a more sophisticated method was proposed, where
discriminative sparse coding was directly used to enhance the
discriminative power of the resulting coding coefficients.
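To make the local sparse appearance modeling concrete, the sketch below encodes the local patches of one candidate against a patch dictionary and uses the coefficients as the appearance feature; the patch size, dictionary size, and the orthogonal matching pursuit coder are assumptions chosen for illustration, not the specific choices of [35] or [42].

```python
import numpy as np
from sklearn.linear_model import orthogonal_mp   # sparse coder (assumed choice)

# Illustrative setup (assumed): 8x8 gray-level patches, flattened to 64-d vectors.
rng = np.random.default_rng(0)
patch_dim, dict_size, patches_per_candidate = 64, 100, 36

# Dictionary built from local patches sampled inside previous target regions
# (random data stands in for the sampled patches), with unit-norm columns.
D = rng.standard_normal((patch_dim, dict_size))
D /= np.linalg.norm(D, axis=0)

# Local patches sampled from one target candidate.
candidate_patches = rng.standard_normal((patch_dim, patches_per_candidate))

# Sparse-code every patch against the dictionary; the concatenated (or pooled)
# coefficients serve as the appearance feature of the candidate.
codes = orthogonal_mp(D, candidate_patches, n_nonzero_coefs=5)   # dict_size x patches
feature = codes.reshape(-1)
```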
III. PROPOSED METHOD
In this section, we present the proposed tracking method
based on structurally random mapping and WLS. In contrast to ℓ1 trackers which only use target templates, our proposed framework uses both target and background templates
to represent each candidate. When the total reconstruction
error is minimized, the target and the background templates
compete against each other in the linear representation. After
reducing the feature dimensionality using structurally random
mapping, we compute the representation coefficients by the
WLS technique. The reconstruction errors obtained by the
target and the background templates are used to discriminate
the target from its background. An overview of the proposed
tracking method is shown in Fig. 1.
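The sketch below captures this idea at a high level, under stated assumptions: a Gaussian projection stands in for the structurally random mapping, the WLS weights are left as an identity placeholder, and the candidate score is simply the difference between the background and target reconstruction errors; the paper's actual weighting and scoring rules are not reproduced here.

```python
import numpy as np

# High-level sketch; all names and sizes are illustrative assumptions.
rng = np.random.default_rng(0)
d, d_tilde, n_t, n_b = 4096, 64, 10, 40     # feature dim, reduced dim, #target / #background templates

T = rng.standard_normal((d, n_t))           # target templates
B = rng.standard_normal((d, n_b))           # background templates
y = rng.standard_normal(d)                  # feature vector of one candidate

# Dimensionality reduction; a Gaussian projection stands in for the paper's
# structurally random mapping.
P = rng.standard_normal((d_tilde, d)) / np.sqrt(d_tilde)
A = P @ np.hstack([T, B])                   # reduced joint dictionary
b = P @ y

# Weighted least squares: alpha = argmin_alpha ||W^(1/2) (b - A alpha)||_2^2.
# The weights w are an identity placeholder here; the paper defines its own.
w = np.ones(d_tilde)
sw = np.sqrt(w)
alpha, *_ = np.linalg.lstsq(A * sw[:, None], b * sw, rcond=None)

# Target and background templates compete in the joint representation; the
# candidate is scored by how much better the target part reconstructs it.
err_t = np.sum((b - (P @ T) @ alpha[:n_t]) ** 2)
err_b = np.sum((b - (P @ B) @ alpha[n_t:]) ** 2)
score = err_b - err_t                       # illustrative confidence, not the paper's rule
```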
A. Tracking Framework
The proposed method is implemented using a sequential importance sampling (also known as particle filter) framework [38], [39], which is a popular computational method to recursively approximate the posterior distribution of the state variables characterizing a dynamic system. It consists of two stages: prediction and updating. Let z_t and I_t be the state variables and the observation at time t, respectively. The posterior distribution of z_t given all the available observations I_{1:t−1} = {I_1, I_2, ..., I_{t−1}} up to time t − 1 can be predicted using the state transition model p(z_t | z_{t−1}) as

p(z_t | I_{1:t−1}) = ∫ p(z_t | z_{t−1}) p(z_{t−1} | I_{1:t−1}) dz_{t−1}.    (7)
At time t, the observation I_t is available, and the posterior distribution of z_t is updated using the Bayes rule as

p(z_t | I_{1:t}) = p(I_t | z_t) p(z_t | I_{1:t−1}) / p(I_t | I_{1:t−1}).    (8)
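A compact sketch of the prediction step (7) and the update step (8) in a sequential importance sampling loop; the Gaussian random-walk transition model and the placeholder likelihood are illustrative assumptions, since the tracker's actual observation model is built from template reconstruction errors.

```python
import numpy as np

rng = np.random.default_rng(0)
N, state_dim = 300, 6                    # number of particles and state dimension (illustrative)

particles = rng.standard_normal((N, state_dim))    # {z_t^i}
weights = np.full(N, 1.0 / N)                      # {w_t^i}

def likelihood(z, I_t):
    """Placeholder for p(I_t | z_t); the tracker evaluates reconstruction errors here."""
    return float(np.exp(-0.5 * np.sum(z ** 2)))

def sis_step(particles, weights, I_t, sigma=0.05):
    # Prediction (7): draw z_t^i from p(z_t | z_{t-1}^i), here a Gaussian random walk.
    particles = particles + sigma * rng.standard_normal(particles.shape)
    # Update (8): reweight each particle by the observation likelihood and normalize.
    weights = weights * np.array([likelihood(z, I_t) for z in particles])
    weights = weights / weights.sum()
    return particles, weights
```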
Using the sequential importance sampling technique, the posterior distribution p(z_t | I_{1:t}) is approximated by a set of N weighted samples (also called particles) {z_t^i, w_t^i}_{i=1,...,N}, where w_t^i are the importance weights of the particles z_t^i. Let q(z_t | I_{1:t}, z_{1:t−1}) be the importance distribution from which