JOURNAL OF LATEX CLASS FILES, VOL. 4, NO. 5, APRIL 2015
indexed features $\Phi(I_i, S_i^{t-1})$ and adaptively constrain the intermediate shapes by exemplar shapes during cascade regression.
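$$\arg\min_{R^t} \sum_{i=1}^{M} \left\| S_i^{*} - \Psi\left(D_S \alpha_i^{t-1}, \gamma_i^{t-1}\right) - R^t W_i^F \Phi\left(I_i, \Psi\left(D_S \alpha_i^{t-1}, \gamma_i^{t-1}\right)\right) \right\|_2^2,$$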
where $W_i^F = \mathrm{diag}(w_{i1}^F, w_{i2}^F, \ldots, w_{iN}^F, 1)$, $w_{ij}^F = w_{ij} I_{P \times P}$, $w_{ij}$ indicates the occlusion level for the landmark $j$, and $P$ is the dimension of the landmark feature vector. The goal of $W_i^F$ is to allocate high weights to the uncorrupted landmarks and small or even zero weights to the occluded landmarks, which can efficiently restrain the influence of the occluded regions. $\Psi(D_S \alpha_i^{t-1}, \gamma_i^{t-1})$ is the output of the exemplar-based shape constraint on $S_i^{t-1}$. $D_S \in \mathbb{R}^{2N \times M}$ is the exemplar shape dictionary, $\alpha_i^{t-1} \in \mathbb{R}^{M \times 1}$ is the reconstruction coefficient, $\Psi$ is the similarity transformation, and $\gamma_i^{t-1}$ is the similarity transformation coefficient.
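The weighting of the shape-indexed features and the per-stage regression can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names `weight_features` and `fit_stage_regressor` are hypothetical, the trailing 1 in $W_i^F$ is assumed to correspond to a bias entry appended to the feature vector, and the regressor $R^t$ is fitted by plain unregularized least squares.

```python
import numpy as np


def weight_features(features, occlusion_weights, feat_dim):
    """Apply W^F = diag(w_1 I_P, ..., w_N I_P, 1) to one feature vector.

    features: length N*P + 1 (N landmark descriptors of size P, then a
        bias entry; the bias entry is an assumption of this sketch).
    occlusion_weights: length N, one weight per landmark.
    """
    w = np.asarray(occlusion_weights, dtype=float)
    # Repeat each landmark weight P times; the trailing 1 keeps the bias as is.
    diag = np.concatenate([np.repeat(w, feat_dim), [1.0]])
    return diag * np.asarray(features, dtype=float)


def fit_stage_regressor(weighted_features, shape_residuals):
    """Least-squares fit of R^t from weighted features to shape residuals.

    weighted_features: (M, N*P + 1), one row per training sample.
    shape_residuals:   (M, 2N), rows are S*_i minus the constrained shape.
    Returns R^t with shape (2N, N*P + 1).
    """
    solution, *_ = np.linalg.lstsq(weighted_features, shape_residuals, rcond=None)
    return solution.T
```

Because $W_i^F$ is diagonal, it is applied as an element-wise product rather than as a full matrix multiplication; a landmark with weight zero contributes nothing to the regression input.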
A. Adaptive exemplar-based shape model
The sparse shape model is widely used as a shape constraint
because it is able to correct gross errors of the input shape
and preserve shape details, even if they are not statistically
significant in the training set [23], [24].
$$\min_{\alpha} \left\| S - D_S \alpha \right\|_2^2 + \lambda \left\| \alpha \right\|_1, \qquad (3)$$
where $S = (x_1, \ldots, x_N, y_1, \ldots, y_N)^T$ is the observed normalized shape, $D_S \in \mathbb{R}^{2N \times M}$ is the normalized shape dictionary, $\alpha \in \mathbb{R}^{M \times 1}$ is the sparse reconstruction coefficient, and $\lambda$ is
the regularization parameter. The sparse shape model has two
drawbacks, however: 1) it is very time-consuming because it requires solving a high-dimensional $\ell_1$ optimization problem, i.e., the so-called Lasso optimization [25]; 2) all the landmark points in the shape are treated equally, including the corrupted landmarks [23], [24]. If some landmarks are occluded, the errors from the misaligned landmarks will spread to all the other landmarks to some extent, due to the residual minimization used in the Procrustes analysis [43] and shape reconstruction [44].
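To make the cost of Eq. (3) concrete, the following sketch solves it with the iterative soft-thresholding algorithm (ISTA). This is a swapped-in solver chosen for brevity, not the interior-point method discussed in the text, and the function names and the iteration count are assumptions of the sketch.

```python
import numpy as np


def soft_threshold(x, tau):
    """Element-wise soft-thresholding, the proximal operator of tau * ||.||_1."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)


def sparse_shape_ista(S, D, lam, n_iter=500):
    """Approximately minimize ||S - D a||_2^2 + lam * ||a||_1 with ISTA.

    S: observed normalized shape, length 2N.
    D: normalized shape dictionary, shape (2N, M), assumed non-zero.
    """
    S = np.asarray(S, dtype=float)
    D = np.asarray(D, dtype=float)
    # Lipschitz constant of the gradient of the quadratic term.
    lipschitz = 2.0 * np.linalg.norm(D, 2) ** 2
    alpha = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = 2.0 * D.T @ (D @ alpha - S)
        alpha = soft_threshold(alpha - grad / lipschitz, lam / lipschitz)
    return alpha
```

Every iteration touches the whole dictionary and every landmark with equal weight, which illustrates both drawbacks listed above: the iterative cost and the absence of any mechanism to discount corrupted landmarks.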
As can be seen from Fig. 1, there is an occlusion above the
left eye which causes the alignment result on the corresponding
area to be inaccurate. We take the normalized face shapes
from the HELEN [45] training set as the shape dictionary
and give two sparse reconstruction results under different
regularization parameters. When we set λ to 0.001, the number
of non-zero coefficients is 20, and the gross error on the
left eye is removed. However, the gross error on the left
eyebrow is still there. As we increase λ to 0.05, the constrained
shape is smoother, but the accuracy of the non-occluded area
is sacrificed. If we decrease λ, the constrained shape will
be almost the same as the input face shape without the
ability to correct the gross error, due to the minimization of
the reconstruction error. Thus, the sparse shape constraint
sometimes has limited ability to fix the gross error.
To overcome the above issue, we propose a new exemplar-
based shape model as,
$$\min_{\alpha} \left\| W_S S - \mathcal{N}(W_S S, W_S D_S)\, \alpha \right\|_2^2, \qquad (4)$$
where $W_S = \mathrm{diag}(w_1, \ldots, w_N, w_1, \ldots, w_N)$ is the weighting matrix, and $w_j$ is the occlusion level of landmark $j$. The purpose of $W_S$ is to evaluate the input shape with non-occluded landmarks as accurately as possible and to select the most important k-nearest exemplar shapes from the shape dictionary. The selection operator, written here as $\mathcal{N}(W_S S, W_S D_S)$, is used to select the nearest exemplar shapes of $W_S S$ from the dynamic exemplar shape dictionary $W_S D_S$. In other words, $W_S$ and $\mathcal{N}$ impose different weights on the rows and columns, respectively, of the exemplar shape dictionary, which makes the shape constraint more flexible, robust, and efficient.
Compared to time-consuming Lasso optimization, the pro-
posed adaptive exemplar-based shape model is more efficient.
Given the shape dictionary $D_S \in \mathbb{R}^{2N \times M}$, the computational complexity of the interior-point convex optimization solver for the $\ell_1$ optimization problem is $O(N^2 M)$. The computational
complexity of our method lies only in the selection of the
K-Nearest Neighbors, and the reconstruction coefficients are
computed directly by the least squares method. In Lasso
optimization, the sparse coefficients tend to be local and
items with larger coefficients are more similar to the input
sample [46], [47]. In the proposed adaptive shape model, we select the k-nearest exemplar shapes of the input shape as the shape bases for reconstructing the input shape directly and set the coefficients corresponding to the remaining exemplars in the dictionary to zero. This means that the input face shape can be reconstructed from its nearest exemplar shapes [22], which is feasible according to both empirical observation and theoretical analysis [22], [48].
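The procedure described above (weight the rows, pick the k nearest exemplars, solve a small least-squares problem, zero the remaining coefficients) can be sketched as follows. This is a hedged illustration rather than the authors' code: the function name `exemplar_shape_constraint` is hypothetical, Euclidean distance on the weighted shapes is assumed for the neighbor search, and returning the unweighted reconstruction $D_S \alpha$ as the constrained shape is an assumption based on the $D_S \alpha$ term in the cascade objective.

```python
import numpy as np


def exemplar_shape_constraint(S, D, w, k):
    """Reconstruct a shape from its k nearest weighted exemplars.

    S: observed normalized shape (x_1..x_N, y_1..y_N), length 2N.
    D: exemplar shape dictionary, shape (2N, M).
    w: per-landmark weights, length N (small or zero for occluded landmarks).
    k: number of exemplars used for the reconstruction.
    Returns (alpha, constrained_shape).
    """
    S = np.asarray(S, dtype=float)
    D = np.asarray(D, dtype=float)
    # W_S = diag(w_1..w_N, w_1..w_N): same weight for x and y of a landmark.
    ws = np.concatenate([w, w]).astype(float)
    S_w = ws * S              # W_S S
    D_w = ws[:, None] * D     # W_S D_S, the "dynamic" dictionary
    # Distance from the weighted input shape to every weighted exemplar.
    dists = np.linalg.norm(D_w - S_w[:, None], axis=0)
    nearest = np.argsort(dists)[:k]
    # Least squares on the selected columns only.
    coef, *_ = np.linalg.lstsq(D_w[:, nearest], S_w, rcond=None)
    alpha = np.zeros(D.shape[1])
    alpha[nearest] = coef     # all other coefficients stay zero
    return alpha, D @ alpha
```

The least-squares problem has only k unknowns, so its cost is small compared with the neighbor search over all M exemplars, which matches the complexity argument in the text.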
As shown in Fig. 2, when the occlusion levels of the landmarks are given, we can take only the non-occluded landmarks into account and use the nearest exemplar-based shape model to reconstruct the shape by Eq. (4). Surprisingly, even though we still reconstruct the face shape using only 20 exemplar face shapes, as in Fig. 1, the result is very satisfying. The gross error on the occluded area is almost removed, and the landmarks on the non-occluded area remain accurate.
B. Occlusion inference model
It is well known that it is difficult to detect occluded land-
marks if only the local features around the landmarks are used.
However, occlusion is generally a continuous local region with
an irregular size and becomes obvious in the shape-normalized
appearance [15]. Inspired by [34], we note that the facial shape and appearance tend to be consistent on the exemplar dictionary. We construct the shape-normalized appearance dictionary $D_A$, which is directly derived from the exemplars $D_S$. The shape-indexed appearance is then reconstructed as $D_A \beta$, $\beta \in \mathbb{R}^{M \times 1}$, on the exemplar appearance dictionary, and the reconstruction discrepancy $\|A - D_A \beta\|_2^2$ is used to estimate the occlusion levels $w_j$, $j = 1, \ldots, N$, of the landmarks and to detect the occluded landmarks.
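As a rough illustration of how the reconstruction discrepancy can be localized, the sketch below splits the residual $A - D_A \beta$ into per-landmark blocks. Several points are assumptions not stated in this passage: that $A$ is a concatenation of N equally sized per-landmark descriptors, that $\beta$ is already available, and the function name `per_landmark_discrepancy`. How the discrepancies are mapped to the occlusion levels $w_j$ is not specified here, so the sketch stops at the discrepancies.

```python
import numpy as np


def per_landmark_discrepancy(A, D_A, beta, n_landmarks):
    """Squared reconstruction residual of A - D_A beta for each landmark.

    A:    observed appearance vector, length N * P (assumed block layout).
    D_A:  appearance dictionary, shape (N * P, M).
    beta: reconstruction coefficients, length M (assumed given).
    Returns an array of length N; its sum equals ||A - D_A beta||_2^2.
    """
    residual = np.asarray(A, dtype=float) - np.asarray(D_A, dtype=float) @ np.asarray(beta, dtype=float)
    # One row per landmark, one column per descriptor dimension.
    blocks = residual.reshape(n_landmarks, -1)
    return np.sum(blocks ** 2, axis=1)
```

A landmark whose block has a large residual is poorly explained by the exemplar appearances, which is the cue the text uses to flag likely occlusion.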
To effectively calculate the appearance reconstruction coeffi-
cients β, we utilize Canonical Correlation Analysis (CCA) [49]