Chen X A, et al. Sci China Inf Sci
sparse and low-rank decomposition via Principal Component Pursuit based on the assumption that a
batch of aligned images should form a low-rank matrix. Suppose $I_1^0, \ldots, I_n^0 \in \mathbb{R}^{w \times h}$ are $n$ well-aligned grayscale images of some objects. The function $\mathrm{vec}: \mathbb{R}^{w \times h} \to \mathbb{R}^{m}$ stacks each of the above images as a vector. Then the matrix
$$L = [\mathrm{vec}(I_1^0) \,|\, \cdots \,|\, \mathrm{vec}(I_n^0)] \in \mathbb{R}^{m \times n} \quad (1)$$
should be approximately low-rank.
should be approximately low-rank. This kind of assumption is very common. For instance, Ref. [20]
assumes that a rank-9 approximation suffices when the images $I_i^0$, $i = 1, \ldots, n$, are obtained from some Lambertian objects under varying illumination.
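The construction of $L$ in Eq. (1) can be sketched numerically. The following toy example (synthetic data, not from the paper) builds images lying in a low-dimensional subspace, stacks their vectorizations as columns, and confirms the resulting matrix is low-rank:

```python
import numpy as np

# Illustrative sketch: n synthetic "well-aligned" images spanning an
# r-dimensional subspace, mimicking a batch of images of one object.
rng = np.random.default_rng(0)
w, h, n, r = 20, 15, 30, 3      # image size, batch size, true rank
m = w * h

basis = rng.standard_normal((m, r))
coeffs = rng.standard_normal((r, n))
images = [(basis @ coeffs[:, i]).reshape(w, h) for i in range(n)]

# vec(.) stacks each w x h image into a length-m column vector; the
# columns together form the matrix L of Eq. (1).
L = np.column_stack([img.reshape(-1) for img in images])  # shape (m, n)

singular_values = np.linalg.svd(L, compute_uv=False)
numerical_rank = int(np.sum(singular_values > 1e-8 * singular_values[0]))
print(L.shape, numerical_rank)  # (300, 30) 3
```

Although $n = 30$ images are stacked, the numerical rank of $L$ is only 3, which is exactly the low-rank structure the model exploits.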
However, in practice, the object in image $i$ is often partially occluded or corrupted by some noise $\varepsilon_i$. So it may be more appropriate to assume that we observe $I_1 = I_1^0 + \varepsilon_1, \ldots, I_n = I_n^0 + \varepsilon_n$ instead of $I_1^0, \ldots, I_n^0$. Thus, the original data $D$ can be represented as
$$D = L + S. \quad (2)$$
Here, $S$ is the noise matrix and is generally regarded as sparse.
The above model explicitly requires the original data to be well aligned. For real data, in order to
compensate for the misalignments, a certain transformation $\tau = [\tau_1 \,|\, \cdots \,|\, \tau_n] \in \mathbb{R}^{p \times n}$ is applied to the original data. Then Eq. (2) can be changed to
$$D \circ \tau = L + S, \quad (3)$$
where $D \circ \tau$ means applying the transformation $\tau_i$ to each misaligned image $I_i$.
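As an illustration of the notation $D \circ \tau$, the sketch below warps each image by its own transformation and stacks the results. Parametrizing each $\tau_i$ as a 2D affine map with $p = 6$ parameters is an assumption made for this example, not the paper's choice:

```python
import numpy as np
from scipy.ndimage import affine_transform

# Illustrative sketch of D ∘ τ: warp each misaligned image I_i by τ_i,
# then stack the warped images as the columns of D ∘ τ. The affine
# parametrization (p = 6) is an assumption for this example.
def apply_tau(images, tau):
    """images: list of w x h arrays; tau: (p, n) = (6, n), one column per image."""
    warped = []
    for img, t in zip(images, tau.T):
        A = t[:4].reshape(2, 2)   # linear part of τ_i
        b = t[4:]                 # translation part of τ_i
        warped.append(affine_transform(img, A, offset=b))
    return np.column_stack([w_img.reshape(-1) for w_img in warped])

rng = np.random.default_rng(1)
imgs = [rng.standard_normal((16, 16)) for _ in range(4)]
tau = np.tile([1.0, 0.0, 0.0, 1.0, 0.0, 0.0], (4, 1)).T  # identity τ_i
D_tau = apply_tau(imgs, tau)
print(D_tau.shape)  # (256, 4): m = 16*16 rows, n = 4 columns
```

In the actual alignment algorithms, $\tau$ is an optimization variable, so each iteration re-warps the images with the current estimate of $\tau_i$.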
When taking both the corruption and misalignment into consideration, the Lagrange form of the batch
image alignment model can be formulated as follows:
$$\min_{L,S,\tau} \ \mathrm{rank}(L) + \lambda \|S\|_0 \quad \text{s.t.} \quad D \circ \tau = L + S. \quad (4)$$
Here, the $\ell_0$-norm $\|\cdot\|_0$ counts the number of nonzero entries in the sparse error matrix $S$.
The above optimization problem (4) is NP-hard and thus not directly tractable. Following Robust PCA, the highly nonconvex objective function in (4) is relaxed to its convex surrogate, i.e., the rank and the $\ell_0$-norm are replaced with the nuclear norm and the $\ell_1$-norm, respectively. Then the model can be modified as follows:
$$\min_{L,S,\tau} \ \|L\|_* + \lambda \|S\|_1 \quad \text{s.t.} \quad D \circ \tau = L + S. \quad (5)$$
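For a fixed alignment $\tau$, the inner problem in (5) is standard Principal Component Pursuit, solvable by alternating singular value thresholding (the proximal step for the nuclear norm) and soft thresholding (the proximal step for the $\ell_1$-norm). The following is a minimal sketch of that iteration; the defaults for $\lambda$ and the penalty parameter $\mu$ are common choices from the RPCA literature, not values specified in this paper:

```python
import numpy as np

def shrink(X, t):
    """Soft thresholding: proximal operator of the ℓ1-norm."""
    return np.sign(X) * np.maximum(np.abs(X) - t, 0.0)

def svt(X, t):
    """Singular value thresholding: proximal operator of the nuclear norm."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U @ np.diag(shrink(s, t)) @ Vt

def pcp(D, lam=None, mu=None, n_iter=500, tol=1e-7):
    """Principal Component Pursuit for D = L + S (alignment τ held fixed).

    Defaults lam = 1/sqrt(max(m, n)) and mu = m*n/(4*||D||_1) follow
    common RPCA practice; they are assumptions, not this paper's values.
    """
    m, n = D.shape
    lam = 1.0 / np.sqrt(max(m, n)) if lam is None else lam
    mu = m * n / (4.0 * np.abs(D).sum()) if mu is None else mu
    L = np.zeros_like(D); S = np.zeros_like(D); Y = np.zeros_like(D)
    for _ in range(n_iter):
        L = svt(D - S + Y / mu, 1.0 / mu)      # nuclear-norm prox step
        S = shrink(D - L + Y / mu, lam / mu)   # ℓ1-norm prox step
        R = D - L - S                          # constraint residual
        Y += mu * R                            # dual update
        if np.linalg.norm(R) <= tol * np.linalg.norm(D):
            break
    return L, S

# Recover a synthetic rank-3 matrix corrupted by ~5% sparse gross errors.
rng = np.random.default_rng(0)
L0 = rng.standard_normal((60, 3)) @ rng.standard_normal((3, 40))
S0 = np.zeros((60, 40))
mask = rng.random((60, 40)) < 0.05
S0[mask] = 10.0 * rng.standard_normal(mask.sum())
L_hat, S_hat = pcp(L0 + S0)
rel_err = np.linalg.norm(L_hat - L0) / np.linalg.norm(L0)
print(rel_err)  # small: the low-rank component is recovered accurately
```

The full alignment methods alternate a step of this kind with an update of $\tau$ (typically via local linearization of $D \circ \tau$), which this sketch omits.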
In [21], the model of (5) is successfully extended to unsupervised subspace discovery. The main goal there is to deal with a much less constrained problem, in which the common pattern (object) exhibits large-scale differences at unknown locations in the images. It is a learning and matching algorithm, but is not concerned with image alignment. Ref. [22] proposes a new method for face alignment and recognition by integrating sparse representation based classification (SRC) [23] into the model of (5), which achieves comparably good recognition rates. Ref. [24] proposes coupling alignments with recognition for still-to-video face recognition based on the theory of sparse representation and subspace segmentation. From a different aspect, Ref. [25] applies the form of model (5) to generate transformation invariant low-rank textures (TILT), in which a single image, rather than an image sequence, is rectified, based on the assumption that the textures of the rectified image are usually symmetric patterns and consequently form low-rank matrices.
In these studies, the low-rank component is required to be exactly low-rank and the sparse component is required to be exactly sparse, as in [18]. Moreover, although solving (5) can produce good image alignment results in many situations, there are specific applications in which the convex-relaxation-based approach (5) cannot provide desirable results. Therefore, it would be better to replace the convex nuclear norm and $\ell_1$-norm with folded-concave penalties to remedy the drawbacks of the convex penalization method.
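One concrete way to realize such a replacement, sketched below under illustrative parameter choices, is to swap the soft-thresholding step for the thresholding operator of a folded-concave penalty. Here the MCP (minimax concave penalty) is used as the example, with parameters `lam` and `gamma` chosen for illustration rather than taken from this paper; its key property is that large entries are left unshrunk, avoiding the bias of $\ell_1$ shrinkage:

```python
import numpy as np

def soft_threshold(x, lam):
    """ℓ1 proximal step: shrinks every surviving entry by lam (biased)."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def mcp_threshold(x, lam, gamma=3.0):
    """Proximal operator of the MCP folded-concave penalty (gamma > 1)."""
    x = np.asarray(x, dtype=float)
    return np.where(
        np.abs(x) <= gamma * lam,
        gamma / (gamma - 1.0) * soft_threshold(x, lam),
        x,  # entries beyond gamma*lam are kept exactly: no shrinkage bias
    )

x = np.array([-5.0, -1.2, 0.3, 2.0, 8.0])
print(soft_threshold(x, 1.0))  # every surviving entry pulled toward zero by 1
print(mcp_threshold(x, 1.0))   # small entries thresholded, large ones untouched
```

Applied to the singular values of $L$ and the entries of $S$, such operators keep the strong low-rank and sparse signals intact while still suppressing small perturbations, which is the motivation for the folded-concave formulation.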