schema to update the support and learn a new map-
ping function from previous support LR/HR patch pairs.
Compared with previous face image SR work, our con-
tribution can be summarized as follows:
Compared with those coding-based methods [1,2,22–25],
which use the strong regularization of “same representa-
tion” for learning, we relax the “same representation” to
“same support”, giving more flexibility to the learned
mapping function.
Instead of learning a global mapping function from the
entire training samples as in regression-based methods
[18,26–28], we learn a specific mapping
function for each observation (one input LR patch) from
its support LR/HR patch pairs, and thus the learned
mapping function can be tuned towards a specific input
LR patch.
Compared with those regression-based methods
[18,26–28], which ignore the geometry of the HR patch
space, we define the support set by the geometry of the
HR patch space and use the geometry to regularize the
mapping function. With an iterative optimization technique,
the proposed method can recover more
detailed face features step by step.
Note that we previously proposed a regression-based
method, namely Manifold regularized Sparse Support
Regression (MSSR), for general image SR [35]. Although
MSSR and the proposed method both try to learn the
mapping relationship between the LR patches and HR
ones on the support, they have some essential differences.
In particular, MSSR defined the support set of the input LR
patch as those LR training patches with non-zero sparse
coding coefficients. However, because many HR
images may correspond to one LR image, the neighborhood
relationship in the LR space cannot reflect the true one in the HR space.
To this end, instead of defining the support set in the LR
image patch space as in MSSR, we obtain the support set
in the HR image patch space (using the estimated HR patch
and leading to HR-LiSR), whose geometry is much more
credible and discriminant than that of the LR image patch
space [31]. Since the target HR patch is unknown in
advance, we formulate the target HR patch SR as an
iterative optimization problem (whereas in the MSSR method the support set
and the mapping function are learned only once).
Therefore, the super-resolved results can
be refined step by step. In addition, MSSR aims at super-resolving
general scenes and does not consider the
prior of face images, while LiSR is specially designed for
facial images. By incorporating the face position prior
(all face images have similar structures and the patches at
the same site are highly related once we align the face
images according to the positions of the two eyes), LiSR
establishes a model for each patch position rather than for the entire
face image, thus leading to a more flexible SR framework.
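The idea of re-selecting the support in the HR patch space and refining the estimate step by step can be illustrated with a minimal sketch for one patch position. It is not the proposed algorithm: the support is assumed here to be the k nearest training patches, the mapping is assumed to be a plain ridge regression without the geometric regularization, and the names `fit_ridge_map`, `hr_support_sr`, `k`, `lam` and `n_iter` are hypothetical.

```python
import numpy as np

def fit_ridge_map(X_s, Y_s, lam):
    """Ridge-regression map P with P @ x ~ y, fitted on support pairs.

    X_s: (d_lr, k) LR support patches, Y_s: (d_hr, k) HR support patches.
    Closed form: P = Y_s X_s^T (X_s X_s^T + lam I)^(-1).
    """
    d_lr = X_s.shape[0]
    A = X_s @ X_s.T + lam * np.eye(d_lr)
    return np.linalg.solve(A, X_s @ Y_s.T).T

def hr_support_sr(x_t, X, Y, k=10, lam=1e-8, n_iter=3):
    """Alternate between fitting a local map and re-selecting the support.

    Assumption: the support is the set of k nearest neighbours. The first
    support is chosen in LR space because no HR estimate exists yet; later
    supports are chosen in HR space around the current HR estimate.
    """
    support = np.argsort(np.linalg.norm(X - x_t[:, None], axis=0))[:k]
    y_est = None
    for _ in range(n_iter):
        P = fit_ridge_map(X[:, support], Y[:, support], lam)
        y_est = P @ x_t
        support = np.argsort(np.linalg.norm(Y - y_est[:, None], axis=0))[:k]
    return y_est, support

# Synthetic check: HR patches are an exact linear function of LR patches.
rng = np.random.default_rng(0)
W = rng.standard_normal((16, 4))      # hypothetical LR-to-HR relation
X = rng.standard_normal((4, 50))      # 50 vectorised LR training patches
Y = W @ X                             # matching HR training patches
x_t = rng.standard_normal(4)          # input LR patch
y_est, support = hr_support_sr(x_t, X, Y)
```

On real data the LR-to-HR relation is not globally linear, which is where a mapping fitted only on the support of the current input could differ from a single global regression.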
1.3. Organization of this paper
The rest of the paper is organized as follows. The details
of the proposed LiSR approach are presented in Section 2.
Comparative results are reported in Section 3 and a brief
discussion is given in Section 4. Finally, we give concluding
remarks and future prospects in Section 5.
2. The proposed algorithm
In this section, we present the detailed procedure of the
proposed approach. We begin with the terms and notations.
As stated, the problem of face image SR is formulated as the
inference of the HR face image $y_t$ from an input LR face image
$x_t$, given the training sets of HR and LR face images,
$Y = \{y_m\}_{m=1}^{M}$ and $X = \{x_m\}_{m=1}^{M}$, where $M$ denotes the size
of the training sets. As in many face image SR approaches [23–25],
we represent each face by image patches. Therefore, each
face image mentioned above is divided into $N$ small overlapping
patches, forming the patch sets
$\{y_m(i,j) \mid 1 \le i \le U,\ 1 \le j \le V\}_{m=1}^{M}$ and
$\{x_m(i,j) \mid 1 \le i \le U,\ 1 \le j \le V\}_{m=1}^{M}$,
where the patch number of each face image is $N = UV$, $U$ denotes the patch
number in every column, $V$ denotes the patch number in
every row, and the term $(i,j)$ indicates the coordinate in the
patch coordinate system o–uv, as illustrated in Fig. 3.
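The patch division can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code: it assumes square patches, an image whose size fits the patch grid exactly (no padding), and that the index i runs down a column. The function name `extract_patches` and the image size are hypothetical; `patch_size` and `overlap` follow the names used in the caption of Fig. 3.

```python
import numpy as np

def extract_patches(image, patch_size, overlap):
    """Split a 2-D image into a U x V grid of overlapping square patches."""
    step = patch_size - overlap
    if step <= 0:
        raise ValueError("overlap must be smaller than patch_size")
    height, width = image.shape
    # Assumption: the image fits the grid exactly, so no padding is needed.
    if (height - overlap) % step or (width - overlap) % step:
        raise ValueError("image size does not fit the patch grid")
    U = (height - overlap) // step   # patch number in every column
    V = (width - overlap) // step    # patch number in every row
    patches = np.empty((U, V, patch_size, patch_size), dtype=image.dtype)
    for i in range(U):
        for j in range(V):
            r, c = i * step, j * step
            patches[i, j] = image[r:r + patch_size, c:c + patch_size]
    return patches

# Hypothetical 124 x 100 face image, 12 x 12 patches, 4 overlapping pixels.
face = np.arange(124 * 100, dtype=float).reshape(124, 100)
patches = extract_patches(face, patch_size=12, overlap=4)
U, V = patches.shape[:2]
N = U * V   # number of patches per face image, N = UV
```

Note that the code uses 0-based indices, so `patches[i - 1, j - 1]` corresponds to the patch at coordinate $(i,j)$ in the text.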
For one input LR face image denoted in patches as
$\{x_t(i,j) \mid 1 \le i \le U,\ 1 \le j \le V\}$, the face image SR approaches
super-resolve each input LR patch $x_t(i,j)$ to obtain its HR
version $y_t(i,j)$. Concatenating and integrating all the super-resolved
HR patches $\{y_t(i,j) \mid 1 \le i \le U,\ 1 \le j \le V\}$ according
to their corresponding positions, we can obtain a face
image, which is the target HR face image $y_t$.
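One possible reading of "concatenating and integrating" is sketched below: each super-resolved patch is placed at its grid position and pixels covered by several patches are averaged. Averaging is an assumption made for illustration, since the text does not state how overlapping pixels are combined, and the name `assemble_patches` is hypothetical.

```python
import numpy as np

def assemble_patches(patches, overlap):
    """Merge a (U, V, p, p) patch grid into one image, averaging overlaps."""
    U, V, p, _ = patches.shape
    step = p - overlap
    height = U * step + overlap
    width = V * step + overlap
    acc = np.zeros((height, width))      # sum of patch values per pixel
    weight = np.zeros((height, width))   # number of patches covering a pixel
    for i in range(U):
        for j in range(V):
            r, c = i * step, j * step
            acc[r:r + p, c:c + p] += patches[i, j]
            weight[r:r + p, c:c + p] += 1.0
    return acc / weight

# Round trip: cut a known image into a 3 x 4 grid of 8 x 8 patches
# with 2 overlapping pixels, then put it back together.
rng = np.random.default_rng(0)
image = rng.random((20, 26))
p, overlap = 8, 2
step = p - overlap
grid = np.stack([
    np.stack([image[i * step:i * step + p, j * step:j * step + p]
              for j in range(4)])
    for i in range(3)
])
rebuilt = assemble_patches(grid, overlap)
```

Because the patches in this round trip agree on their overlapping pixels, the average returns the original image; with independently super-resolved patches the average instead smooths any disagreement in the overlap regions.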
Specifically, the coding-based approaches encode the
input LR patch on the LR training patches of the same
position by a linear combination of neighbors, thus obtaining
the coding coefficients:
$$\hat{\theta} = \arg\min_{\theta} \left\{ \| x_t(i,j) - X(i,j)\theta \|_2^2 + \lambda E(\theta) \right\}, \qquad (1)$$
where $X(i,j)$ is a matrix with its columns being training
patches, $X(i,j) = [x_1(i,j), x_2(i,j), \cdots, x_M(i,j)]$, $E(\theta)$ is a prior of
the coding coefficients, which enforces the special
Fig. 3. Dividing a face into $N = UV$ patches. The term $(i,j)$ indicates the
coordinate of one patch in the patch coordinate system o–uv. patch_size
and overlap denote the side pixels of one square patch and the overlap
pixels between patches respectively.
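As an illustration of Eq. (1), the sketch below solves the coding problem for one patch position under the assumption that the prior is a squared ℓ2 norm, $E(\theta) = \|\theta\|_2^2$. This choice is made only because it gives a closed-form solution; it is one possible prior among several, and the function name `ridge_code` and the synthetic data are hypothetical.

```python
import numpy as np

def ridge_code(x, X, lam):
    """Minimise ||x - X theta||_2^2 + lam * ||theta||_2^2 over theta.

    x: (d,) vectorised input LR patch.
    X: (d, M) matrix whose columns are the LR training patches
       at the same position.
    Closed form: theta = (X^T X + lam I)^(-1) X^T x.
    """
    M = X.shape[1]
    gram = X.T @ X + lam * np.eye(M)
    return np.linalg.solve(gram, X.T @ x)

# Synthetic example: the input patch is an exact combination of
# three of the five training patches.
rng = np.random.default_rng(0)
X = rng.standard_normal((16, 5))              # five 4 x 4 training patches
theta_true = np.array([0.5, 0.0, 0.3, 0.0, 0.2])
x = X @ theta_true
theta_hat = ridge_code(x, X, lam=1e-8)        # nearly unregularised
theta_reg = ridge_code(x, X, lam=0.1)         # visibly regularised
```

With a very small `lam` the true coefficients are recovered almost exactly, while a larger `lam` shrinks the coefficients towards zero.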
J. Jiang et al. / Signal Processing 103 (2014) 168–183 171