2:4 F. Liu et al.
feature space is no longer critical as long as the test image can be approximated by a
sparse linear combination of the training images. In Wright et al.’s pioneer work, a test-
ing sample is first coded as a sparse linear combination of all the training samples via
ℓ1-norm minimization. Then the testing face image is classified to the class that yields
the least representation error. The experimental results show that SRC with random
projections-based features can outperform a number of conventional face recognition
schemes, such as the nearest-neighbor classifier with Fisherfaces- and Laplacianfaces-
based features. However, SRC requires a rich set of training samples and the correct
sparse solution can be recovered only when the number of training samples is suffi-
ciently larger than the dimensionality of features. To fulfill this requirement, Wagner
et al. [2009] designed a system that acquires tens of images of each subject to cover
all possible illumination changes. Recently, Zhang et al. [2011] showed that it is the
collaborative representation (CR) mechanism rather than ℓ1-norm sparsity that truly
improves the FR accuracy. Consequently, they proposed CRC_RLS, which has signif-
icantly less complexity than SRC but leads to very competitive results. In addition,
Li et al. [2014] proposed a new method that learns data representations by jointly
considering structure information and sparsity.
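The collaborative-representation mechanism behind CRC_RLS admits a compact sketch: code the test sample over all training samples with a regularized least-squares (ridge) solution, then assign the class with the smallest normalized class-wise residual. The function name, dictionary layout, and regularization value below are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def crc_rls(X, y, labels, lam=0.001):
    """Minimal CRC_RLS sketch. X: (d, n) matrix of column-stacked,
    l2-normalized training faces; y: (d,) test face; labels: length-n
    array of class ids. Returns the predicted class id."""
    n = X.shape[1]
    # Closed-form ridge coding: alpha = (X^T X + lam*I)^{-1} X^T y
    alpha = np.linalg.solve(X.T @ X + lam * np.eye(n), X.T @ y)
    best, best_res = None, np.inf
    for c in np.unique(labels):
        mask = labels == c
        # Class-wise representation residual, normalized by the coefficient
        # energy as in the CRC_RLS decision rule
        res = np.linalg.norm(y - X[:, mask] @ alpha[mask]) / (
            np.linalg.norm(alpha[mask]) + 1e-12)
        if res < best_res:
            best, best_res = c, res
    return best
```

Unlike SRC's iterative ℓ1 solver, the coding step here is a single linear solve, which is the source of the complexity advantage mentioned above.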
The performance of the methods discussed earlier is heavily affected by the number
of training samples for each person. Specifically, if there is only one training sample per
person, some methods even fail to work because the intra-personal variations cannot
be estimated at all. This is the so-called single-sample-per-person (SSPP) problem [Tan
et al. 2006] in face recognition. In order to address the SSPP problem, many methods
have been developed during the last two decades. Shan et al. [2003] presented a face-
specific subspace method based on PCA that first generates a few virtual samples from
a single gallery image per subject and then uses PCA to build a projection subspace
for each person. But strong correlation between virtual samples decreases the repre-
sentativeness of the training samples and accordingly limits the performance of this
method. In order to make LDA suitable for the SSPP problem, Gao et al. [2008] applied
singular value decomposition (SVD) to the only face image of a person; the resulting
nonsignificant SVD basis images were used to approximate the within-class scatter
matrix of this person. However, the optimal number of nonsignificant SVD basis images
is face-specific and should not be fixed uniformly for all face images as they did.
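The SVD-based estimate can be sketched as follows; the function name, the default number of trailing triplets k, and the scatter construction are a simplified reading of Gao et al.'s scheme, not their exact formulation:

```python
import numpy as np

def within_class_scatter_from_svd(A, k=3):
    """Sketch of the Gao et al. [2008] idea: decompose the single face
    image A (h x w) via SVD and treat the nonsignificant basis images
    (the trailing singular triplets) as proxies for this person's
    within-class variation. The global choice of k is exactly what the
    text criticizes: the optimal value is face-specific."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    diffs = []
    for i in range(len(s) - k, len(s)):
        # i-th basis image scaled by its singular value, flattened
        basis_img = s[i] * np.outer(U[:, i], Vt[i, :])
        diffs.append(basis_img.ravel())
    D = np.stack(diffs, axis=1)        # (h*w, k) variation directions
    return D @ D.T / k                 # approximate within-class scatter
```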
Considering the similarity of face images across individuals, a few generic learning
methods have been proposed to solve the SSPP problem, which use a generic train-
ing set to extract discriminatory information. For example, Su et al. [2010] proposed
an Adaptive Generic Learning (AGL) method, which adapts a generic discriminant
model to better distinguish the persons with a single sample. Yang et al. [2013] pro-
posed the sparse variation dictionary learning (SVDL) scheme by using the relationship
between the gallery set and the external generic set. Recently, Deng et al. [2014] pro-
posed a novel generic learning method that maps the intraclass facial differences of
the generic faces to the zero vector, further enhancing the generalization capability
of their proposed linear regression analysis (LRA). They also proposed the extended
sparse representation-based classifier (ESRC) [Deng et al. 2012] to solve the SSPP
problem, which applies an auxiliary intraclass variant dictionary to represent possible
variation between the training and testing images.
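The ESRC idea can be sketched by appending an auxiliary intraclass variant dictionary to the single-sample gallery, so generic within-person variation is absorbed by the auxiliary part rather than corrupting the class decision. ESRC proper solves an ℓ1-norm minimization; a ridge (ℓ2) solver stands in here to keep the sketch dependency-free, and the function name and λ value are illustrative:

```python
import numpy as np

def esrc_like_classify(X, V, y, labels, lam=0.01):
    """Sketch of ESRC [Deng et al. 2012]: represent the test face over the
    gallery X (one sample per class) PLUS an intraclass variant dictionary
    V of generic within-person differences. X: (d, n); V: (d, m); y: (d,);
    labels: length-n class ids. Ridge replaces the paper's l1 solver."""
    B = np.hstack([X, V])              # joint dictionary [gallery | variants]
    coef = np.linalg.solve(B.T @ B + lam * np.eye(B.shape[1]), B.T @ y)
    a, b = coef[: X.shape[1]], coef[X.shape[1]:]
    shared = V @ b                     # variation explained generically
    best, best_res = None, np.inf
    for c in np.unique(labels):
        mask = labels == c
        # Residual after removing both the class part and the shared variation
        res = np.linalg.norm(y - X[:, mask] @ a[mask] - shared)
        if res < best_res:
            best, best_res = c, res
    return best
```

Because the variant dictionary is shared across all classes, it can be built from an external generic set, which is what makes the approach viable under SSPP.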
All these methods for the SSPP problem treat the whole image as a high-dimensional
vector and belong to holistic representation-based methods. However, some other
schemes favor local representation, in which a face image is divided into blocks and
feature representation is performed block by block rather than globally. For
example, Chen et al. [2004] proposed the BlockFLD method, which generates multiple
training samples for each person by partitioning each face image into a set of same-
sized blocks, and then applies FLD-based methods to these blocks. However, the
ACM Transactions on Intelligent Systems and Technology, Vol. 7, No. 1, Article 2, Publication date: September 2015.