devices or environments because a large appearance gap usually
occurs. How to extract common properties and reduce this gap
is the key challenge in heterogeneous face recognition.
Recently, a variety of feature learning methods [24], [25], [26]
have also been proposed for heterogeneous face recognition.
For example, Jin et al. [26] learned representative features by
training coupled filters which maximize the inter-class
variations and minimize the intra-class variations. Saxena and
Verbeek [24] used a CNN model with a shared layer learned
from a soft-max criterion to obtain common features.
Yi et al. [25] extracted Gabor features from face landmarks
and performed shared representation learning to reduce the
modality gap. In contrast, in this work we learn common and
specific latent spaces to obtain similar information and
exploit specific complementary information, respectively.
2.2 Binary Code Learning
A variety of binary code learning methods have been pro-
posed in recent years [27], [28], [29]. For example,
Weiss et al. [29] proposed a binary code learning
approach for image search. Norouzi et al. [30] improved it
by using a triplet ranking loss optimization criterion. How-
ever, most existing binary code learning methods are devel-
oped for scalable similarity search [28]. While binary
features such as LBP and Haar-like features have been used
in face recognition, most of them are hand-crafted. There
has been some recent work which employs binary code
learning for face representation and recognition [31], [32],
[33]. For example, Zhang et al. [32] and Rastegari et al. [33]
learned binary codes based on variants of the Fisher
criterion. However, these binary codes are learned holistically
and not at the feature level. More recently, Lu et al. [31]
introduced a compact binary feature descriptor (CBFD) which
learned binary face descriptors at the feature level. How-
ever, CBFD performed feature and codebook learning sepa-
rately, so that some useful information for codebook
learning may be compromised in the binarization stage.
3 PROPOSED APPROACH
In this section, we first review the LBP method and present
the proposed SLBFLE method. Then we show how to use
SLBFLE for face representation. Lastly, we present the
proposed C-SLBFLE method for heterogeneous face recognition.
3.1 Review of LBP
LBP is an effective feature descriptor in face recognition [5].
For each pixel in a face image, LBP first computes the
difference between the central pixel and the neighboring
pixels and binarizes the difference with a fixed threshold.
Second, these binary bins are encoded as a real value by
using a hand-crafted pattern coding strategy. Fig. 2 illus-
trates the basic idea of LBP, where two individual stages are
used for feature representation.
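
The two stages reviewed above can be sketched in a few lines of NumPy. This is a minimal illustration of the basic LBP operator, not the implementation used in [5]: the 3 x 3 neighborhood, the clockwise bit ordering, the neighbor-minus-center sign convention, the threshold of 0, and the names `NEIGHBOR_OFFSETS` and `lbp_codes` are all assumptions made for this example.

```python
import numpy as np

# Offsets (row, col) of the 8 neighbors in a 3x3 window, clockwise from the
# top-left corner. The ordering is a convention assumed for this sketch.
NEIGHBOR_OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
                    (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_codes(image, threshold=0):
    """Return the basic 3x3 LBP code of every interior pixel.

    The output has shape (H - 2, W - 2) because border pixels do not have
    a full neighborhood and are skipped.
    """
    img = np.asarray(image, dtype=np.int64)
    h, w = img.shape
    center = img[1:h - 1, 1:w - 1]
    codes = np.zeros_like(center)
    for bit, (dr, dc) in enumerate(NEIGHBOR_OFFSETS):
        # Shifted view that aligns each neighbor with its central pixel.
        neighbor = img[1 + dr:h - 1 + dr, 1 + dc:w - 1 + dc]
        # Stage 1 (feature mapping): binarize the pixel difference with a
        # fixed threshold.
        bits = (neighbor - center >= threshold).astype(np.int64)
        # Stage 2 (feature encoding): hand-crafted coding that weights each
        # bit by a power of two.
        codes += bits << bit
    return codes


patch = np.array([[5, 9, 1],
                  [4, 6, 7],
                  [2, 3, 8]])
print(lbp_codes(patch))  # the single interior pixel receives code 26
```

Both the threshold and the power-of-two weighting are fixed in advance rather than learned from data, which is the property the shortcomings discussed next refer to.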
There are two shortcomings in LBP: 1) both the
binarization and feature encoding stages are hand-crafted,
which are not optimal for local feature representation; 2)
a two-stage procedure is used in LBP, which is not
effective enough because some useful information for
codebook learning may be compromised in the binarization
stage. To address this, we propose a SLBFLE method to
learn a discriminative mapping and a compact codebook
for feature mapping and encoding jointly, so that more
data-adaptive information can be exploited in the learned
features. The following describes the details of the
proposed method.
3.2 SLBFLE
As mentioned above, our SLBFLE aims to jointly learn a
feature mapping and a dictionary for feature mapping and
encoding. While our SLBFLE method is unsupervised, it
still has strong discriminative power because raw pixels are
extracted from face images of different identities, which
contributes to learning a discriminative feature mapping.
Moreover, the learned binary codes can describe how pixel
values change over local patches and implicitly encode
important visual patterns such as edges and lines in face
images. Also, the learned dictionary can encode the
learned binary codes so that some noisy information can be
alleviated.
Let $X = [x_1, x_2, \ldots, x_N] \in \mathbb{R}^{d \times N}$ be a set of $N$ face image
samples, where $x_n \in \mathbb{R}^{d}$ ($1 \le n \le N$) is a pixel difference
vector extracted from an original face image. Fig. 3 illustrates
how to extract a PDV for a given face patch. Compared
with the original raw pixel patch, PDV measures the
difference between the central pixel and the neighboring
pixels within a patch, so that it can better describe how pixel
values change spatially and implicitly encode important
visual patterns such as edges and lines in face images.
Assume there are $K$ hash functions to be learned in SLBFLE,
which map and quantize each $x_n$ into a binary vector
$b_n = [b_{n1}, \ldots, b_{nK}] \in \{-1, 1\}^{K \times 1}$, so that the binary codes
Fig. 2. The basic idea of the LBP method, where a two-stage procedure
is used for local feature extraction: feature mapping and feature
encoding. For the feature mapping stage, the differences between the
central pixel and the neighboring pixels are computed and binarized with
a fixed threshold. For the feature encoding stage, the mapped binary codes
are encoded as a real value by using a hand-crafted pattern coding strategy.
Fig. 3. An illustration to show how to extract pixel difference vectors
(PDV) from the original face image. Given a face patch whose size is
$(2R + 1) \times (2R + 1)$, we first compute the difference between the central
pixel and the neighboring pixels. Then, these differences are considered
as a PDV. In this figure, R is selected as 2, so that there are 24 neighbor-
ing pixels selected and the PDV is a 24-dimensional feature vector.
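
The PDV extraction illustrated in Fig. 3 can be sketched as follows. This is a minimal NumPy illustration under stated assumptions rather than the authors' implementation: the neighbor-minus-center sign convention, the row-major ordering of neighbors, the skipping of border pixels, and the function names `extract_pdvs` and `binarize` are choices made for this example. In particular, `binarize` uses the sign of a linear projection with a random matrix `W` only as a placeholder for the K hash functions, whose learned form is not specified in this part of the text.

```python
import numpy as np


def extract_pdvs(image, R=2):
    """Return one pixel difference vector (PDV) per interior pixel.

    The result has shape (N, d) with d = (2R + 1)**2 - 1, so R = 2 gives
    24-dimensional PDVs. Pixels closer than R to the border are skipped.
    """
    img = np.asarray(image, dtype=np.float64)
    h, w = img.shape
    center = img[R:h - R, R:w - R]
    diffs = []
    for dr in range(-R, R + 1):
        for dc in range(-R, R + 1):
            if dr == 0 and dc == 0:
                continue  # the central pixel is not its own neighbor
            # Shifted view that aligns each neighbor with its central pixel.
            neighbor = img[R + dr:h - R + dr, R + dc:w - R + dc]
            # Assumed sign convention: neighbor minus center.
            diffs.append((neighbor - center).ravel())
    return np.stack(diffs, axis=1)


def binarize(pdvs, W):
    """Quantize each PDV into a code in {-1, +1}^K.

    Assumption for illustration only: each hash function is the sign of a
    linear projection, with one column of W per hash function.
    """
    projections = pdvs @ W  # shape (N, K)
    return np.where(projections >= 0, 1, -1)


rng = np.random.default_rng(0)
face = rng.integers(0, 256, size=(8, 8))   # stand-in for a face patch
pdvs = extract_pdvs(face, R=2)             # shape (16, 24)
W = rng.standard_normal((pdvs.shape[1], 8))  # K = 8 placeholder projections
codes = binarize(pdvs, W)                  # shape (16, 8), entries in {-1, +1}
```

Note that `extract_pdvs` stores one PDV per row, so the matrix $X \in \mathbb{R}^{d \times N}$ used in the text corresponds to `pdvs.T`.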
LU ET AL.: SIMULTANEOUS LOCAL BINARY FEATURE LEARNING AND ENCODING FOR HOMOGENEOUS AND HETEROGENEOUS FACE... 1981