Pereira et al. EURASIP Journal on Image and Video Processing 2014, 2014:2 Page 3 of 15
http://jivp.eurasipjournals.com/content/2014/1/2
Because of differences in reflectance properties, real faces very likely
present different texture patterns compared with fake faces. Following that
hypothesis, Määttä et al. [17] and Chingovska et al. [3] explored the power
of local binary patterns (LBP) as a countermeasure. Määttä et al. combined
three different LBP configurations ($\mathrm{LBP}^{u2}_{8,2}$,
$\mathrm{LBP}^{u2}_{16,2}$ and $\mathrm{LBP}^{u2}_{8,1}$) in a normalized face
image and trained a support vector machine (SVM) classifier to discriminate
real and fake faces. Evaluations carried out with the NUAA Photograph Impostor
Database [5] showed a good discrimination power (2.9% in EER). Chingovska
et al. analysed the effectiveness of $\mathrm{LBP}^{u2}_{8,1}$ and a set of
extended LBPs [25] in still images to discriminate real and fake faces.
Evaluations carried out with three different databases, the NUAA Photograph
Impostor Database, the Replay-Attack database and the CASIA Face Anti-Spoofing
Database [6], showed a good discrimination power with an HTER equal to
15.16%, 19.03% and 18.17%, respectively.
3 LBP-based dynamic texture description
Määttä et al. [17] and Chingovska et al. [3] propose LBP-based
countermeasures to spoofing attacks based on the hypothesis that real faces
present different texture patterns in comparison with fake ones. However, the
proposed techniques analyse each frame in isolation, not considering the
behaviour over time. As pointed out in Section 2, motion is a cue explored in
some works and, in combination with texture, can generate a powerful
countermeasure. To describe face liveness for spoofing detection, we
considered a spatiotemporal representation which combines facial appearance
and dynamics. We adopted the LBP-based spatiotemporal representation because
of its recent convincing performance in modelling moving faces and facial
expression recognition and also for dynamic texture recognition [20].
The LBP texture analysis operator, introduced by Ojala et al. [26,27], is
defined as a gray-scale invariant texture measure, derived from a general
definition of texture in a local neighbourhood. It is a powerful texture
descriptor, and among its properties in real-world applications are its
discriminative power, computational simplicity and tolerance against
monotonic gray-scale changes. The original LBP operator forms labels for the
image pixels by thresholding the 3 × 3 neighbourhood of each pixel with the
center value and considering the result as a binary number. The histogram of
these $2^8 = 256$ different labels is then used as an image descriptor.
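The thresholding step described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names `lbp_3x3` and `lbp_histogram`, the clockwise bit ordering and the use of `>=` for the threshold are assumptions made for illustration, and border pixels are simply skipped.

```python
import numpy as np

def lbp_3x3(image):
    """Basic LBP codes for the interior pixels of a 2-D gray-scale array."""
    img = np.asarray(image, dtype=np.int64)
    h, w = img.shape
    center = img[1:-1, 1:-1]
    # Eight neighbours, visited clockwise from the top-left corner.
    # Assumption: any fixed ordering gives a valid labelling.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros_like(center)
    for bit, (dy, dx) in enumerate(offsets):
        neighbour = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        # A neighbour at least as bright as the center contributes a 1 bit.
        codes |= (neighbour >= center).astype(np.int64) << bit
    return codes

def lbp_histogram(image):
    """256-bin histogram of the basic LBP labels."""
    return np.bincount(lbp_3x3(image).ravel(), minlength=256)

patch = np.array([[1, 2, 3],
                  [4, 5, 6],
                  [7, 8, 9]])
print(lbp_3x3(patch))  # [[120]] with the bit ordering assumed above
```

Because only the sign of each neighbour-minus-center difference is used, applying an increasing transform such as 2v + 3 to every pixel value leaves the codes unchanged, which is the tolerance to monotonic gray-scale changes mentioned in the text.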
The original LBP operator was defined to deal only with spatial information.
However, more recently, it has been extended to a spatiotemporal
representation for dynamic texture (DT) analysis. This has led to the
so-called volume local binary pattern operator (VLBP) [21]. The idea behind
VLBP consists of looking at a dynamic texture (video sequence) as a set of
volumes in the (X, Y, T) space, where X and Y denote the spatial coordinates
and T denotes the frame index (time). The neighbourhood of each pixel is thus
defined in a three-dimensional space. Then, similar to basic LBP in the
spatial domain, volume textons can be defined and extracted into histograms.
Therefore, VLBP combines motion and appearance into a dynamic texture
description.
To make VLBP computationally tractable and easy to extend, the co-occurrences
of the LBP on three orthogonal planes (LBP-TOP) were also introduced [21].
LBP-TOP considers three orthogonal planes (XY, XT and YT) and concatenates
the local binary pattern co-occurrence statistics in these three directions.
The circular neighbourhoods are generalized to elliptical sampling to fit the
space-time statistics. The LBP codes are extracted from the XY, XT and YT
planes, denoted as XY-LBP, XT-LBP and YT-LBP, for all pixels, and the
statistics of the three planes are obtained and concatenated into a single
histogram. The procedure is shown in Figure 1. In this representation, DT is
encoded by the XY-LBP, XT-LBP and YT-LBP.
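As a rough illustration of the three-plane idea, the sketch below computes LBP codes on the XY, XT and YT slices through each interior voxel and concatenates the three histograms. It is a deliberate simplification and not the operator of [21]: it uses a square 3 × 3 neighbourhood with radius 1 on every plane instead of circular or elliptical sampling, it assumes the volume is indexed as `[t, y, x]`, and the names `lbp_codes_2d`, `lbp_top_codes` and `lbp_top_histogram` are hypothetical.

```python
import numpy as np

def lbp_codes_2d(plane):
    """8-neighbour, radius-1 LBP codes for the interior of a 2-D array."""
    p = np.asarray(plane, dtype=np.int64)
    h, w = p.shape
    center = p[1:-1, 1:-1]
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros_like(center)
    for bit, (da, db) in enumerate(offsets):
        neighbour = p[1 + da:h - 1 + da, 1 + db:w - 1 + db]
        codes |= (neighbour >= center).astype(np.int64) << bit
    return codes

def lbp_top_codes(volume):
    """Return (XY, XT, YT) code arrays for a volume indexed as [t, y, x].

    Only voxels with a full neighbourhood in all three planes are kept,
    so every returned array has shape (T-2, Y-2, X-2).
    """
    v = np.asarray(volume)
    T, Y, X = v.shape
    xy = np.stack([lbp_codes_2d(v[t]) for t in range(1, T - 1)], axis=0)
    xt = np.stack([lbp_codes_2d(v[:, y, :]) for y in range(1, Y - 1)], axis=1)
    yt = np.stack([lbp_codes_2d(v[:, :, x]) for x in range(1, X - 1)], axis=2)
    return xy, xt, yt

def lbp_top_histogram(volume):
    """Concatenate the 256-bin histograms of the three planes (768 bins)."""
    return np.concatenate([np.bincount(c.ravel(), minlength=256)
                           for c in lbp_top_codes(volume)])
```

The XY histogram captures appearance within single frames, while the XT and YT histograms capture how rows and columns of the image change over time.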
Using equal radii for the time and spatial axes is not a good choice for
dynamic textures [21], and therefore, in the XT and YT planes, different
radii can be assigned to sample neighbouring points in space and time. More
generally, the radii $R_x$, $R_y$ and $R_t$, respectively, in axes X, Y and
T and the number of neighbouring points $P_{XY}$, $P_{XT}$ and $P_{YT}$,
respectively, in the XY, XT and YT planes can also be different.
Furthermore, the type of LBP operator on each plane can vary; for example,
the uniform pattern (u2) or rotation invariant uniform pattern (riu2)
variants [20] can be deployed. The corresponding feature is denoted as
$\mathrm{LBP\text{-}TOP}^{operator}_{P_{XY},P_{XT},P_{YT},R_x,R_y,R_t}$.
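To make the role of these parameters concrete, the following sketch lists the sampling offsets on one plane when the two axes have different radii, and the resulting histogram length for a given operator type. The helper names `ellipse_offsets`, `n_labels` and `feature_length` are hypothetical, and the starting angle and direction of the sampling are assumptions that may differ from the convention in [21]. The label counts are the standard ones for LBP: P(P − 1) + 3 for u2, P + 2 for riu2 and 2^P for the basic operator.

```python
import math

def ellipse_offsets(P, r_a, r_b):
    """Offsets of P points evenly spaced on an ellipse with semi-axes
    r_a and r_b, e.g. (R_x, R_t) for the XT plane."""
    return [(r_a * math.cos(2 * math.pi * p / P),
             -r_b * math.sin(2 * math.pi * p / P)) for p in range(P)]

def n_labels(P, operator="u2"):
    """Number of distinct labels produced with P sampling points."""
    if operator == "u2":      # uniform patterns plus one bin for the rest
        return P * (P - 1) + 3
    if operator == "riu2":    # rotation invariant uniform patterns
        return P + 2
    return 2 ** P             # basic LBP

def feature_length(points, operator="u2"):
    """Length of the concatenated histogram for (P_XY, P_XT, P_YT)."""
    return sum(n_labels(P, operator) for P in points)

print(feature_length((8, 8, 8), "u2"))  # 177
```

Offsets that do not fall on integer pixel or frame positions have to be interpolated from neighbouring samples; bilinear interpolation is a common choice for LBP, although the sketch leaves that step out.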
Assume we are given an X × Y × T dynamic texture
($x_c \in \{0, \cdots, X-1\}$, $y_c \in \{0, \cdots, Y-1\}$,
$t_c \in \{0, \cdots, T-1\}$), i.e. a video sequence. A histogram of the
DT can be defined as

$$H_{i,j} = \sum_{x,y,t} I\{f_j(x, y, t) = i\}, \quad i = 0, \cdots, n_j - 1;\; j = 0, 1, 2 \tag{1}$$

where $n_j$ is the number of different labels produced by the LBP operator
in the jth plane (j = 0: XY, 1: XT and 2: YT), $f_j(x, y, t)$ expresses the
LBP code of the central pixel (x, y, t) in the jth plane, and $I\{A\}$
equals 1 if A is true and 0 otherwise.
Similar to the original LBP, the histograms must be normalized to get a
coherent description for comparing the DTs:

$$N_{i,j} = \frac{H_{i,j}}{\sum_{k=0}^{n_j-1} H_{k,j}}. \tag{2}$$
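Equations (1) and (2) amount to counting labels per plane and dividing each plane's histogram by its own total. The sketch below shows this with NumPy. The function name `normalized_plane_histograms` is hypothetical, and it assumes that the per-plane code arrays are already available, that every code in plane j lies in {0, ..., n_j − 1} and that no plane is empty.

```python
import numpy as np

def normalized_plane_histograms(codes_per_plane, n_labels):
    """Return the concatenation of N_{i,j} over the planes j = 0, 1, 2.

    codes_per_plane: three integer arrays holding f_j(x, y, t).
    n_labels:        three integers n_j, one per plane.
    """
    features = []
    for codes, n_j in zip(codes_per_plane, n_labels):
        # Eq. (1): H_{i,j} counts the voxels whose code equals i.
        H_j = np.bincount(np.asarray(codes).ravel(), minlength=n_j)
        # Eq. (2): divide by the sum over all labels of plane j.
        features.append(H_j / H_j.sum())
    return np.concatenate(features)

# Toy codes (not real LBP output) with n_j = 4, 3 and 2 labels.
toy = [np.array([0, 1, 1, 3]), np.array([2, 2]), np.array([0])]
print(normalized_plane_histograms(toy, [4, 3, 2]))
```

Each per-plane block of the result sums to 1, so sequences with different numbers of frames or different face sizes yield histograms on a comparable scale.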