Abstract
Image-set-based face recognition has recently attracted
much attention due to the widespread use of surveillance and
video retrieval applications. Partial and misaligned face
images are commonly extracted from video in
unconstrained scenarios and in the presence of
detection/localization error, respectively. However,
existing face recognition techniques that rely on a holistic
image-set representation do not perform well under
such conditions. In this paper, we introduce a local
image-set-based face recognition approach to address this
issue, where each image set is represented by a cluster set
of keypoint descriptors and similarity between image sets is
measured by the distance between the corresponding sets of
clusters. Our representation is robust to misalignment
because the extraction of descriptors is carried out without
respect to the absolute face position. Additionally, our
approach is robust to partial face occlusion because (1)
descriptors corresponding to non-occluded keypoints are
not affected by the occluded keypoints, and (2) the matching
decision is based only on distances between the
matched cluster pairs corresponding to the non-occluded
facial parts. Extensive experimental evaluation shows that
our approach achieves promising recognition rates.
1. Introduction
With the recent spread of surveillance and video
retrieval applications, image-set-based face recognition has
attracted enormous research interest throughout the last
decade [1][2][3][5][6][8][17][18]. Since face images in
image-set-based face recognition are collected from video
sequences, both training and test examples consist
of sets of an individual’s face images, and the final
recognition decision is made by comparing such
image sets.
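The set-level decision described above can be illustrated with a minimal sketch. It assumes each face image has already been reduced to a feature vector and uses the smallest pairwise Euclidean distance as the set-to-set distance. That distance is only a placeholder chosen for brevity; it is not the matching procedure proposed in this paper or in the cited works. The names `set_distance`, `classify`, and the toy data are hypothetical.

```python
import math


def set_distance(set_a, set_b):
    """Placeholder set-to-set distance (an assumption for illustration):
    the smallest Euclidean distance between any cross-set pair of vectors."""
    return min(math.dist(a, b) for a in set_a for b in set_b)


def classify(probe_set, gallery):
    """Return the gallery identity whose image set is closest to the probe set."""
    return min(gallery, key=lambda name: set_distance(probe_set, gallery[name]))


# Toy 2-D vectors standing in for per-image face features (made-up values).
gallery = {
    "alice": [(0.0, 0.0), (0.1, 0.2)],
    "bob": [(5.0, 5.0), (5.2, 4.9)],
}
probe = [(4.8, 5.1), (5.1, 5.3)]
predicted = classify(probe, gallery)
```

Any other set-to-set distance can be substituted for `set_distance` without changing the decision rule in `classify`.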
In practice, face images captured from a surveillance
video, for example, are often obtained without the user's
cooperation or knowledge. Frequently, the face of an
individual captured in the video could be partially occluded
[11]. Furthermore, since faces are usually extracted from a
video frame sequence by using a face detector/tracker,
faces in the video frames are rarely perfectly aligned over
the set of extracted images due to potential detection or
localization error of the imperfect face detector/tracker.
When a holistic face representation is adopted in
image-set-based face recognition [1][2][5][6][8][17][18],
the simultaneous occurrence of face occlusion and
misalignment degrades face recognition performance.
A straightforward way to tackle these challenges is to
apply an existing single-probe-image-based face
recognition method that is occlusion- and
misalignment-robust in the image-set-based setting.
Recently, a local face recognition approach, namely the
multi-keypoint descriptor (MKD)-based approach [11], has
been proposed to address the occlusion and misalignment
problems in the single-probe-image-based setting. This
approach (1) extracts a number of salient facial keypoints
and a descriptor per keypoint from a face image without
requiring the face to be pre-aligned with that in the other
face images; and (2) performs recognition by applying
Sparse Representation-based Classification (SRC) [19] on
a large dictionary of keypoint descriptors [11]. It is worth
noting, however, that adopting it directly in the
image-set-based setting is not appropriate due to suboptimal
discriminative power and unacceptably low efficiency.
To avoid these drawbacks, this paper aims to develop
a simple and effective local approach for
image-set-based face recognition under uncontrolled
conditions. We first detect keypoints in each image, extract
an alignment-free descriptor per keypoint, and pool these
descriptors over a set of images in a common feature space.
To derive a robust representation from consistent keypoints
in the spatial domain (which leads to dense descriptors in
the feature domain), we adopt a density-based clustering
approach to select dense descriptors corresponding to
consistent keypoints and to group descriptors according to
their facial parts into a number of clusters. With this cluster
representation of an image set, we devise a series of
occlusion-robust matching procedures to evaluate the
Image Set-based Face Recognition:
A Local Multi-Keypoint Descriptor-based Approach
Na Liu¹, Meng-Hui Lim², Pong C. Yuen², and Jian-Huang Lai¹
¹ School of Maths and Computational Science
Sun Yat-sen University
Guangzhou, China
lindaliumail@gmail.com, stsljh@mail.sysu.edu.cn
² Department of Computer Science
Hong Kong Baptist University
Kowloon, Hong Kong
{mhlim, pcyuen}@comp.hkbu.edu.hk