Fig. 2. Mismatched pairs. These are the first six mismatched pairs in the
database under View 1, as specified in the file pairsDevTrain.txt.
will be an important tool in studying the unconstrained pair
matching problem.
While some other databases (such as the Caltech 10000
Web Faces [1]) also present highly diverse image sets, these
databases are not designed for face recognition, but rather for
face detection. We now discuss the origin for Labeled Faces
in the Wild and a number of related databases.
Faces in the Wild. The impetus for the Labeled Faces in
the Wild database grew out of work at Berkeley by Tamara
Berg, David Forsyth, and the computer vision group at UC
Berkeley [3], [4]. In this work, it was shown that a large,
partially labeled database of face images could be built by
using imperfect data gathered from the web. In particular, the
Berg database of faces was built by jointly analyzing pictures
and their associated captions to cluster images by identity. The
resulting data set, which achieved a labelling accuracy of 77%
[3], was informally referred to as the “Faces in the Wild” data
set.
However, since the database was not originally intended to
act as training and test data for new experiments, it contained
a high percentage of label errors and a high percentage of
duplicated images. As a result, various researchers derived
ad hoc subsets of the database for new research projects
[14], [15], [25], [27]. It seemed that there would be sufficient
interest in a clean version of the data set to warrant doing the
job thoroughly and publishing a new database.
Before addressing the details of LFW, we discuss some of
the databases most closely related to it. While these databases
share some features with LFW, we believe that LFW represents
an important contribution to existing databases, especially for
studying the problem of unconstrained face recognition.
The Face Recognition Grand Challenge Databases [28].
The Face Recognition Grand Challenge (FRGC) was not
just a set of databases, but a carefully planned scientific
program designed to promote rigorous scientific analysis of
face recognition, fair comparison of face recognition technologies,
and advances in face recognition research [28]. It
represents the most comprehensive and scientifically rigorous
study of face recognition to date. We applaud the organizers
and implementers of the FRGC, and believe that the FRGC,
along with earlier vendor tests, has been an important motivator
and reality check for the face recognition community. The
FRGC was successful in stimulating researchers (in both the
private sector and academia) to achieve certain milestones in
face recognition.
The goals of our research, and hence of our database, are
somewhat different from the goals of the FRGC. One of the
key differences is that the organizers of the FRGC wished to
study the effect of new, richer data types on the face recognition
problem. The databases for the FRGC thus include high
resolution data, three-dimensional scans, and image sequences
of each individual. (The databases contain more than 50,000
total recordings, including 3D scans and images.) Each of
these data types is potentially more informative than the simple
and moderate resolution images of our database. While one of
the major goals of the FRGC was to study how higher fidelity
data can help make face recognition more accurate, the goal of
Labeled Faces in the Wild is to help study the problem of face
recognition using previously existing images, that is, images
that were not taken for the special purpose of face recognition
by machine. Thus, from the beginning we decided to build
our database from previously existing photographs that were
taken for other purposes.
Another important difference between the data sets associ-
ated with the FRGC and our data set is the general variety
of images. For example, while there are large numbers of
images with uncontrolled lighting in the FRGC data sets, these
images contain a great deal less natural variation than the
LFW images. For example, the FRGC outdoor uncontrolled