Reliable Crowdsourcing and Deep Locality-Preserving Learning for Expression
Recognition in the Wild
Shan Li, Weihong Deng, and JunPing Du
Beijing University of Posts and Telecommunications
{ls1995, whDeng, junpingd}@bupt.edu.cn
Abstract
Past research on facial expressions has used relatively
limited datasets, which makes it unclear whether current
methods can be employed in the real world. In this paper,
we present a novel database, RAF-DB, which contains
about 30000 facial images from thousands of individuals.
Each image has been independently labeled about 40 times,
and an EM algorithm was then used to filter out unreliable
labels. Crowdsourcing reveals that real-world faces often
express compound emotions, or even mixed ones. To the best
of our knowledge, RAF-DB is the first database that contains
compound expressions in the wild. Our cross-database study
shows that the action units of basic emotions in RAF-DB are
much more diverse than, or even deviate from, those of lab-
controlled ones. To address this problem, we propose a new
DLP-CNN (Deep Locality-Preserving CNN) method, which
aims to enhance the discriminative power of deep features
by preserving locality closeness while maximizing
inter-class scatter. Benchmark experiments on the 7-class
basic expressions and 11-class compound expressions,
as well as additional experiments on the SFEW and CK+
databases, show that the proposed DLP-CNN outperforms
state-of-the-art handcrafted features and deep learning
based methods for expression recognition in the wild.
1. Introduction
Millions of images are uploaded every day by users
from different events and social gatherings. There is
increasing interest in designing systems capable of
understanding human manifestations of emotional attributes and
affective displays. To automatically learn the affective state of
face images from the Internet, large annotated databases are
required. However, the complexity of annotating emotion
categories has hindered the collection of large annotated
databases. On the other hand, popular AU coding [12]
requires specific expertise that takes months to learn and
perfect; hence, alternative solutions are needed. Moreover, due
to cultural differences in the perception of facial
emotion [13], it is difficult for psychologists to define definite
prototypical AUs for each facial expression. Therefore, it
is also worthwhile to study the emotion of social images from the
judgments of a large common population, in addition to the
professional knowledge of a few experts.
In this paper, we propose to study common expression
perception through a reliable crowdsourcing approach.
Specifically, our well-trained annotators are asked to label
face images with one of the seven basic categories [11],
and each face is annotated independently enough times, i.e.,
about 40 times in our experiment. Then, the noisy labels
are filtered by an EM based reliability evaluation algorithm,
through which each image can be represented reliably by a
7-dimensional emotion probability vector. By analyzing 1.2
million labels of 29672 highly diverse facial images downloaded
from the Internet, we find that these Real-world Affective Faces
(RAF)¹ are naturally categorized into two types: basic
expressions with a unimodal distribution and compound
emotions with a bimodal distribution, an observation that
supports a recent ground-breaking finding under lab-controlled
conditions [10]. To the best of our knowledge, the real-world
expression database RAF-DB is the first large-scale
database providing labels of common expression
perception and compound emotions in an unconstrained
environment.
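To make the annotation pipeline described above more concrete, the following is a minimal sketch of EM-based label filtering. It is not the reliability evaluation algorithm used for RAF-DB, whose details are not given in this section. It assumes a simple "one-coin" annotator model (each annotator gives the true label with probability r and otherwise picks one of the other classes uniformly), and the function names, the starting value 0.7, the clamping bounds, and the 0.5 threshold are all hypothetical choices made for illustration.

```python
import math
from collections import defaultdict

NUM_CLASSES = 7  # the seven basic categories


def estimate_reliability(labels, num_classes=NUM_CLASSES, iters=20):
    """labels: list of (image_id, annotator_id, class_index) triples.

    Returns {annotator_id: estimated probability of giving the true label}
    under the assumed one-coin model (an illustrative simplification).
    """
    by_image = defaultdict(list)
    for image, annotator, cls in labels:
        by_image[image].append((annotator, cls))
    reliability = {a: 0.7 for _, a, _ in labels}  # arbitrary starting guess

    for _ in range(iters):
        # E-step: posterior over the true class of every image (log domain).
        posterior = {}
        for image, votes in by_image.items():
            log_scores = [0.0] * num_classes
            for annotator, cls in votes:
                r = reliability[annotator]
                log_right = math.log(r)
                log_wrong = math.log((1.0 - r) / (num_classes - 1))
                for k in range(num_classes):
                    log_scores[k] += log_right if k == cls else log_wrong
            top = max(log_scores)
            weights = [math.exp(s - top) for s in log_scores]
            total = sum(weights)
            posterior[image] = [w / total for w in weights]
        # M-step: reliability = expected fraction of correct labels,
        # clamped away from 0 and 1 so the logarithms stay finite.
        hits, counts = defaultdict(float), defaultdict(int)
        for image, annotator, cls in labels:
            hits[annotator] += posterior[image][cls]
            counts[annotator] += 1
        reliability = {a: min(max(hits[a] / counts[a], 0.01), 0.99)
                       for a in counts}
    return reliability


def emotion_vectors(labels, reliability, min_reliability=0.5,
                    num_classes=NUM_CLASSES):
    """Drop labels from annotators below the (hypothetical) threshold, then
    normalize the remaining votes into a probability vector per image."""
    counts = defaultdict(lambda: [0] * num_classes)
    for image, annotator, cls in labels:
        if reliability[annotator] >= min_reliability:
            counts[image][cls] += 1
    return {image: [c / sum(votes) for c in votes]
            for image, votes in counts.items()}
```

Under these assumptions, an image whose retained votes concentrate on one category yields a vector with a single dominant entry, while an image whose retained votes split between two categories yields two large entries, which mirrors the basic versus compound distinction drawn in the text.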
The cross-database experiment and AU analysis on
RAF-DB indicate that AUs of real-world expressions are
much more diverse than, or even deviate from, those of
lab-controlled ones guided by psychologists. To address
this ambiguity of unconstrained emotion, we further propose
a novel Deep Locality-Preserving CNN (DLP-CNN).
Inspired by [17], we develop a practical back-propagation
algorithm which creates a locality preserving loss (LP loss)
aiming to pull the locally neighboring faces of the same
class together. Jointly trained with the classical softmax
loss, which forces different classes to stay apart, the locality
preserving loss drives the intra-class local clusters of each
¹ http://whdeng.cn/RAF/model1.html