1478 IEEE SIGNAL PROCESSING LETTERS, VOL. 24, NO. 10, OCTOBER 2017
Two-Stream Deep Correlation Network for
Frontal Face Recovery
Ting Zhang, Qiulei Dong, Ming Tang, and Zhanyi Hu
Abstract—Pose and textural variations are two dominant factors
that affect the performance of face recognition. It is widely believed
that generating the corresponding frontal face from a face image of
an arbitrary pose is an effective step toward improving recognition
performance. In the literature, however, the frontal face
is generally recovered by exploiting textural characteristics only.
In this letter, we propose a two-stream deep correlation network,
which incorporates both geometric and textural features for frontal
face recovery. Given a face image under an arbitrary pose as in-
put, geometric and textural characteristics are first extracted from
two separate streams. The extracted characteristics are then fused
through the proposed multiplicative patch correlation layer. These
two steps are integrated into one network for end-to-end train-
ing and prediction, which is demonstrated effective compared with
state-of-the-art methods on the benchmark datasets.
Index Terms—Correlation layer, deep neural network, frontal
face recovery, geometric stream, textural stream.
I. INTRODUCTION
FACE recognition is a field of great potential, which has been
widely used in access control, video surveillance, personal
verification, etc. Over the past decade, there have been tremen-
dous advances in face recognition, most of which are owed to
the development of deep learning [1]–[5]. Although data-driven
features extracted by deep neural networks show great advantages
over the hand-crafted ones in face recognition [6]–[10],
the performance of face recognition is usually influenced by the
large variations in pose, illumination, expression, etc. Among
them, pose variation has been a persistent challenge because it
may make the intraperson variance exceed the interperson one.

Manuscript received May 14, 2017; revised July 10, 2017; accepted July
20, 2017. Date of publication August 7, 2017; date of current version August
29, 2017. This work was supported in part by the Strategic Priority Research
Program of the Chinese Academy of Sciences under Grant XDB02070002,
and in part by the National Natural Science Foundation of China under Grant
61421004, Grant 61375042, and Grant 61573359. The associate editor coordinating
the review of this manuscript and approving it for publication was
Dr. Sumohana S. Channappayya. (Corresponding author: Qiulei Dong.)

T. Zhang is with the National Laboratory of Pattern Recognition, Institute
of Automation, Chinese Academy of Sciences, Beijing 100190, China, and also
with the University of Chinese Academy of Sciences, Beijing 100049, China
(e-mail: ting.zhang@nlpr.ia.ac.cn).

Q. Dong and Z. Hu are with the National Laboratory of Pattern Recognition,
Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China,
with the University of Chinese Academy of Sciences, Beijing 100049,
China, and also with the Center for Excellence in Brain Science and Intelligence
Technology, Chinese Academy of Sciences, Beijing 100190, China (e-mail:
qldong@nlpr.ia.ac.cn; huzy@nlpr.ia.ac.cn).

M. Tang is with the National Laboratory of Pattern Recognition, Institute
of Automation, Chinese Academy of Sciences, Beijing 100190, China (e-mail:
tangm@nlpr.ia.ac.cn).

Color versions of one or more of the figures in this letter are available online
at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/LSP.2017.2736542
In view of this, many methods have been proposed to transfer
a face image of arbitrary pose to the frontal one. These meth-
ods can be roughly classified into two groups: Two-dimensional
(2-D)-based methods [11]–[17] and 3-D-based ones [18]–[21].
The 2-D-based techniques usually encode a test image with
some exemplars, or use 2-D image matching algorithms to address
the pose variation. In [12], Markov random fields were applied
to infer the frontal face images. Li et al. [15] proposed an
elastic matching method that aligned patches and matched
face images of different poses based on a Gaussian mixture
model. In [1], a deep convolutional neural network was proposed
to recover the frontal image of neutral illumination from
those with arbitrary poses and illumination. In [11], a new deep
architecture was presented to generate face images with target-
poses from those with arbitrary poses and illumination. In [17],
recurrent neural networks were combined with autoencoders to
render sequences of rotated face images through incremental
3-D rotations.
The 3-D-based techniques attempt to match the captured 3-D
facial data to probe face images or align a probe face image to
a 3-D face model. Asthana et al. [19] constructed an aligned
3-D face model from a nonfrontal face image, and then rotated
the model to render a frontal face image. In [20], a virtual
view for the probe image was generated based on a set of 3-D
displacement fields sampled from a 3-D face database and the
synthesized faces were tested.
Despite the demonstrated success, the performance of ex-
isting methods on frontal face recovery is still limited. The
methods based on 3-D reconstruction are time consuming and
sometimes require several views captured at multiple poses.
Although 2-D reconstruction methods are efficient and require
only a single input image, their performance is limited
because they exploit only facial textures to align face images.
These textures are not effective enough to locate correspondences
when the face undergoes out-of-plane rotation.
In this letter, we propose a two-stream deep correlation net-
work (TSDCN) to solve the aforementioned limitations. Given
an input face image, we extract the textural and geometric features
independently via two streams. The textural stream performs
similarly to existing methods, and the geometric stream
predicts the angles of the face poses. The angle predictions are
then correlated with the texture correspondence to predict the
recovered face image. Experimental results on the Multi-PIE
and labeled faces in the wild (LFW) datasets demonstrate the
validity of the proposed method.
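The two-stream pipeline can be sketched in a few lines of NumPy. Everything here is an illustrative assumption rather than the letter's actual architecture: the patch and feature dimensions are invented, each deep stream is stood in for by a single random linear map, and the multiplicative patch correlation layer is approximated as an element-wise product of each patch's textural feature with a broadcast geometric (pose) feature.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions (not from the letter): a flattened 32x32 face,
# split into 16 patches, each described by a 64-D textural feature.
NUM_PATCHES, FEAT_DIM, POSE_DIM = 16, 64, 3

# Stand-ins for the two streams: in the letter these are deep sub-networks;
# here each is a single random linear map for illustration only.
W_tex = rng.standard_normal((NUM_PATCHES * FEAT_DIM, 1024))
W_geo = rng.standard_normal((FEAT_DIM, POSE_DIM))

def textural_stream(image_vec):
    # (1024,) flattened face -> (16, 64) per-patch textural features.
    return (W_tex @ image_vec).reshape(NUM_PATCHES, FEAT_DIM)

def geometric_stream(pose_angles):
    # (3,) pose angles -> (64,) geometric feature shared across patches.
    return np.tanh(W_geo @ pose_angles)

def multiplicative_patch_correlation(tex, geo):
    # Fuse by element-wise (multiplicative) correlation: the geometric
    # feature modulates every patch's textural feature via broadcasting.
    return tex * geo

image = rng.standard_normal(1024)
pose = np.array([0.5, 0.1, 0.0])  # hypothetical yaw/pitch/roll
fused = multiplicative_patch_correlation(textural_stream(image),
                                         geometric_stream(pose))
print(fused.shape)  # (16, 64)
```

In this sketch the fused (16, 64) tensor would feed a decoder that predicts the recovered frontal face; the multiplicative form means that pose information gates, rather than is concatenated with, the texture correspondence.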
The contributions of this work include the following.
1) We propose a two-stream network to tackle the frontal
face recovery problem, which can independently capture
textural and geometric features of the input face image.
1070-9908 © 2017 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.