930 Cai et al. / Front Inform Technol Electron Eng 2015 16(11):930-939
Frontiers of Information Technology & Electronic Engineering
www.zju.edu.cn/jzus; engineering.cae.cn; www.springerlink.com
ISSN 2095-9184 (print); ISSN 2095-9230 (online)
E-mail: jzus@zju.edu.cn
Multiclass classification based on a deep convolutional
network for head pose estimation*
Ying CAI1,2, Meng-long YANG‡3, Jun LI2
(1 School of Computer Science, Sichuan University, Chengdu 610065, China)
(2 College of Information Engineering, Sichuan Agricultural University, Yaan 625014, China)
(3 School of Aeronautics and Astronautics, Sichuan University, Chengdu 610065, China)
E-mail: caiying34@qq.com; steinbeck@163.com; ljun402@163.com
Received Apr. 20, 2015; Revision accepted May 15, 2015; Crosschecked Oct. 16, 2015
Abstract: Head pose estimation is an important and challenging task in computer vision. In
this paper we propose a novel method to estimate head pose from 2D face images based on a deep convolutional
neural network (DCNN). We design a simple and effective method to roughly crop the face from the input image
while maintaining each individual's relative facial feature ratio. The method can be used in various poses. Two
convolutional neural networks are then set up to train head pose classifiers and are compared with each other. The
simpler one has six layers. It performs well on seven yaw poses but is somewhat unsatisfactory when two
pitch poses are mixed in. The other has eight layers and more pixels in its input layer. It performs better with more
poses and more training samples. Before training the networks, two strategies, shift and zoom, are
applied to prepare the training samples. Finally, the feature extraction filters are optimized together with the weights of
the classification component through training, to minimize the classification error. Our method has been evaluated
on the CAS-PEAL-R1, CMU PIE, and CUBIC FacePix databases. It performs better than state-of-the-art
methods for head pose estimation.
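The shift and zoom strategies mentioned above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `shift_and_zoom`, the shift offsets, the zoom factors, the 32-pixel output size, and the nearest-neighbour resampling are all assumptions chosen for illustration. Given a rough face box, the sketch produces several training crops by moving and rescaling the crop window.

```python
import numpy as np

def shift_and_zoom(image, box, shifts=(-2, 0, 2), zooms=(0.9, 1.0, 1.1), out_size=32):
    """Return crops of `image` around `box` = (x, y, w, h) under small shifts and zooms.

    All parameter values are hypothetical; the paper's actual settings are not
    given in the abstract.
    """
    h_img, w_img = image.shape[:2]
    x, y, w, h = box
    cx, cy = x + w / 2.0, y + h / 2.0          # centre of the rough face box
    samples = []
    for z in zooms:                            # zoom: rescale the crop window
        half_w, half_h = w * z / 2.0, h * z / 2.0
        for dx in shifts:                      # shift: move the window horizontally
            for dy in shifts:                  # ... and vertically
                xs = np.linspace(cx + dx - half_w, cx + dx + half_w, out_size)
                ys = np.linspace(cy + dy - half_h, cy + dy + half_h, out_size)
                # Nearest-neighbour sampling, clamped to the image borders.
                xi = np.clip(np.rint(xs).astype(int), 0, w_img - 1)
                yi = np.clip(np.rint(ys).astype(int), 0, h_img - 1)
                samples.append(image[np.ix_(yi, xi)])
    return np.stack(samples)

# Usage on a synthetic 64 x 64 grayscale image with an assumed face box.
image = np.arange(64 * 64, dtype=np.float32).reshape(64, 64)
crops = shift_and_zoom(image, box=(16, 16, 32, 32))
print(crops.shape)
```

With three zoom factors and three offsets per axis, each face box yields 27 crops of identical size, all of which keep the pose label of the source image.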
Key words: Head pose estimation, Deep convolutional neural network, Multiclass classification
doi:10.1631/FITEE.1500125 Document code: A CLC number: TP391
1 Introduction
The problem of head pose estimation has enjoyed
substantial attention in the computer vision
community. Robust algorithms for head pose estimation
could be beneficial for many applications,
such as video surveillance, human-computer interaction,
video conferencing, and face recognition.
However, it is still an intrinsically challenging task
‡ Corresponding author
* Project supported by the National Key Scientific Instrument
and Equipment Development Project of China
(No. 2013YQ49087903), the National Natural Science Foundation
of China (No. 61402307), and the Educational Commission
of Sichuan Province, China (No. 15ZA0007)
ORCID: Ying CAI, http://orcid.org/0000-0002-5096-6175
© Zhejiang University and Springer-Verlag Berlin Heidelberg 2015
because of the appearance variation between identities,
complex illumination, varied backgrounds, and
other factors. Many methods use classification or
regression to solve the problem of pose estimation.
In this paper, we treat head pose
estimation as a classification problem, because we
believe that images with the same pose share
invariant essential features that are
suitable for pose classification. Furthermore, we find
that the deep convolutional neural network (DCNN)
performs well on many visual tasks, because it captures
spatial topology and shift-invariant local features
well (LeCun et al., 1998). We consider that an appropriate
DCNN architecture and effective image
preprocessing will produce good performance on head