
Nesterov Accelerated Gradient Descent-based Convolution Neural
Network with Dropout for Facial Expression Recognition
Wanjuan Su, Luefeng Chen, Min Wu, Mengtian Zhou, Zhentao Liu and Weihua Cao
Abstract— A Nesterov accelerated gradient descent-based convolution neural network (NAGDCNN)
with dropout is proposed for facial expression recognition, which fuses the convolution neural
network (CNN) with Softmax regression to construct a deep convolution neural network (DCNN)
that can excavate high-level expression features and classify them. A dropout layer is added
after the sub-sampling layer, which effectively reduces overfitting and the network's training
time. Moreover, Nesterov accelerated gradient descent (NAGD) is used to optimize the network
weights, which helps prevent the updates from going too fast or too slow and enhances the
response capability of the network. To verify the effectiveness of the proposal, experiments
on a benchmark database are conducted, and the experimental results show that the proposal
outperforms state-of-the-art methods. Furthermore, an application experiment is also carried
out, and the results indicate the feasibility of the proposal in practical applications.
Key Words—Deep learning, facial expression recognition, Nesterov accelerated gradient descent,
dropout, principal component analysis

This work was supported by the National Natural Science Foundation of China under Grants
61733016, 61603356 and 61210011, the Hubei Provincial Natural Science Foundation of China
under Grant 2015CFA010, and the 111 Project under Grant B17040.
W. J. Su, L. F. Chen, M. Wu, M. T. Zhou, Z. T. Liu, and W. H. Cao are with the School of
Automation, China University of Geosciences, Wuhan 430074, China, and also with the Hubei Key
Laboratory of Advanced Control and Intelligent Automation for Complex Systems, Wuhan 430074,
China. (Corresponding author: chenluefeng@cug.edu.cn)
I. INTRODUCTION
With the development of various technologies, the level of social intelligence keeps
increasing, and people's expectations for the human-robot interaction (HRI) experience are
rising accordingly. However, existing machines are unable to interact with people emotionally
[1]. Facial expression is one of the main channels through which humans express emotion [2],
so facial expression recognition (FER) is conducive to enabling machines to recognize, and
even understand, human emotions. FER has a wide range of applications [3], such as fatigue
driving detection, remote nursing, and HRI. Therefore, more accurate FER can promote the
development of social intelligence.
FER can be divided into expression feature extraction and expression feature recognition [4].
For expression feature extraction, various methods have been employed in previous papers,
e.g., active appearance models [5], the scale-invariant feature transform [6], local binary
patterns [7],
Gabor wavelet transform [8], and so on. In particular, principal component analysis (PCA) [9]
is a commonly used feature extraction algorithm that can simplify the data structure by
reducing its dimensionality. In order to learn projection subspaces equipped with robustness
and generalization ability, new subspace learning algorithms based on the standard PCA, linear
discriminant analysis, clustering-based discriminant analysis (CDA), and their combinations
are proposed in [10], where the combination of PCA and CDA achieves better performance on
facial expression databases. Here, PCA is also chosen to extract expression features to
address the problems of data redundancy and high dimensionality.
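As an illustration of this preprocessing step, the following Python sketch shows how PCA-based
dimensionality reduction can be applied to vectorized face images; it uses only NumPy, and the
image size and the number of retained components are illustrative assumptions rather than the
settings used in this paper.

import numpy as np

def pca_reduce(X, n_components=50):
    """Project row-vector samples X (n_samples x n_features) onto the
    top principal components. Returns projections, basis, and mean."""
    mean = X.mean(axis=0)                      # per-feature mean
    Xc = X - mean                              # center the data
    # Economy-size SVD: rows of Vt are the principal directions
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]             # top-k principal axes
    Z = Xc @ components.T                      # low-dimensional features
    return Z, components, mean

# Example: 100 face images of size 48x48, flattened to 2304-dim vectors
X = np.random.rand(100, 48 * 48)
Z, components, mean = pca_reduce(X, n_components=50)
print(Z.shape)  # (100, 50): compact expression features for the classifier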
Facial expression feature recognition aims to design a suitable classification mechanism to
recognize facial expressions; common algorithms include hidden Markov models [11], support
vector machines (SVM) [12], etc. A framework for FER using appearance features of salient
facial patches (SFP) is proposed in [13], which investigates the relevance of different facial
patches, and experiments on benchmark databases show the effectiveness of SFP. Nevertheless,
its process of obtaining high-level expression features is very complicated. In contrast, deep
learning (DL) has a strong capability for unsupervised feature learning, which has brought
about changes and leaps in various fields [14].
DL aims at discovering high-level distributed representations of the input data and has been
widely used in speech recognition, image recognition, and other fields. Hinton et al. [15]
used deep belief networks (DBN) and deep autoencoders to perform simple image recognition and
dimensionality reduction tasks, which demonstrated the feasibility of applying deep neural
networks (DNN) to image recognition. On this basis, many researchers have begun to apply DL
to FER. For instance, Liu et al. [16] proposed to adapt 3D convolutional neural networks
(3DCNN) with deformable action parts (DAP) constraints; namely, a deformable-parts learning
component is incorporated into the 3DCNN, which can detect specific facial action parts under
structured spatial constraints and obtain a discriminative part-based representation
simultaneously. A deep convolution neural network (DCNN) is applied to perform feature
learning and smile detection simultaneously in [17], where the learned features are used to
train an SVM or AdaBoost classifier, showing that the learned features have impressive
discriminative ability. It can be seen that DL can effectively combine feature learning and
classification into a single model. In a CNN, convolution layers and sub-sampling layers are
usually stacked iteratively to extract high-level semantic features.
Here, we develop a DCNN for FER by fusing the CNN with Softmax regression (SR). Besides, a
dropout layer is employed in the DCNN, which can effectively alleviate the overfitting problem
[18] and reduce the network's training time to some extent.
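To make the described structure concrete, the following Python sketch (written with PyTorch,
which is not used in this paper; the layer sizes, dropout rate, input resolution, learning
rate, and momentum are illustrative assumptions) stacks convolution and sub-sampling layers,
places a dropout layer after each sub-sampling layer, attaches a Softmax-regression output,
and updates the weights with Nesterov accelerated gradient descent.

import torch
import torch.nn as nn

class ExpressionCNN(nn.Module):
    """Convolution + sub-sampling blocks with dropout, followed by a
    Softmax-regression (fully connected + softmax) classifier."""
    def __init__(self, num_classes=7):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=5), nn.ReLU(),
            nn.MaxPool2d(2),          # sub-sampling layer
            nn.Dropout(p=0.5),        # dropout after sub-sampling
            nn.Conv2d(16, 32, kernel_size=5), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Dropout(p=0.5),
        )
        self.classifier = nn.Linear(32 * 9 * 9, num_classes)  # assumes 48x48 input

    def forward(self, x):
        x = self.features(x)
        x = torch.flatten(x, 1)
        return self.classifier(x)     # logits; softmax is applied inside the loss

model = ExpressionCNN()
# Nesterov accelerated gradient descent on the network weights
optimizer = torch.optim.SGD(model.parameters(), lr=0.01,
                            momentum=0.9, nesterov=True)
criterion = nn.CrossEntropyLoss()     # log-softmax + NLL, i.e., Softmax regression loss

# One illustrative training step on random data (batch of 48x48 grayscale faces)
images = torch.randn(8, 1, 48, 48)
labels = torch.randint(0, 7, (8,))
optimizer.zero_grad()
loss = criterion(model(images), labels)
loss.backward()
optimizer.step()

Setting nesterov=True in the SGD optimizer selects the look-ahead gradient evaluation that
distinguishes NAGD from classical momentum.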