Algorithm 1: ELM based classification.
Input: A training set {(x_j, t_j)}_{j=1}^{N}, activation function g(x), hidden node number L, and a query image y.
Output: Class label of y.
1. Randomly generate hidden node parameters (w_i, b_i), i = 1, 2, ..., L.
2. Calculate H(w_1, ..., w_L, x_1, ..., x_N, b_1, ..., b_L).
3. Determine the output weight matrix \hat{\beta} = H^{\dagger} T.
4. Calculate the actual output o of ELM with respect to y.
5. Label(y) = \arg\max_{d \in \{1,\ldots,m\}} (o).
where λ is the regularization parameter that makes a tradeoff between the training error term and the regularization term. The solution of (4) is given as follows (Huang, 2015):
• If L ≤ N,
\hat{\beta} = (H^T H + \lambda I)^{-1} H^T T. (5)
• If L ≥ N,
\hat{\beta} = H^T (H H^T + \lambda I)^{-1} T (6)
where I is the identity matrix.
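To make the procedure concrete, below is a minimal NumPy sketch (ours, not the authors' code) of Algorithm 1 combined with the regularized solution (5); the sigmoid activation, the one-hot target matrix T, and the value of λ are illustrative assumptions. For L ≥ N, the dual form (6) would be used instead.

```python
import numpy as np

def elm_train(X, T, L, lam=1e-3, seed=0):
    """X: (N, n) training features; T: (N, m) one-hot targets; L hidden nodes."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X.shape[1], L))   # random input weights w_i
    b = rng.standard_normal(L)                 # random biases b_i
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))     # hidden layer output matrix, g = sigmoid
    # Regularized solution (5), suited to the common case L <= N
    beta = np.linalg.solve(H.T @ H + lam * np.eye(L), H.T @ T)
    return W, b, beta

def elm_predict(y, W, b, beta):
    """Return the class label of a query feature vector y."""
    h = 1.0 / (1.0 + np.exp(-(y @ W + b)))
    o = h @ beta                               # actual output vector o of the ELM
    return int(np.argmax(o))                   # Label(y) = arg max_d o_d
```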
2.2. SRC
The SRC algorithm first estimates the sparse representation coefficients of a query sample over an over-complete dictionary whose basis atoms are extracted features with known labels. Then, classification is performed by finding the minimum residual error based on the sparse coefficients (Cao, Zhao, Lai, Chen et al., 2015; Liu, Guo, & Sun, 2016; Liu, Liu, & Sun, 2015; Wright, Yang et al., 2009; Yang et al., 2010, 2013). In particular, the SRC algorithm assumes that the query image can be well approximated by a linear combination of a few basis elements.
Assume an image database {A_d}_{d=1}^{m} for m classes, where A_d ∈ R^{n×k_d} contains k_d training images belonging to the dth class, with each image feature concatenated as a column vector. The coding dictionary is constructed as A = [A_1, A_2, ..., A_m]. For a query image y (a column vector containing the features of an image), the objective is to derive the optimal sparse representation coefficient via the following optimization:

\hat{x} = \arg\min_x \|x\|_0 \quad \text{s.t.} \quad Ax = y (7)
where ∥·∥_0 is the ℓ0-norm, which counts the nonzero entries (the support) of a vector. Owing to the NP-hardness of (7) and the progress of compressed sensing (Donoho, 2006), researchers resort to solving the following convex ℓ1-norm minimization problem:

\hat{x} = \arg\min_x \|x\|_1 \quad \text{s.t.} \quad \|Ax - y\|_2^2 < \varepsilon (8)

where ε is a small error tolerance. Note that numerous algorithms with different constraints on the sparse representation coefficient have been developed for certain applications (Babacan, Molina, & Katsaggelos, 2010; Cao & Lin, 2014, 2015b; Wright, Nowak, & Figueiredo, 2009; Yang et al., 2010). In SRC, the characteristic function δ_d(x̂) maps x̂ to a new vector whose nonzero elements are the entries of x̂ associated with the dth class. Thus, a query sample y belonging to the dth category is expected to be represented by the atoms of the dth class with the minimum residual r_d(y) = ∥y − Aδ_d(x̂)∥_2^2. In other words, the classification is achieved by finding the minimum residual:

\text{Label}(y) = \arg\min_{d \in \{1,2,\ldots,m\}} r_d(y), \quad \text{with } r_d(y) = \|y - A\delta_d(\hat{x})\|_2^2. (9)
Algorithm 2: SRC based classification.
Input: A dictionary with m classes A = [A_1, A_2, ..., A_m], a query image y.
Output: Class label of y.
1. Normalize the columns of A to have unit ℓ2-norm.
2. Solve the optimization problem (8).
3. Calculate the residuals r_d(y) = ∥y − A_d δ_d(x̂)∥_2^2, d = 1, 2, ..., m.
4. Label(y) = \arg\min_{d \in \{1,2,\ldots,m\}} r_d(y).
Algorithm 2 summarizes the SRC based classification.
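As a reading aid, the following NumPy sketch (ours, for illustration) instantiates Algorithm 2. Since (8) requires an ℓ1 solver, the sketch uses a basic ISTA iteration on the Lagrangian form min_x ½∥Ax − y∥_2^2 + λ∥x∥_1; the parameters lam and n_iter are illustrative assumptions, and the class residuals are computed with the full dictionary A as in (9).

```python
import numpy as np

def soft(v, thr):
    """Soft-thresholding: the proximal operator of the l1-norm."""
    return np.sign(v) * np.maximum(np.abs(v) - thr, 0.0)

def ista_l1(A, y, lam=0.01, n_iter=500):
    """Approximate the l1 problem (8) via its Lagrangian form, solved by ISTA."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2     # 1 / Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        x = soft(x - step * (A.T @ (A @ x - y)), step * lam)
    return x

def src_classify(A, labels, y):
    """Algorithm 2; labels[i] gives the class of the ith column (atom) of A."""
    A = A / np.linalg.norm(A, axis=0, keepdims=True)      # 1. unit l2-norm columns
    x_hat = ista_l1(A, y)                                 # 2. sparse coding
    classes = np.unique(labels)
    residuals = [np.linalg.norm(y - A @ np.where(labels == d, x_hat, 0.0)) ** 2
                 for d in classes]                        # 3. class-wise residuals r_d(y)
    return classes[int(np.argmin(residuals))]             # 4. minimum-residual label
```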
SRC has shown promising performance in face recognition and image processing (Bai et al., 2015; Cao, Zhao, Lai, Chen et al., 2015; Cao, Zhao, Lai, Ong et al., 2015; Luo & Zhang, 2014; Wang et al., 2014; Wright, Yang et al., 2009; Zhu et al., 2016). First, in contrast with most existing classifiers, whose performance depends highly on feature extraction, it is pointed out in Wright, Yang et al. (2009) that SRC, given accurate sparse coefficient optimization and a complete dictionary with enough features, is not sensitive to the choice of features. The second important merit of SRC is that it is training-free. Despite these merits, the SRC algorithm suffers from long testing times, incurred by the ℓ1-norm minimization problem (Cao, Zhao, Lai, Chen et al., 2015; Cao, Zhao, Lai,
Ong et al., 2015; Luo & Zhang, 2014). Experiments in Cao, Zhao,
Lai, Chen et al. (2015), Cao, Zhao, Lai, Ong et al. (2015) and Luo
and Zhang (2014) have shown that the sparse representation coef-
ficient estimation for some algorithms takes a few seconds or even
dozens of seconds per sample. Such long testing times severely limit the applicability of SRC, especially in the current big data era.
To reduce the overall computational burden, the cascade classifier ELM-SRC has been developed for image classification (Cao, Zhao, Lai, Ong et al., 2015; Luo & Zhang, 2014). By combining the fast training and testing of ELM with the strong image classification capability of SRC, the hybrid ELM-SRC classifier speeds up the overall classification procedure by partitioning the testing image dataset into two subsets and adaptively choosing ELM or SRC to perform the classification. Although ELM-SRC has an advantage over ELM in classification accuracy and is superior to SRC in testing speed, its classification accuracy is generally not as good as that of SRC. In addition, ELM-SRC still has room for improvement, since it uses the over-complete or highly redundant dictionary for linear representation.
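The decision rule that routes a query to ELM or to SRC is not specified in this section; as one plausible reading (our assumption, in the spirit of the cited works), the cascade accepts the fast ELM decision when its output is unambiguous, e.g. when the gap between the two largest entries of o exceeds a threshold, and defers to SRC otherwise. A sketch reusing elm_predict-style scoring and src_classify from above:

```python
import numpy as np

def elm_src_classify(y, W, b, beta, A, labels, theta=0.2):
    """Cascade sketch: the gap test and theta are illustrative assumptions."""
    h = 1.0 / (1.0 + np.exp(-(y @ W + b)))
    o = h @ beta                               # ELM output vector for the query
    largest, second = np.sort(o)[-1], np.sort(o)[-2]
    if largest - second > theta:               # unambiguous: keep the ELM label
        return int(np.argmax(o))
    return src_classify(A, labels, y)          # ambiguous: defer to SRC
```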
3. The proposed EA-SRC
To further reduce the misclassification rate as well as the com-
putational complexity of the hybrid ELM-SRC classifier, we de-
velop a novel image classification algorithm based on the regular-
ized ELM with an optimal regularization parameter and SRC with
an adaptive sub-dictionary selection strategy. The contributions of this paper are twofold: (1) the LOO cross-validation approach is adopted to optimize the regularization parameter of ELM, so that the best classification hyperplane is achieved; (2) for each query image, an adaptive sub-dictionary selection strategy that uses the output vector produced by the optimized ELM is developed to construct the sparse representation dictionary. Details of the proposed classifier are given in the following.
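Ahead of the formal development, here is a hedged sketch of the sub-dictionary idea: the ELM output vector o scores the classes, and only the atoms of highly scored classes are kept for sparse coding, shrinking the dictionary SRC must work with. The selection rule and the threshold tau below are our illustrative assumptions, not the exact rule derived later in the paper.

```python
import numpy as np

def select_sub_dictionary(o, A, labels, tau=0.5):
    """Keep the columns of A whose class score is within tau of the best score.
    The gap rule and tau are illustrative assumptions only."""
    candidates = np.where(o.max() - o <= tau)[0]   # classes scored close to the maximum
    mask = np.isin(labels, candidates)
    return A[:, mask], labels[mask]                # reduced dictionary and labels for SRC
```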
3.1. LOO error based optimization
The leave-one-out cross validation (LOO) approach is one of
the most effective methods for model selection and parameter