3924 IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 23, NO. 9, SEPTEMBER 2014
to the non-linear objective function. Therefore, a three-stage
architecture is necessary to solve a large learning problem [17].
Secondly, the squared sum of regression errors defined in
the objective function makes the feature selection sensitive to
outliers. Thirdly, the class label g can take only the values
+1 or −1, so the model cannot produce a maximal
margin. Margin analysis is important to the generalization
ability of machine learning algorithms, and the most powerful
machine learning methods, e.g., Support Vector Machines and
Boosting, are motivated by margins. In addition, the features of
training samples must be normalized to match the class label,
which incurs additional computational cost. Fourthly,
the Lasso model is not flexible: the optimization
does not take into account the characteristics of image features
and biometric recognition. For example, the L1 regularization
term ‖f‖₁ in the objective function (Eqn. 1) assigns an identical
weight to all features, so the discriminative information of
each feature is not taken into consideration.
The L1 regularization is a popular technique for feature
selection. For example, Guo et al. proposed a linear program-
ming formulation of feature selection with application to facial
expression recognition [13]. The objective function aims to
minimize the misclassification error and the L1 norm of the
feature weights.
In summary, both Boosting [14], [16] and Lasso [15], [17]
have limitations in ordinal feature selection and it is desirable
to develop a feature selection method with the following
properties.
1) The feature selection process can be formulated as a
simple optimization problem. Here “simple” means that
both the objective function and the constraint terms
can be defined following a well-established standard
optimization problem, so that a global solution of the
feature selection problem is easy to obtain.
2) A sparse solution can be achieved in feature selection
so that the selected feature set is compact for efficient
storage, transmission and feature matching.
3) The penalty of misclassification should not be a high-order
function of the regression errors, so that the influence of
outliers is limited.
4) The model of feature selection should be flexible to
take into account the characteristics of the biometric
recognition problem, so that the genuine and impostor
matching results are well separated from each other
and the selected image features are accurate on the
training database.
5) The feature selection problem should depend only weakly
on the training data, so that it can be solved with a small
set of training samples. This requires the feature selection
method to circumvent the curse of dimensionality
and generalize to practical applications.
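The motivation behind property 3 can be checked with a small numeric comparison (the residual values here are purely illustrative): a squared penalty lets a single outlier dominate the total loss far more than a first-order penalty does.

```python
# Illustrative regression residuals; the last one is an outlier.
residuals = [0.1, 0.2, 0.15, 5.0]

squared_loss = sum(r ** 2 for r in residuals)   # high-order penalty
linear_loss = sum(abs(r) for r in residuals)    # first-order penalty

# Fraction of the total loss contributed by the outlier alone.
outlier_share_squared = residuals[-1] ** 2 / squared_loss
outlier_share_linear = abs(residuals[-1]) / linear_loss
```

Here the outlier accounts for over 99% of the squared loss but only about 92% of the linear loss, and its gradient contribution scales similarly, which is why a first-order penalty is more robust in feature selection.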
This paper proposes a novel feature selection method which
meets all requirements listed above. In our method, the fea-
ture selection process of ordinal measures is formulated as
a constrained optimization problem. Both the optimization
objective and the constraints are modeled as linear functions;
therefore linear programming (LP) can be used to efficiently
solve the feature selection problem.

Fig. 2. Problem statement of ordinal feature selection in iris recognition.

The feature units used
for LP formulation are regional ordinal measures, which are
tested on the training dataset to generate both intra- and inter-
class matching samples. Our feature selection method aims at
finding a compact subset of ordinal features that minimizes
biometric recognition errors with large margin between intra-
and inter-class matching samples. The objective function of
our LP formulation includes two parts. The first part measures
the misclassification errors of training samples that fail to
satisfy the large-margin principle; the second part encodes the
weighted sparsity of ordinal feature units. Traditional sparse
representation uses L1-norm to achieve sparsity of feature
selection and all feature components have an identical unit
weight in sparse learning. However, we argue that it is better to
incorporate some prior information related to candidate feature
units into sparse representation so that the most individually
accurate ordinal measures are given preferential treatment.
The linear inequality constraints of the LP optimization problem
require that all intra- and inter-class matching results be
well separated from each other by a large margin. Slack
variables are introduced to relax the inequality constraints for
ambiguous and outlier samples.
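The formulation described above can be sketched as a linear program. The following is a minimal illustration, not the paper's exact model: each row of `X` holds per-unit dissimilarity scores for one matching sample, `prior` plays the role of the weighted sparsity term, and the margin/slack constraints follow a 1-norm-SVM-style construction. The helper name `lp_feature_selection` and the use of `scipy.optimize.linprog` are our own choices.

```python
import numpy as np
from scipy.optimize import linprog

def lp_feature_selection(X, y, prior, C=10.0):
    """Sketch of large-margin sparse feature selection as an LP.

    minimize    prior . w + C * sum(xi)
    subject to  y_i * (w . x_i + b) >= 1 - xi_i,  w >= 0, xi >= 0,

    where w holds one weight per candidate feature unit and the slack
    variables xi absorb ambiguous and outlier samples.
    """
    n, d = X.shape
    # Variable layout: [w (d), b_pos, b_neg, xi (n)]; b = b_pos - b_neg.
    c = np.concatenate([prior, [0.0, 0.0], C * np.ones(n)])
    # Margin constraints rewritten in linprog's A_ub @ z <= b_ub form:
    # -y_i*(w . x_i) - y_i*b_pos + y_i*b_neg - xi_i <= -1.
    A_ub = np.hstack([-y[:, None] * X, -y[:, None], y[:, None], -np.eye(n)])
    b_ub = -np.ones(n)
    res = linprog(c, A_ub=A_ub, b_ub=b_ub,
                  bounds=[(0, None)] * (d + 2 + n), method="highs")
    w = res.x[:d]
    bias = res.x[d] - res.x[d + 1]
    return w, bias
```

Because both the objective and the constraints are linear, the solver returns a global optimum, and the w ≥ 0 bounds together with the L1-style cost `prior . w` drive most feature-unit weights exactly to zero.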
III. FEATURE SELECTION BASED ON LINEAR PROGRAMMING
The objective of feature selection for biometric recognition
is to select a limited number of feature units from the candidate
feature set (Fig. 2). In this paper, a feature unit is defined as
the regional ordinal encoding result using a specific ordinal
filter on a specific biometric region. We aim to use a machine
learning technique to find the weights of all ordinal feature
units, so that feature selection can also be regarded as a
sparse representation method, i.e., most weight values are zero
and only a compact set of feature units makes a weighted
contribution to biometric recognition.
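Concretely, once such sparse weights are learned, the selected subset is simply the support of the weight vector, and matching only needs the surviving units. A minimal sketch (the helper names are ours, not the paper's):

```python
import numpy as np

def select_units(weights, tol=1e-8):
    """Indices of feature units whose learned weight is non-negligible."""
    return np.flatnonzero(np.abs(weights) > tol)

def matching_score(dissimilarity, weights, selected):
    """Weighted matching score over the compact subset only; storage,
    transmission, and matching need just the selected units."""
    return float(dissimilarity[selected] @ weights[selected])
```

For example, a learned weight vector [0.6, 0, 0, 0.4, 0] keeps only units 0 and 3, and the score over that subset equals the full weighted sum because the discarded units contribute nothing.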
In this paper, ordinal feature selection is formulated as a
constrained optimization problem as follows.