$\ell_{2,1}$-norm regularized Fisher criterion for optimal feature selection☆
Jian Zhang a, Jun Yu b,*, Jian Wan b, Zhiqiang Zeng c

a School of Science and Technology, Zhejiang International Studies University, Hangzhou 310012, China
b School of Computer Science and Technology, Hangzhou Dianzi University, Hangzhou 310018, China
c College of Computer and Information Engineering, Xiamen University of Technology, Xiamen 361024, China
Article info
Article history:
Received 3 November 2014
Received in revised form
6 February 2015
Accepted 19 March 2015
Communicated by Huaping Liu
Available online 9 April 2015
Keywords:
Feature selection
Fisher criterion
$\ell_{2,1}$ norm
Sparsity
Abstract

Feature selection has proved to be an effective way to improve the results of many pattern recognition tasks such as image classification and automatic face recognition. Among existing methods, those based on the Fisher criterion have received considerable attention owing to their efficiency and good generalization over classifiers. However, the original Fisher criterion-based methods ignore the interdependencies between different features. To this end, this paper proposes an optimized feature selection method that incorporates $\ell_{2,1}$-norm regularization into the original Fisher criterion. The $\ell_{2,1}$-norm regularization term ensures the sparsity of the feature selection matrix, which drives the feature selection result close to the globally optimal solution. Owing to this sparsity, a normalization constraint constructed from the inter-class scatter matrix of the Fisher criterion is used to simplify the original problem, so that the solution can be derived by an iterative algorithm whose key step is solving a generalized eigenvalue problem. Experiments on various data sets indicate that the proposed method provides higher accuracy in pattern recognition tasks than several existing approaches.
© 2015 Elsevier B.V. All rights reserved.
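As a reading aid for the abstract, one standard way to write such a regularized Fisher objective is sketched below. The notation is assumed, not necessarily the paper's exact formulation: $S_b$ and $S_w$ denote the inter- and intra-class scatter matrices, $W \in \mathbb{R}^{d \times m}$ the feature selection matrix, and $\lambda > 0$ an assumed trade-off weight.

```latex
% Hedged reconstruction of an l_{2,1}-regularized Fisher objective
% (notation assumed; see the paper's later sections for the exact form):
\min_{W}\ \operatorname{tr}\!\left(W^{\top} S_w W\right)
          + \lambda \,\lVert W \rVert_{2,1}
\quad \text{s.t.}\quad W^{\top} S_b W = I,
\qquad \text{where}\quad
\lVert W \rVert_{2,1} = \sum_{i=1}^{d} \sqrt{\sum_{j=1}^{m} W_{ij}^{2}} .
```

Under this reading, rows of $W$ driven to zero by the $\ell_{2,1}$ term correspond to discarded features, and the constraint built from $S_b$ is what reduces each iteration's key step to a generalized eigenvalue problem, as the abstract states.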
1. Introduction
With the rapid development of the public security business, pattern recognition techniques have found use in various applications such as intelligent video surveillance and automatic entrance control. In these applications, objects with high-level semantics [1] are often represented by quantitative low-level features [2] for classification based on prior information. These features usually have high dimensionality and high redundancy, both of which degrade classification results. To this end, many approaches have been proposed to reduce data dimensionality and redundancy in pattern recognition, and feature selection [3] is one of the most important means of solving this problem.
Feature selection methods can be divided into three categories, unsupervised, supervised, and semi-supervised, according to whether labeled samples are available for training. It is generally believed that unsupervised methods are inferior to supervised and semi-supervised methods, which include filter methods [4,5], wrapper methods [6], and embedded methods [7]. Filter methods evaluate an objective function on each feature independently and select the features with the largest objective values for pattern recognition. Wrapper methods find feature groups according to a certain search strategy and evaluate the objective values of these groups with already trained classifiers to decide which group is more suitable for pattern recognition. Embedded methods combine the search process with classifier construction and achieve higher computational efficiency than wrapper methods. (A minimal sketch after this paragraph contrasts the first two styles.)
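To make the filter/wrapper distinction concrete, the following is an illustrative sketch only, not code from the paper; the helper callables `score_fn` (a per-feature score) and `eval_group` (e.g., cross-validated classifier accuracy) are hypothetical parameters.

```python
import numpy as np
from itertools import combinations

def filter_select(X, y, k, score_fn):
    """Filter style: score every feature independently with score_fn
    and keep the indices of the k highest-scoring features."""
    scores = np.array([score_fn(X[:, j], y) for j in range(X.shape[1])])
    return np.argsort(scores)[::-1][:k]

def wrapper_select(X, y, k, eval_group):
    """Wrapper style: search over whole feature groups (exhaustively here)
    and keep the group that a trained classifier rates best."""
    best_group, best_score = None, -np.inf
    for group in combinations(range(X.shape[1]), k):
        score = eval_group(X[:, group], y)  # e.g. cross-validated accuracy
        if score > best_score:
            best_group, best_score = group, score
    return best_group
```

The contrast explains the efficiency gap: the filter scores each of the $d$ features once, while the wrapper trains and evaluates a classifier for every candidate group its search strategy visits.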
Though wrapper and embedded methods often lead to higher classification accuracy, filter methods are still widely used because they are simple, computationally efficient, and generalize well over various classifiers. Among all filter methods, the Fisher criterion attracts considerable attention owing to the good performance of its objective function. The Fisher criterion evaluates the importance of each feature for classification individually, and is thus not suited to selecting a group of features simultaneously. Recent improvements of the Fisher criterion remove this limitation, but lack the ability to achieve optimal feature-group selection (a sketch of the classical per-feature score follows).
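For concreteness, here is a minimal sketch of the classical per-feature Fisher score in its standard textbook form (assumed here; the paper's matrix-based criterion is developed in later sections):

```python
import numpy as np

def fisher_scores(X, y):
    """Classical per-feature Fisher score: the ratio of between-class
    to within-class variance, computed independently per feature."""
    classes = np.unique(y)
    overall_mean = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for c in classes:
        Xc = X[y == c]
        nc = len(Xc)
        between += nc * (Xc.mean(axis=0) - overall_mean) ** 2
        within += nc * Xc.var(axis=0)
    return between / np.maximum(within, 1e-12)  # guard against zero variance
```

Because each score involves only one feature's statistics, a set of mutually redundant features can all receive high scores; this is precisely the interdependency problem the proposed method targets.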
Another improvement to feature selection is the introduction of a sparsity constraint as a regularization term of the loss function. The intuition behind this is that selecting a minority of
☆ This paper is supported by the National Natural Science Foundation of China (Nos. 61303143 and 61472110), the Program for New Century Excellent Talents in University (NCET-12-0323), the Hong Kong Scholar Programme (XJ2013038), and the Scientific Research Fund of Zhejiang Provincial Education Department (No. Y201326609).
* Corresponding author.
E-mail address: yujun@hdu.edu.cn (J. Yu).