Feature selection via neighborhood multi-granulation fusion
Yaojin Lin a,*, Jinjin Li a,b, Peirong Lin a, Guoping Lin b, Jinkun Chen b
a School of Computer Science, Minnan Normal University, Zhangzhou 363000, PR China
b School of Mathematics and Statistics, Minnan Normal University, Zhangzhou 363000, PR China
Article history:
Received 7 January 2014
Received in revised form 18 May 2014
Accepted 29 May 2014
Available online 10 June 2014

Keywords: Granular computing; Feature selection; Multi-granulation; Neighborhood rough sets; Granularity influence

Abstract
Feature selection is an important data preprocessing technique, and has been widely studied in data
mining, machine learning, and granular computing. However, very little research has considered a
multi-granulation perspective. In this paper, we present a new feature selection method that selects
distinguishing features by fusing neighborhood multi-granulation. We first use neighborhood rough sets
as an effective granular computing tool, and analyze the influence of the granularity of neighborhood
information. Then, we obtain all feature rank lists based on the significance of features in different
granularities. Finally, we obtain a new feature selection algorithm by fusing all individual feature rank
lists. Experimental results show that the proposed method can effectively select a discriminative feature
subset, and performs as well as or better than other popular feature selection algorithms in terms of
classification performance.
© 2014 Elsevier B.V. All rights reserved.
1. Introduction
Many data mining and pattern recognition systems suffer from
the curse of dimensionality. This motivates the search for suitable
feature selection methods [7,9,15,24,29,46]. In practice, many
application fields, such as bio-informatics and text categorization,
involve databases in which both the number of rows (objects)
and columns (features) increase rapidly. The high-dimensional
nature of such data challenges learning algorithms: not all features
contribute discriminative power, and irrelevant or correlated
features can cause low efficiency, over-fitting, and poor
performance in traditional learning algorithms. It is therefore
desirable to reduce the dimensionality of the data, as this
enhances the accuracy of pattern recognition and produces a more
compact classification model with better generalization.
As we know, the feature selection technique plays a non-
trivial role in speeding up learning and improving classification
performance [7,9,11,15,46]. To date, a number of feature selec-
tion algorithms have been developed for classification learning.
The process of feature selection can be divided into two steps.
First, metrics such as mutual information [2,3,28], consistency
[4,12], dependency [9,10,48], and the classification margin [33]
are used to evaluate the quality of candidate features. Second, a
search strategy is designed to solve the resulting optimization
problem; common strategies include heuristic search, genetic
optimization, greedy search, and other intelligent search
algorithms [34].
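The two-step scheme above can be sketched in a few lines. The quality metric used here is a toy "consistency" score (the fraction of objects whose feature vector, restricted to the candidate subset, belongs to a single-class group); the metrics cited above (mutual information, dependency, margin) would slot into the same greedy forward search. Function names and the toy data are illustrative, not from the paper.

```python
from collections import defaultdict

def consistency(X, y, subset):
    """Toy quality metric: fraction of objects whose projection onto
    `subset` falls into a pure (single-class) group."""
    groups = defaultdict(set)
    for row, label in zip(X, y):
        groups[tuple(row[i] for i in subset)].add(label)
    pure = sum(1 for row in X
               if len(groups[tuple(row[i] for i in subset)]) == 1)
    return pure / len(X)

def greedy_forward_select(X, y, metric):
    """Step 2: greedy forward search -- repeatedly add the feature
    with the best metric gain until no candidate improves the score."""
    n_features = len(X[0])
    selected, best = [], 0.0
    while len(selected) < n_features:
        score, feat = max((metric(X, y, selected + [f]), f)
                          for f in range(n_features) if f not in selected)
        if score <= best:
            break
        selected.append(feat)
        best = score
    return selected

# Feature 0 alone determines the class, so it is selected first
# and the search stops immediately afterwards.
X = [[0, 1, 0], [0, 0, 1], [1, 1, 0], [1, 0, 1]]
y = [0, 0, 1, 1]
print(greedy_forward_select(X, y, consistency))  # [0]
```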
Although many algorithms have been developed for feature
selection, very little work has considered a multi-granulation view.
Granular computing, proposed by Zadeh [44], is an approximation
schema that can effectively solve a complex problem at a single
level or at multiple levels of granulation, and has attracted
increasing interest [26,27,38,43,45]. There are many representative
granular computing models, such as rough sets [25], fuzzy sets
[27,44], probabilistic rough sets [41,42], covering rough sets [49], and
neighborhood rough sets [8–10,18,39]. Of these, neighborhood
rough sets provide an effective granular computing model for the
problem of heterogeneous feature subset selection, and have been
widely applied in cancer recognition, image annotation, and
vibration diagnosis. In the multi-granulation setting, neighborhood
rough sets compute the neighborhoods of samples to extract
information granules (in this paper, neighborhood size and
granularity are used as equivalent terms), and different
neighborhood sizes induce different information granularities
[16,17].
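The neighborhood granulation just described can be illustrated with a minimal sketch: the delta-neighborhood of a sample under Euclidean distance, where the radius `delta` plays the role of the granularity, so enlarging it coarsens the induced granules. The names and data below are illustrative assumptions, not the paper's notation.

```python
import math

def delta_neighborhood(X, i, delta):
    """Indices of samples whose Euclidean distance to sample i is at
    most delta -- the delta-neighborhood (information granule) of i."""
    dist = lambda a, b: math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))
    return {j for j, x in enumerate(X) if dist(X[i], x) <= delta}

X = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0]]
print(delta_neighborhood(X, 0, 0.2))  # fine granule: {0, 1}
print(delta_neighborhood(X, 0, 2.0))  # coarse granule: {0, 1, 2}
```

Sweeping `delta` over a range of values yields the family of granular spaces from which the per-granularity feature rank lists are built.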
It has been shown that diverse results can be obtained from
different granular spaces for a given learning task. Indeed, given
the same set of objects, different granular spaces can provide com-
plementary predictive powers, and the prediction accuracy is sig-
nificantly improved by combining their information [20–22,30].
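One simple way to combine complementary results from different granular spaces is rank aggregation, e.g. a Borda count over the per-granularity feature rank lists. This is a generic rank-fusion sketch under that assumption, not the paper's exact fusion rule.

```python
def borda_fuse(rank_lists):
    """Sum positional Borda scores across per-granularity rank lists
    (each ordered best-first) and return features by fused score."""
    scores = {}
    for ranks in rank_lists:
        n = len(ranks)
        for pos, feat in enumerate(ranks):
            scores[feat] = scores.get(feat, 0) + (n - pos)
    return sorted(scores, key=lambda f: -scores[f])

# Three granularities mostly agree that feature "a" matters most.
lists = [["a", "b", "c"], ["b", "a", "c"], ["a", "c", "b"]]
print(borda_fuse(lists))  # ['a', 'b', 'c']
```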
The method of combining multiple granularities from different
http://dx.doi.org/10.1016/j.knosys.2014.05.019
* Corresponding author. Tel.: +86 13960044089.
E-mail addresses: yjlin@mail.hfut.edu.cn (Y. Lin), jinjinli@mnnu.edu.cn (J. Li), zzlprfj@163.com (P. Lin), gplin@163.com (G. Lin), cjk99@163.com (J. Chen).
Knowledge-Based Systems 67 (2014) 162–168