Journal of Beijing Institute of Technology, 2017, Vol. 26, No. 4
Feature Selection with Fluid Mechanics Inspired Particle
Swarm Optimization for Microarray Data
Shengsheng Wang 1 and Ruyi Dong 1,2,✉
(1. College of Computer Science and Technology, Jilin University, Changchun 130012, China;
2. Jilin Vocational College of Industry and Technology, Jilin, Jilin 132013, China)
Abstract: Deoxyribonucleic acid (DNA) microarray gene expression data have been widely used in
the field of functional genomics, since they help in the study of cancer, cells, tissues, organisms, etc.
However, the sample sizes are small relative to the number of genes, so feature selection is
necessary to reduce complexity and increase the classification accuracy of samples. In this paper,
a new improvement to particle swarm optimization (PSO) based on fluid mechanics is proposed for
feature selection. The improvement simulates the spontaneous flow of air from high pressure to
low pressure, which allows the search to cover the whole solution space and prevents particles
from becoming trapped in a local optimum. Experiments show that the improved algorithm yields
a compact feature subset and achieves very high classification accuracy on 8 of the 11 datasets,
and that it performs considerably better than other feature selection methods.
Key words: feature selection; particle swarm optimization (PSO); fluid mechanics (FM); microarray
data; support vector machine (SVM)
CLC number: TP 391.4  Document code: A  Article ID: 1004-0579(2017)04-0517-08
Received 2016-10-20
Supported by the National Natural Science Foundation of China (61472161, 61402195, 61502198)
✉ Author for correspondence, lecturer
E-mail: dongruyi@163.com
DOI: 10.15918/j.jbit1004-0579.201726.0411
Deoxyribonucleic acid (DNA) microarray data are known to have latent qualities, as they
denote the state of cells at the molecular level. Sample sizes are usually small, often fewer
than 100, whereas the number of genes is far larger, ranging from 6 000 to 60 000 [1]
. This poses a major challenge for machine learning researchers: with so few samples, it is
almost impossible to identify "false positives" [2]
. Meanwhile, research shows that only a small fraction of the genes investigated correlate
strongly with a given phenotype. Therefore, feature selection is a necessary step to reduce
complexity and increase the accuracy of pattern recognition in this kind of research [3].
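The small-sample, many-gene setting described above can be illustrated with a minimal sketch. It is not taken from the paper: it assumes pure Gaussian noise as a stand-in for expression values and randomly assigned, balanced two-class labels, and the function names `pearson` and `max_chance_correlation` are hypothetical. It shows that, with a fixed number of samples, the strongest purely accidental gene-label correlation grows as more genes are examined.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def max_chance_correlation(n_samples, n_genes, seed=0):
    """Largest |r| between pure-noise 'genes' and random class labels.

    Assumption: no gene carries any real signal, so every correlation
    found here is a chance effect.
    """
    rng = random.Random(seed)
    labels = [i % 2 for i in range(n_samples)]  # balanced two-class labels
    rng.shuffle(labels)
    best = 0.0
    for _ in range(n_genes):
        gene = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
        best = max(best, abs(pearson(gene, labels)))
    return best


if __name__ == "__main__":
    # 60 samples, increasing numbers of noise genes
    for n_genes in (10, 100, 6000):
        print(n_genes, round(max_chance_correlation(60, n_genes), 3))
```

Because the seed is fixed, a larger gene count only extends the same noise sequence, so the reported maximum can never decrease as genes are added.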
There are two main kinds of feature selection algorithms: filter algorithms and wrapper
algorithms. Filter algorithms first select a subset of features and then pass it as input to
a classification algorithm, while wrapper algorithms wrap the optimization algorithm and the
classification algorithm together for feature selection [4]. Recently, various filter and
wrapper techniques for gene selection have been introduced in machine learning. Kim et al.
proposed a novel technique, derived from an evolutionary algorithm (EA), that combines
optimal classifiers and enhances feature selection [5]. Martineza proposed cuPSO, a swarm
intelligence feature selection algorithm based on the initialization and update of only a
subset of particles in the swarm [6]. Chuang et al. proposed a hybrid method of binary
particle swarm optimization (BPSO) and a combat genet-