
An effective fuzzy kernel clustering analysis
approach for gene expression data
Lin Sun a,b,*, Jiucheng Xu a,b and Jiaojiao Yin a
a College of Computer and Information Engineering, Henan Normal University, Xinxiang, China
b Engineering Technology Research Center for Computing Intelligence and Data Mining, Henan Province, China
Abstract. Fuzzy clustering is an important tool for analyzing microarray data. A major problem in applying fuzzy clustering
methods to microarray gene expression data is the choice of parameters, in particular the number of clusters and the cluster
centers. This paper proposes a new approach to fuzzy kernel clustering analysis (FKCA) that identifies the desired number of
clusters and obtains more stable results for gene expression data. First, to optimize characteristic differences and estimate
the optimal number of clusters, a Gaussian kernel function is introduced to improve the spectrum analysis method (SAM). By
combining subtractive clustering with the max-min distance mean, a maximum distance method (MDM) is proposed to determine the
cluster centers. Then, the steps of the improved SAM (ISAM) and of MDM are given, and their superiority and stability are
illustrated through experimental comparisons on gene expression data. Finally, by introducing ISAM and MDM into FKCA, an
effective improved FKCA algorithm is proposed. Experimental results on public gene expression data and the UCI database show
that the proposed algorithms are feasible for cluster analysis and that their clustering accuracy is higher than that of the
other related clustering algorithms.
Keywords: Spectral analysis, maximum distance, fuzzy clustering, gene expression data
1. Introduction
In computational biology, clustering is a useful technique for gene expression data as it groups similar
objects together and allows biologists to identify potential relationships between genes [1]. Unsupervised
clustering methods have been applied to gene expression data analysis, and unsupervised ensemble
approaches improve the accuracy and reliability of clustering results [2]. However, traditional clustering
approaches are insufficiently flexible when a gene experiences differential coregulation in different samples
of the same data set as a result of being involved in differing functional relationships [3].
In recent years, the application of kernels in fuzzy c-means (FCM), fuzzy k-means, and evolutionary
algorithms has proved effective in improving clustering performance. However, FCM has several drawbacks:
the clustering result deteriorates when noise and outliers exist in the data set, blind random prototype
initialization makes the clustering process time-consuming, and the method works well only on spherically
shaped data sets rather than on data sets of general shape [4]. To satisfy more general data set,
* Address for correspondence: Lin Sun, College of Computer and Information Engineering, Henan Normal University,
Xinxiang, China. Tel.: 03733329075; E-mail: linsunok@gmail.com.
0959-2989/15/$35.00 © 2015 – IOS Press and the authors.
DOI 10.3233/BME-151489
Bio-Medical Materials and Engineering 26 (2015) S1863–S1869
This article is published with Open Access and distributed under the terms of the Creative Commons Attribution and Non-Commercial License.