A New Fuzzy Kernel Clustering Method for Gene Expression Data: The FKCA Algorithm


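"An effective fuzzy kernel clustering analysis approach for gene expression data". In bioinformatics, fuzzy clustering is a commonly used technique for analyzing microarray gene expression data. The complexity and uncertainty of microarray data make traditional clustering methods hard to apply, in particular when choosing a suitable number of clusters and their centers. To address this, the paper proposes a fuzzy kernel clustering analysis (FKCA) approach that identifies a suitable cluster number automatically and produces more stable results.

First, the paper introduces a Gaussian kernel function to improve the spectrum analysis method (SAM). The Gaussian kernel sharpens the differences between features, and computing similarities between gene expression profiles with it gives a more accurate estimate of the optimal cluster number. This step matters because every later stage of the analysis depends on the cluster number being right.

Next, the authors propose the maximum distance method (MDM) to determine the cluster centers. MDM combines subtractive clustering with the max-min distance mean. Subtractive clustering selects candidate centers from the density of the data points, which makes it less sensitive to isolated noise points, while the max-min distance criterion keeps the selected centers well separated. Together they locate the centers of gene groups more reliably and make the clustering more stable.

To validate the approach, the paper runs experiments on gene expression data and evaluates the improved SAM (ISAM) and MDM. The results indicate that ISAM and MDM perform better and more stably on gene expression data. Finally, ISAM and MDM are integrated into FKCA to form an improved FKCA algorithm. Experiments on public gene expression data sets and on data from the UCI repository show that its clustering accuracy is higher than that of the other related clustering algorithms compared.

In short, by introducing a Gaussian kernel function and the maximum distance method, the paper proposes a fuzzy clustering strategy that addresses two key problems in clustering microarray data, the choice of cluster number and the choice of cluster centers, and improves both accuracy and stability. The approach is of theoretical and practical interest for gene expression data analysis in bioinformatics.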
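As a rough illustration of how a Gaussian kernel can feed a spectral estimate of the cluster number, the sketch below builds a Gaussian affinity matrix and applies the common eigengap heuristic. It is not the paper's ISAM procedure, whose exact steps are not given in this preview. The function names, the `sigma` bandwidth, and the `max_k` cap are assumptions made for illustration.

```python
import numpy as np

def gaussian_affinity(X, sigma=1.0):
    """Pairwise Gaussian kernel: exp(-||xi - xj||^2 / (2 * sigma^2))."""
    sq_norms = np.sum(X ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
    np.maximum(sq_dists, 0.0, out=sq_dists)  # clip tiny negatives from round-off
    return np.exp(-sq_dists / (2.0 * sigma ** 2))

def estimate_cluster_number(X, sigma=1.0, max_k=10):
    """Eigengap heuristic (an assumed stand-in, not the paper's ISAM)."""
    A = gaussian_affinity(X, sigma)
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    # Symmetrically normalized affinity D^{-1/2} A D^{-1/2}
    M = A * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    eigvals = np.linalg.eigvalsh(M)[::-1]  # sorted in descending order
    top = eigvals[:max_k + 1]
    gaps = top[:-1] - top[1:]
    # The largest drop after the k-th eigenvalue suggests k clusters
    return int(np.argmax(gaps)) + 1
```

The estimate depends heavily on `sigma`: a bandwidth that is too large merges groups, and one that is too small splits them, so in practice it would need tuning for each data set.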

An effective fuzzy kernel clustering analysis approach for gene expression data

Lin Sun (a,b,*), Jiucheng Xu (a,b) and Jiaojiao Yin

(a) College of Computer and Information Engineering, Henan Normal University, Xinxiang, China

(b) Engineering Technology Research Center for Computing Intelligence and Data Mining, Henan Province, China

Abstract. Fuzzy clustering is an important tool for analyzing microarray data. A major problem in applying fuzzy clustering methods to microarray gene expression data is the choice of parameters, namely the cluster number and the cluster centers. This paper proposes a new approach to fuzzy kernel clustering analysis (FKCA) that identifies the desired cluster number and obtains more stable results for gene expression data. First, to optimize characteristic differences and estimate the optimal cluster number, a Gaussian kernel function is introduced to improve the spectrum analysis method (SAM). By combining subtractive clustering with the max-min distance mean, a maximum distance method (MDM) is proposed to determine cluster centers. Then, the corresponding steps of the improved SAM (ISAM) and MDM are given, and their superiority and stability are illustrated through experimental comparisons on gene expression data. Finally, by introducing ISAM and MDM into FKCA, an effective improved FKCA algorithm is proposed. Experimental results on public gene expression data and the UCI database show that the proposed algorithms are feasible for cluster analysis and that their clustering accuracy is higher than that of the other related clustering algorithms.
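To make the idea of choosing centers by density and separation more concrete, the sketch below seeds the first center at the point with the highest subtractive-clustering potential and then adds centers by the max-min distance rule. This is an assumed combination for illustration only, not the paper's MDM, whose steps are not shown in this preview. The function name `max_min_centers` and the `radius` parameter are hypothetical.

```python
import numpy as np

def max_min_centers(X, k, radius=1.0):
    """Pick k center indices: densest point first, then max-min distance.

    Illustrative sketch only; the paper's MDM may differ in its details.
    """
    diff = X[:, None, :] - X[None, :, :]
    sq_dists = np.sum(diff ** 2, axis=-1)
    # Subtractive-clustering potential: sum_j exp(-4 * ||xi - xj||^2 / radius^2)
    potential = np.exp(-4.0 * sq_dists / radius ** 2).sum(axis=1)
    chosen = [int(np.argmax(potential))]
    # Squared distance from every point to its nearest chosen center
    min_sq = sq_dists[chosen[0]].copy()
    while len(chosen) < k:
        nxt = int(np.argmax(min_sq))  # point farthest from all chosen centers
        chosen.append(nxt)
        min_sq = np.minimum(min_sq, sq_dists[nxt])
    return np.array(chosen)
```

The returned indices could serve as initial prototypes for a fuzzy clustering run. Note that the plain max-min rule tends to pick outliers, which is one reason to combine it with a density criterion.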

Keywords: Spectral analysis, maximum distance, fuzzy clustering, gene expression data

1. Introduction

In computational biology, clustering is a useful technique for gene expression data as it groups similar objects together and allows biologists to identify potential relationships between genes [1]. Unsupervised clustering methods have been applied to gene expression data analysis, and unsupervised ensemble approaches improve the accuracy and reliability of clustering results [2]. However, traditional clustering approaches are insufficiently flexible when a gene experiences differential coregulation in different samples of the same data set as a result of being involved in differing functional relationships [3].

In recent years, the application of kernels in fuzzy c-means (FCM), fuzzy k-means, and evolutionary algorithms has proved effective in improving clustering performance. However, FCM has several drawbacks: its results deteriorate when noise and outliers exist in the data set, blind random prototype initialization makes the clustering process time consuming, and it works well only on spherically shaped data sets rather than on data sets of general shape [4]. To satisfy more general data set,

Address for correspondence: Lin Sun, College of Computer and Information Engineering, Henan Normal University, Xinxiang, China. Tel.: 03733329075; E-mail: linsunok@gmail.com.

DOI 10.3233/BME-151489

IOS Press

Bio-Medical Materials and Engineering 26 (2015) S1863–S1869

This article is published with Open Access and distributed under the terms of the Creative Commons Attribution and Non-Commercial License.
