Dissimilarity based ensemble of extreme learning machine for gene expression data classification ☆

Hui-juan Lu a,b,*, Chun-lin An a, En-hui Zheng c, Yi Lu d

a College of Information Engineering, China Jiliang University, Hangzhou 310018, China
b School of Information and Electric Engineering, China University of Mining and Technology, Xuzhou 221008, China
c College of Mechanical and Electric Engineering, China Jiliang University, Hangzhou 310018, China
d Department of Computer Science, Prairie View A&M University, Prairie View 77446, USA
Article info

Article history:
Received 18 September 2012
Received in revised form 4 February 2013
Accepted 11 February 2013
Available online 8 November 2013

Keywords:
Extreme learning machine
Dissimilarity ensemble
Double-fault measure
Majority voting
Gene expression data
Abstract

Extreme learning machine (ELM) has salient features such as fast learning speed and excellent generalization performance. However, a single extreme learning machine is unstable in data classification. To overcome this drawback, more and more researchers have turned to ensembles of ELMs. This paper proposes a method that integrates the voting-based extreme learning machine (V-ELM) with dissimilarity measures, termed D-ELM. First, based on different dissimilarity measures, we remove a number of ELMs from the ensemble pool. Then, the remaining ELMs are combined into an ensemble classifier by majority voting. Finally, we use the disagreement measure and the double-fault measure to validate the D-ELM. Theoretical analysis and experimental results on gene expression data demonstrate that (1) the D-ELM can achieve better classification accuracy with fewer ELMs, and (2) the double-fault-measure-based D-ELM (DF-D-ELM) performs better than the disagreement-measure-based D-ELM (D-D-ELM).
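For reference, the two diversity measures named in the abstract have standard pairwise definitions (the notation below is ours; the paper's own formulas appear later in the text). For base classifiers $f_i$ and $f_j$, let $N^{ab}$ be the number of validation samples on which $f_i$ is correct ($a = 1$) or wrong ($a = 0$) and $f_j$ is correct ($b = 1$) or wrong ($b = 0$):

\[
\mathrm{dis}_{i,j} = \frac{N^{01} + N^{10}}{N^{00} + N^{01} + N^{10} + N^{11}}, \qquad
\mathrm{DF}_{i,j} = \frac{N^{00}}{N^{00} + N^{01} + N^{10} + N^{11}}.
\]

A higher disagreement value and a lower double-fault value both indicate a more diverse pair of classifiers. A minimal sketch of the prune-then-vote pipeline the abstract describes is given below; the ranking rule (dropping the classifiers with the highest average pairwise double-fault) is an illustrative assumption, not the paper's exact selection criterion, and class labels are assumed to be integers 0..K-1.

import numpy as np

def double_fault(pred_i, pred_j, y):
    # N^{00} / N: fraction of samples misclassified by both classifiers
    return np.mean((pred_i != y) & (pred_j != y))

def prune_by_double_fault(preds, y, k_remove):
    # preds: (n_classifiers, n_samples) predicted labels on a validation set
    # illustrative rule: drop the k_remove classifiers with the highest
    # average pairwise double-fault against the rest of the pool
    n = len(preds)
    avg_df = np.array([np.mean([double_fault(preds[i], preds[j], y)
                                for j in range(n) if j != i])
                       for i in range(n)])
    return np.argsort(avg_df)[: n - k_remove]   # indices of ELMs to keep

def majority_vote(preds):
    # per-sample majority label over the (n_classifiers, n_samples) array
    preds = np.asarray(preds)
    return np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, preds)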
1. Introduction

The Human Genome Project (HGP) was officially launched in 1990. In the short span of 20 years, gene technology has developed rapidly. Golub et al. [1] were the first to use gene chips to study human acute leukemia, and they found two subtypes of acute lymphoblastic leukemia: T-cell ALL and B-cell ALL. Early classification methods applied to gene expression data include the support vector machine (SVM) [2], artificial neural networks (ANNs) [3], and the probabilistic neural network (PNN) [4]. Jin et al. [5] used the partial least squares method to establish a classification model. Zhang et al. [6] applied non-negative matrix factorization (NMF) to gene expression data classification. Yang et al. [7] used a binary decision tree to classify tumor gene expression data.
The extreme learning machine (ELM) [8] was proposed as an efficient learning algorithm for single-hidden-layer feedforward neural networks (SLFNs). It achieves high learning speed by randomly generating the weights and biases of the hidden nodes and solving only for the output weights, rather than iteratively adjusting all network parameters as gradient-based methods do.
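As a concrete illustration, here is a minimal NumPy sketch of the standard ELM training procedure (ours, not code from the paper): the hidden-layer parameters are drawn at random once, and the output weights are then computed as the least-squares solution via the Moore-Penrose pseudoinverse.

import numpy as np

def elm_train(X, T, n_hidden, seed=0):
    # X: (n_samples, n_features) inputs; T: (n_samples, n_outputs) targets
    # (one-hot class indicators for classification)
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X.shape[1], n_hidden))  # random input weights, never updated
    b = rng.standard_normal(n_hidden)                # random hidden biases, never updated
    H = np.tanh(X @ W + b)                           # hidden-layer output matrix
    beta = np.linalg.pinv(H) @ T                     # output weights: least-squares fit
    return W, b, beta

def elm_predict(X, W, b, beta):
    # predicted class = index of the largest output (for one-hot targets)
    return (np.tanh(X @ W + b) @ beta).argmax(axis=1)

Because W and b are fixed after random initialization, training reduces to a single linear solve for beta, which is the source of ELM's speed advantage over iterative gradient-based training.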
However, the stability of a single ELM leaves room for improvement. To achieve better generalization performance, Lan et al. [9] proposed an ensemble of online sequential extreme learning machines (EOS-ELM), which is more stable and accurate than the original OS-ELM. Motivated by the ensemble idea, in 2009 van Heeswijk et al. [10] proposed an adaptive ensemble model of ELMs with low computational cost. In 2010, Tian and Meng proposed a bagging ensemble scheme to combine ELMs [11], as well as another ELM ensemble method based on the modified AdaBoost.RT algorithm [12]. In the same year, an ensemble-based ELM (EN-ELM) algorithm was proposed by Liu and Wang [13], which uses a cross-validation scheme to create an ensemble of ELM classifiers for decision making. Wang and Li [14] proposed a dynamic AdaBoost ensemble of ELMs, which has been successfully applied to function approximation and classification problems. Zhai et al. [15] proposed a dynamic ensemble of sample-entropy-based extreme learning machines, which alleviates the instability and overfitting problems to some extent and increases prediction accuracy. In 2011, van Heeswijk et al. [16] proposed a GPU-accelerated, parallelized ELM ensemble method for large-scale regression. In 2012, Wang and Alhamdoosh [17] proposed an algorithm that employs model diversity as a fitness function to direct the selection of base learners and produces an optimal solution with ensemble size control. It improved the generalization
☆ This work was supported by the National Natural Science Foundation of China (Nos. 61272315, 60842009, and 60905034), the Zhejiang Provincial Natural Science Foundation (Nos. Y1110342, Y1080950), and the Pao Yu-Kong and Pao Zhao-Long Scholarship for Chinese Students Studying Abroad.
* Corresponding author at: College of Information Engineering, China Jiliang University, Hangzhou 310018, China. Tel.: +86 57186914580; fax: +86 57186914573.
E-mail addresses: hjlu@cjlu.edu.cn, huijuanlu29@gmail.com (H.-j. Lu).