Dissimilarity Based Ensemble of Extreme Learning Machine for Gene Expression Data Classification
Hui-juan LU a,b,∗, Chun-lin AN a, En-hui ZHENG c, Yi LU d

a College of Information Engineering, China Jiliang University, Hangzhou 310018, China
b School of Information and Electric Engineering, China University of Mining and Technology, Xuzhou 221008, China
c College of Mechanical and Electric Engineering, China Jiliang University, Hangzhou 310018, China
d Department of Computer Science, Prairie View A&M University, Prairie View, TX 77446, U.S.A.
Abstract
Extreme Learning Machine (ELM) has salient features such as fast learning speed and excellent generalization performance. However, a single extreme learning machine is unstable in data classification. To overcome this drawback, more and more researchers consider ensembles of ELMs. This paper proposes a method that integrates voting-based extreme learning machines (V-ELM) with dissimilarity measures (D-ELM). First, based on different dissimilarity measures, we remove a number of ELMs from the ensemble pool. Then, the remaining ELMs are grouped into an ensemble classifier by majority voting. Finally, we use the disagreement measure and the double-fault measure to validate the D-ELM. Theoretical analysis and experimental results on gene expression data demonstrate that 1) D-ELM can achieve better classification accuracy with fewer ELMs, and 2) the double-fault-measure-based D-ELM (DF-D-ELM) performs better than the disagreement-measure-based D-ELM (D-D-ELM).
Keywords: extreme learning machine, dissimilarity ensemble, double-fault measure, majority voting, gene expression data
1. Introduction
The Human Genome Project (HGP) was officially launched in
1990, and in the short span of 20 years gene technology has
developed rapidly. Golub et al. [1] were the first to use gene
chips to study human acute leukemia, and found two subtypes
of acute lymphoblastic leukemia: T-cell ALL and B-cell ALL.
Early classification methods applied to gene expression data
include the support vector machine (SVM) [2], Artificial Neural
Networks (ANN) [3], and the Probabilistic Neural Network
(PNN) [4]. Jin et al. [5] used the partial least squares method
to establish a classification model. Zhang et al. [6] applied
Non-negative Matrix Factorization (NMF) to gene expression
data classification. Yang et al. [7] used binary decision trees to
classify tumor gene expression data.
Extreme learning machine (ELM) [8] was proposed as an
efficient learning algorithm for single-hidden-layer feedforward
neural networks (SLFNs). It increases learning speed by
randomly generating the weights and biases of the hidden nodes
and solving for the output weights in closed form, rather than
iteratively adjusting network parameters as gradient-based
methods commonly do.
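As a minimal sketch of this idea (an illustrative NumPy implementation, not the authors' code; the function names and the choice of a sigmoid activation are assumptions), a basic ELM fixes a random hidden layer and solves for the output weights with the Moore-Penrose pseudoinverse:

```python
import numpy as np

def train_elm(X, Y, n_hidden=50, rng=None):
    """Train a basic ELM: random hidden layer, least-squares output weights.

    X: (n_samples, n_features) inputs; Y: (n_samples, n_classes) one-hot targets.
    """
    rng = np.random.default_rng(rng)
    W = rng.standard_normal((X.shape[1], n_hidden))  # random input weights (never trained)
    b = rng.standard_normal(n_hidden)                # random hidden biases (never trained)
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))           # sigmoid hidden-layer outputs
    beta = np.linalg.pinv(H) @ Y                     # output weights via Moore-Penrose pseudoinverse
    return W, b, beta

def predict_elm(X, W, b, beta):
    """Hidden-layer outputs times the learned output weights."""
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return H @ beta
```

Because only `beta` is computed (one pseudoinverse, no iteration), training is fast; `predict_elm(...).argmax(axis=1)` yields the class label.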
However, the stability of a single ELM leaves room for
improvement. To achieve better generalization performance,
Lan et al. [9] proposed an ensemble of online sequential
extreme learning machines (EOS-ELM), which is more stable
and accurate than the original OS-ELM.

This work was supported by the National Natural Science Foundation of
China (No. 61272315, No. 60842009, and No. 60905034), the Zhejiang
Provincial Natural Science Foundation (No. Y1110342, No. Y1080950), and
the Pao Yu-Kong and Pao Zhao-Long Scholarship for Chinese Students
Studying Abroad.
∗ Corresponding author. Tel.: +86 571 86914580; fax: +86 571 86914573.
Email address: hjlu@cjlu.edu.cn, huijuanlu29@gmail.com (Hui-juan LU)
Motivated by the ensemble idea, in 2009 Heeswijk et al.
[10] proposed an adaptive ensemble model of ELM with low
computational cost. In 2010, Tian et al. proposed a bagging
ensemble scheme to combine ELMs [11], and another ELM
ensemble method based on a modified AdaBoost.RT algorithm
[12]. In the same year, an ensemble-based ELM (EN-ELM)
algorithm was proposed by Liu et al. [13], which uses a
cross-validation scheme to create an ensemble of ELM
classifiers for decision making. Wang and Li [14] proposed a
dynamic AdaBoost ensemble of ELMs, which has been
successfully applied to function approximation and
classification problems. Zhai et al. [15] proposed a dynamic
ensemble of sample-entropy-based extreme learning machines,
which alleviates instability and over-fitting to some extent and
increases prediction accuracy. In 2011, Heeswijk et al. [16]
proposed a GPU-accelerated, parallelized ELM ensemble for
large-scale regression. In 2012, Wang and Alhamdoosh [17]
proposed an algorithm that employs model diversity as the
fitness function to direct the selection of base learners and
produces an optimal solution with ensemble-size control,
improving generalization power. Cao et al. [18] proposed an
improved learning algorithm for classification, referred to as
the voting-based extreme learning machine (V-ELM), which
has been widely adopted.
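The combination step shared by V-ELM and the method proposed here is plain majority voting over independently trained ELMs. A minimal sketch of that step (an illustrative helper, not the implementation of [18]; the function name is an assumption) takes the label predicted by each of the K base learners and returns the most frequent label per sample:

```python
import numpy as np

def majority_vote(predictions):
    """Combine label predictions from K independent learners.

    predictions: (K, n_samples) array of integer class labels.
    Returns the most frequent label per sample (ties broken toward
    the lowest label, the behavior of argmax on equal counts).
    """
    predictions = np.asarray(predictions)
    n_classes = predictions.max() + 1
    # Count the votes each class receives, per sample (column).
    votes = np.apply_along_axis(np.bincount, 0, predictions,
                                minlength=n_classes)  # (n_classes, n_samples)
    return votes.argmax(axis=0)
```

Since each base ELM draws different random hidden weights, the individual learners disagree on borderline samples, and voting suppresses the resulting instability of any single ELM.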
Ensemble classifiers have already been used in gene
expression data classification. Chen et al. [19] used artificial neu-
Preprint submitted to Neurocomputing February 2, 2013