Parallel RMCLP Classification Algorithm and Its
Application on the Medical Data
Zhiquan Qi, Yingjie Tian, Yong Shi, Senior Member, IEEE, and Vassil Alexandrov
Abstract—To make better use of cloud computing technology, and to overcome the computing and storage requirements that grow rapidly with the number of training samples, this paper proposes a new parallel algorithm: the Parallel Regularized Multiple-Criteria Linear Programming (PRMCLP) algorithm. The RMCLP model is converted into an unconstrained optimization problem and then, in the parallel version, split into several tasks, each of which is mapped to and computed on a separate processor. This approach enables us to obtain the final optimization solution of the whole classification problem efficiently. Finally, we apply the algorithm to medical data classification. All experiments show that our approach greatly increases the training speed of RMCLP in the parallel case.
Index Terms—PRMCLP, parallel algorithm, data mining
1 INTRODUCTION
Nowadays, Big Data brings unprecedented opportunities and challenges [1], [2], [3]. On the other hand, the amount of data is becoming larger and more complex, which forces us to invent novel algorithms to process the vast ocean of information efficiently. This is true, for example, when solving important management problems for which we need to gain enough knowledge to support our decisions. One of the most important reasons is that we still do not have the capability to extract much useful knowledge from Big Data. As a result, more and more researchers have begun to study and introduce new data mining methods and techniques to deal with the increasingly complex data. In this paper, we design a parallel algorithm based on Regularized Multiple-Criteria Linear Programming (RMCLP) [4] to further accelerate the training speed, which provides a possible way to tackle Big Data problems more efficiently.
In order to accelerate the machine learning process, parallelizing classification algorithms is one of the key and basic problems in the era of Big Data. The Support Vector Machine (SVM) ([5], [6], [7]) is one of the most popular classification methods. However, the idea of applying optimization techniques to solve the classification problem dates back more than 70 years, to 1936, when linear discriminant analysis (LDA) ([8]) was first proposed.
In [9], Mangasarian proposed a model similar to SVM using the large-margin idea in the 1960s. From the 1980s to the 1990s, Glover proposed a number of linear programming models to solve discriminant problems with a small sample size of data ([10], [11]). Other classification models can also be found in ([12], [13], [14], [15], [16]). Recently, Shi and his colleagues ([17]) extended Glover's method into classification
via Multiple Criteria Linear Programming (MCLP), and then various improved algorithms were proposed one after another ([4], [18], [19], [20], [21], [22], [23]). These mathematical programming approaches to classification have been applied to many real-world data mining problems, such as credit card portfolio management ([24], [25], [26]), bioinformatics ([27]), information intrusion and detection ([28]), firm bankruptcy ([29]), etc.
Zhiquan Qi, Yingjie Tian (the corresponding author), and Yong Shi are with the Research Center on Fictitious Economy and Data Science, and with the Key Laboratory of Big Data Mining and Knowledge Management, Chinese Academy of Sciences, Beijing 100190, China. Vassil Alexandrov is with ICREA and the Barcelona Supercomputing Center, C/Jordi Girona, 29, Edifici Nexus II, E-08034 Barcelona, Spain.
In order to parallelize a classification algorithm, two strategies are usually employed: 1) divide-and-conquer, or 2) parallelization of the serial algorithm. In the first strategy, a large-scale problem is divided into several sub-problems, which are mutually independent and have the same form as the primal problem. These sub-problems are then solved recursively, and by combining their results the solution of the primal problem is obtained. Typical methods of this kind can be found in [30], [31], [32]. The second strategy is based on the parallel nature of the algorithm itself; several typical methods include [33], [34], [35], [36].
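As a concrete illustration of the first strategy, the following minimal Python sketch splits the training data into independent chunks, solves each chunk separately, and combines the sub-solutions. The functions solve_subproblem and combine are hypothetical stand-ins (an ordinary least-squares fit and simple averaging); they are not part of RMCLP or of the methods cited above.

# A hypothetical sketch of the divide-and-conquer strategy (not the authors' method):
# the data are split into p independent chunks, each chunk is solved separately,
# and the sub-solutions are combined into one model.
from multiprocessing import Pool
import numpy as np

def solve_subproblem(chunk):
    # Stand-in sub-solver: an ordinary least-squares fit on one chunk.
    X, y = chunk
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w

def combine(weights):
    # Stand-in combination rule: average the sub-models' weight vectors.
    return np.mean(weights, axis=0)

def divide_and_conquer(X, y, p=4):
    chunks = list(zip(np.array_split(X, p), np.array_split(y, p)))
    with Pool(p) as pool:
        sub_solutions = pool.map(solve_subproblem, chunks)
    return combine(sub_solutions)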
In this paper, the focus is on RMCLP, and a parallel version of the RMCLP algorithm (PRMCLP) is designed and proposed. In order to overcome the computing and storage requirements that increase rapidly with the number of training samples, the second strategy is adopted, inspired by some findings in [37].
Firstly, the RMCLP model is converted into an unconstrained optimization problem and then split into several parts, which are mapped onto p processors and computed in parallel. After that, the results obtained by each processor are analyzed and summarized, and the results of the sub-problems are taken as a parameterized input to the next step. This loop is executed until the optimal solution of the whole classification problem is obtained, i.e., until the termination condition is satisfied. Experiments on public datasets show that our method greatly increases the training speed of RMCLP when using p processors.
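To make the loop described above concrete, a minimal sketch of the second strategy is given below, under the assumption that the unconstrained objective decomposes into a sum over training samples: each of the p processors computes a partial gradient on its data chunk, the partial results are summed, the parameters are updated, and the loop repeats until a termination condition is met. The squared-error surrogate objective and the fixed-step gradient update are illustrative assumptions only; the actual RMCLP objective and the update used by PRMCLP are specified later in the paper.

# A hypothetical sketch of the second strategy (parallelizing the serial solver);
# the squared-error objective below is a stand-in for the actual RMCLP objective.
from multiprocessing import Pool
import numpy as np

def partial_gradient(args):
    # Gradient contribution of one data chunk for the stand-in objective
    # 0.5 * ||X w - y||^2.
    X, y, w = args
    return X.T @ (X @ w - y)

def parallel_train(X, y, p=4, lr=1e-3, tol=1e-6, max_iter=1000):
    w = np.zeros(X.shape[1])
    X_chunks, y_chunks = np.array_split(X, p), np.array_split(y, p)
    with Pool(p) as pool:
        for _ in range(max_iter):
            # Each processor evaluates the partial gradient on its chunk.
            grads = pool.map(partial_gradient,
                             [(Xi, yi, w) for Xi, yi in zip(X_chunks, y_chunks)])
            step = lr * sum(grads)
            w -= step
            # Termination condition: the update has become negligibly small.
            if np.linalg.norm(step) < tol:
                break
    return w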
The remaining parts of the paper are organized as follows.