An Approximate Parallel SS-ELM Algorithm on MapReduce for Large-Scale Datasets


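This article presents a parallel approximate semi-supervised Extreme Learning Machine (SS-ELM) algorithm built on MapReduce, designed and optimized for large-scale datasets. The title, "A parallel approximate SS-ELM algorithm based on MapReduce for large-scale datasets", states the core question: how to use the MapReduce distributed computing framework to speed up SS-ELM on massive data.

The authors first propose a parallel SS-ELM (PSS-ELM) on MapReduce and then PASS-ELM (Parallel Approximate SS-ELM), both aimed at the computational cost and memory limits that the original SS-ELM runs into on large data. Parallelization splits the work across multiple compute nodes, which raises processing capacity. To improve performance and scalability further, the authors propose an approximate adjacent similarity matrix computation based on Locality-Sensitive Hashing (LSH). LSH hashes nearby points into the same bucket with high probability, so it preserves the local structure of the data while lowering computational cost. The method trades a small loss of exactness for substantially less memory and computation time.

The experiments evaluate PASS-ELM on several large datasets and compare its efficiency and accuracy with the original SS-ELM and similar algorithms. According to the reported results, PASS-ELM runs faster on large-scale data, keeps prediction accuracy high, and scales in a distributed environment to meet real-time analysis and online learning needs.

In short, the work offers a new way to apply SS-ELM to big data and shows how a distributed framework such as MapReduce can improve the performance of machine learning algorithms. This matters for fields that process massive data and need fast, efficient responses, such as the Internet of Things, cloud computing and artificial intelligence. The optimization strategy may also inform the parallel design of other large-scale machine learning algorithms.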
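To make the idea of splitting the work across nodes concrete, here is a minimal sketch of why ELM-style training fits a map/reduce pattern. It is not taken from the paper: it covers only a plain regularized ELM (no semi-supervised Laplacian term), and every name (`map_partition`, `reduce_partials`, `train_elm`, the regularization constant `C`) is hypothetical. The point is that H^T H and H^T t are sums over samples, so each partition can compute partial sums independently and a reducer only has to add them.

```python
import math
import random

def hidden_row(x, W, b):
    """Sigmoid hidden-layer output for one sample x."""
    return [1.0 / (1.0 + math.exp(-(sum(xi * wi for xi, wi in zip(x, w)) + bj)))
            for w, bj in zip(W, b)]

def map_partition(rows, W, b):
    """Map step: partial sums U = H^T H and v = H^T t for one data partition."""
    L = len(b)
    U = [[0.0] * L for _ in range(L)]
    v = [0.0] * L
    for x, t in rows:
        h = hidden_row(x, W, b)
        for i in range(L):
            v[i] += h[i] * t
            for j in range(L):
                U[i][j] += h[i] * h[j]
    return U, v

def reduce_partials(partials):
    """Reduce step: element-wise sum of the partial matrices and vectors."""
    L = len(partials[0][1])
    U = [[sum(p[0][i][j] for p in partials) for j in range(L)] for i in range(L)]
    v = [sum(p[1][i] for p in partials) for i in range(L)]
    return U, v

def solve(A, y):
    """Solve A z = y by Gaussian elimination with partial pivoting."""
    n = len(y)
    M = [row[:] + [yi] for row, yi in zip(A, y)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (M[r][n] - sum(M[r][k] * z[k] for k in range(r + 1, n))) / M[r][r]
    return z

def train_elm(partitions, W, b, C=100.0):
    """Output weights beta = (I/C + H^T H)^{-1} H^T t from per-partition sums."""
    U, v = reduce_partials([map_partition(p, W, b) for p in partitions])
    for i in range(len(v)):
        U[i][i] += 1.0 / C  # ridge term (assumed regularization constant)
    return solve(U, v)

# Toy data: random hidden weights, 40 samples with 3 features and binary targets.
rng = random.Random(0)
n_features, n_hidden = 3, 5
W = [[rng.uniform(-1, 1) for _ in range(n_features)] for _ in range(n_hidden)]
b = [rng.uniform(-1, 1) for _ in range(n_hidden)]
data = [([rng.uniform(-1, 1) for _ in range(n_features)], rng.choice([0.0, 1.0]))
        for _ in range(40)]

beta_single = train_elm([data], W, b)
beta_split = train_elm([data[:10], data[10:25], data[25:]], W, b)
```

Training on one partition or on three gives the same output weights up to floating-point rounding, which is what lets the map step run on separate nodes. In a real MapReduce job the partitions would be input splits and the reducer would receive the partial sums as values; that wiring is omitted here.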

J. Parallel Distrib. Comput. 108 (2017) 85–94


A parallel approximate SS-ELM algorithm based on MapReduce for large-scale datasets

Cen Chen (a,b), Kenli Li (a,b,∗), Aijia Ouyang, Keqin Li (a,b,c)

(a) College of Information Science and Engineering, Hunan University, Changsha, Hunan 410082, China

(b) National Supercomputing Center in Changsha, Changsha, Hunan 410082, China

(c) Department of Computer Science, State University of New York, New Paltz, NY 12561, USA

(d) Department of Information Engineering, Zunyi Normal College, Zunyi, Guizhou 563006, China

Highlights

• The paper proposes an approximate SS-ELM (PASS-ELM) algorithm on MapReduce.

• Several optimizations are adopted to improve the performance and scalability.

• An approximate adjacent similarity matrix calculation algorithm based on LSH is proposed.

• Extensive experiments have proven that our algorithm is efficient.

Article info

Article history:

Received 27 July 2015

Received in revised form 2 August 2016

Accepted 8 January 2017

Available online 21 January 2017

Keywords: PASS-ELM, MapReduce, LSH, Parallel, Approximate algorithm, Big data

Abstract

The Extreme Learning Machine (ELM) algorithm has attracted considerable attention from researchers and has been widely applied in recent years, especially to big data, because of its good generalization performance and fast learning speed. SS-ELM (semi-supervised Extreme Learning Machine) extends ELM to semi-supervised learning, an important problem in machine learning on big data. However, the original SS-ELM algorithm must hold the data in memory before processing it, so it cannot handle the large and web-scale data sets that are common in the era of big data. To solve this problem, this paper first proposes an efficient parallel SS-ELM (PSS-ELM) algorithm on the MapReduce model, adopting a series of optimizations to improve its performance. Then a parallel approximate SS-ELM algorithm based on MapReduce (PASS-ELM) is proposed. PASS-ELM is built on the approximate adjacent similarity matrix (AASM) algorithm, which uses the Locality-Sensitive Hashing (LSH) scheme to compute the approximate adjacent similarity matrix, greatly reducing computational complexity and memory use. The proposed AASM algorithm is general, because computing the adjacent similarity matrix is a key operation in many other machine learning algorithms. Experimental results demonstrate that PASS-ELM can efficiently process very large-scale data sets with good performance, without significantly affecting the accuracy of the results.
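The excerpt does not show how AASM is actually built, so the following is only a sketch of the general idea of using LSH to approximate a similarity matrix. It assumes random-hyperplane LSH (a family suited to angular similarity) and a Gaussian kernel for the edge weights. Both choices, along with the names `signature` and `approx_similarity` and the parameters `n_bits`, `n_tables` and `sigma`, are illustrative and not taken from the paper. Similarities are computed only for pairs that collide in at least one hash table, so the result is sparse instead of a full n × n matrix.

```python
import math
import random
from collections import defaultdict

def signature(x, planes):
    """Random-hyperplane LSH: one bit per hyperplane, by the sign of the dot product."""
    return tuple(sum(xi * pi for xi, pi in zip(x, p)) >= 0 for p in planes)

def approx_similarity(points, n_bits=4, n_tables=3, sigma=1.0, seed=0):
    """Sparse similarity matrix {(i, j): w_ij} over pairs that share an LSH bucket."""
    rng = random.Random(seed)
    dim = len(points[0])
    candidates = set()
    for _ in range(n_tables):
        # Each table uses its own set of random hyperplanes through the origin.
        planes = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_bits)]
        buckets = defaultdict(list)
        for idx, x in enumerate(points):
            buckets[signature(x, planes)].append(idx)
        # Only points that landed in the same bucket become candidate pairs.
        for members in buckets.values():
            for a in range(len(members)):
                for c in range(a + 1, len(members)):
                    candidates.add((members[a], members[c]))
    sim = {}
    for i, j in candidates:
        d2 = sum((p - q) ** 2 for p, q in zip(points[i], points[j]))
        sim[(i, j)] = math.exp(-d2 / (2 * sigma ** 2))  # Gaussian kernel weight
    return sim

# Toy data: two tight clusters on opposite sides of the origin.
rng = random.Random(1)
cluster_a = [[5 + rng.gauss(0, 0.1), 5 + rng.gauss(0, 0.1)] for _ in range(20)]
cluster_b = [[-5 + rng.gauss(0, 0.1), -5 + rng.gauss(0, 0.1)] for _ in range(20)]
points = cluster_a + cluster_b
sim = approx_similarity(points)
n_all_pairs = len(points) * (len(points) - 1) // 2
```

On this toy data most cross-cluster pairs never collide, so `sim` holds far fewer entries than the `n_all_pairs` an exact matrix would need. More bits per table make buckets smaller (fewer candidates, more missed neighbours), while more tables recover neighbours that a single table misses. For Euclidean distance a p-stable LSH family would be the more usual choice.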

1. Introduction

With the development of information technology, data has grown explosively in recent years. How to conduct data mining and machine learning on large amounts of data has become an important issue in the era of big data [20,13,1]. Huang et al. [9] put forward ELM (Extreme Learning Machine) in 2004 to train single-hidden layer feedforward neural networks (SLFNs), and it has since been studied by many scholars because of its better generalization performance and faster learning speed. In the past few years, great progress has been made in both theoretical research and practical application, as evidenced by the different variants of the ELM algorithm. However, these variants are mainly applied to supervised learning, such as regression analysis and classification. Huang et al. [7] have proposed the semi-supervised Extreme Learning Machine (SS-ELM) based on manifold

∗ Corresponding author at: College of Information Science and Engineering, Hunan University, Changsha, Hunan 410082, China.

E-mail addresses: chencen@hnu.edu.cn (C. Chen), lkl@hnu.edu.cn (K. Li), oyaj@hnu.edu.cn (A. Ouyang), lik@newpaltz.edu (K. Li).

http://dx.doi.org/10.1016/j.jpdc.2017.01.007
