J. Parallel Distrib. Comput. 108 (2017) 85–94
A parallel approximate SS-ELM algorithm based on MapReduce for
large-scale datasets
Cen Chen a,b, Kenli Li a,b,∗, Aijia Ouyang d, Keqin Li a,b,c
a College of Information Science and Engineering, Hunan University, Changsha, Hunan 410082, China
b National Supercomputing Center in Changsha, Changsha, Hunan 410082, China
c Department of Computer Science, State University of New York, New Paltz, NY 12561, USA
d Department of Information Engineering, Zunyi Normal College, Zunyi, Guizhou 563006, China
Highlights
• The paper proposes a parallel approximate SS-ELM (PASS-ELM) algorithm on MapReduce.
• Several optimizations are adopted to improve performance and scalability.
• An approximate adjacent similarity matrix calculation algorithm based on LSH is proposed.
• Extensive experiments demonstrate that the algorithm is efficient.
Article info
Article history:
Received 27 July 2015
Received in revised form 2 August 2016
Accepted 8 January 2017
Available online 21 January 2017
Keywords:
PASS-ELM
MapReduce
LSH
Parallel
Approximate algorithm
Big data
Abstract
The Extreme Learning Machine (ELM) algorithm has attracted the attention of many scholars and
researchers and has been widely applied in recent years, especially to big data, because of its
better generalization performance and faster learning speed. The semi-supervised Extreme Learning
Machine (SS-ELM) extends the ELM algorithm to semi-supervised learning, an important problem in
machine learning on big data. However, the original SS-ELM algorithm must store the data in memory
before processing it, so it cannot handle the large and web-scale data sets that appear frequently
in the era of big data. To solve this problem, this paper first proposes an efficient parallel
SS-ELM (PSS-ELM) algorithm on the MapReduce model, adopting a series of optimizations to improve
its performance. Then, a parallel approximate SS-ELM algorithm based on MapReduce (PASS-ELM) is
proposed. PASS-ELM is based on the approximate adjacent similarity matrix (AASM) algorithm, which
uses the Locality-Sensitive Hashing (LSH) scheme to calculate the approximate adjacent similarity
matrix, thus greatly reducing computational complexity and memory usage. The proposed AASM
algorithm is general, because calculating the adjacent similarity matrix is a key operation in
many other machine learning algorithms. The experimental results demonstrate that the proposed
PASS-ELM algorithm can efficiently process very large-scale data sets with good performance,
without significantly affecting the accuracy of the results.
© 2017 Elsevier Inc. All rights reserved.
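The idea behind an LSH-based approximate similarity matrix, as summarized in the abstract, can be illustrated with a minimal single-machine sketch. It is not the paper's AASM algorithm. The random-hyperplane hash family, the Gaussian kernel, and the names `lsh_signature`, `approximate_similarity`, `num_bits`, and `sigma` are all assumptions made for illustration. Points are hashed into buckets, and similarities are computed only for pairs that share a bucket, so the full n × n matrix is never materialized.

```python
import math
import random
from collections import defaultdict


def lsh_signature(point, hyperplanes):
    """Return a tuple of sign bits, one per random hyperplane."""
    return tuple(
        1 if sum(w * x for w, x in zip(plane, point)) >= 0.0 else 0
        for plane in hyperplanes
    )


def approximate_similarity(points, num_bits=4, sigma=1.0, seed=0):
    """Sparse approximate similarity matrix as {(i, j): weight} with i < j.

    Only pairs of points that land in the same LSH bucket are compared;
    every other pair is implicitly treated as having zero similarity.
    """
    rng = random.Random(seed)
    dim = len(points[0])
    # Assumed hash family: random hyperplanes through the origin
    # (sensitive to angle; the paper may use a different family).
    hyperplanes = [[rng.gauss(0.0, 1.0) for _ in range(dim)]
                   for _ in range(num_bits)]

    # Group point indices by their hash signature.
    buckets = defaultdict(list)
    for idx, point in enumerate(points):
        buckets[lsh_signature(point, hyperplanes)].append(idx)

    # Compute similarities only inside each bucket.
    weights = {}
    for members in buckets.values():
        for a in range(len(members)):
            for b in range(a + 1, len(members)):
                i, j = members[a], members[b]
                sq_dist = sum((p - q) ** 2
                              for p, q in zip(points[i], points[j]))
                # Assumed Gaussian kernel with bandwidth sigma.
                weights[(i, j)] = math.exp(-sq_dist / (2.0 * sigma ** 2))
    return weights


points = [(1.0, 2.0), (1.0, 2.0), (-1.0, -2.0)]
W = approximate_similarity(points)
```

In this toy input the two identical points always share a bucket, while the third point, being their negation, differs from them in every sign bit and is never compared with them. In a MapReduce setting the bucketing step would plausibly correspond to the map phase and the per-bucket comparisons to the reduce phase, but that mapping is an assumption here rather than a description of the paper's design.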
∗ Corresponding author at: College of Information Science and Engineering, Hunan University, Changsha, Hunan 410082, China.
E-mail addresses: chencen@hnu.edu.cn (C. Chen), lkl@hnu.edu.cn (K. Li), oyaj@hnu.edu.cn (A. Ouyang), lik@newpaltz.edu (K. Li).
http://dx.doi.org/10.1016/j.jpdc.2017.01.007
0743-7315/© 2017 Elsevier Inc. All rights reserved.
1. Introduction
With the development of information technology, data has grown explosively in recent years. How
to conduct data mining and machine learning on large amounts of data has become an important
issue in the era of big data [20,13,1]. Huang et al. [9] proposed the Extreme Learning Machine
(ELM) in 2004 to train single-hidden-layer feedforward neural networks (SLFNs); it has since been
studied by many scholars because of its better generalization performance and faster learning
speed. In the past few years, great progress has been made in both theoretical research and
practical application, as evidenced by the different variants of the ELM algorithm. However,
these variants are mainly applied to supervised learning, such as regression analysis and
classification. Huang et al. [7] proposed the semi-supervised Extreme Learning Machine (SS-ELM)
based on manifold