Meta-ELM: ELM with ELM hidden nodes
Shizhong Liao*, Chang Feng
School of Computer Science and Technology, Tianjin University, Tianjin 300072, China
Article info
Article history:
Received 26 August 2012
Received in revised form
10 January 2013
Accepted 21 January 2013
Available online 25 October 2013
Keywords:
Extreme learning machine
Meta-learning
ELM hidden node
Abstract
The Extreme Learning Machine (ELM) assigns input weights and biases randomly, which inevitably introduces stochastic behavior and can reduce generalization performance. In this paper, we propose a meta-learning model of ELM, called Meta-ELM. The Meta-ELM architecture consists of several base ELMs and one top ELM, so Meta-ELM learning proceeds in two stages. First, each base ELM is trained on a subset of the training data. Then, the top ELM is learned with the base ELMs as its hidden nodes. Theoretical analysis and experimental results on a few artificial and benchmark regression datasets show that the proposed Meta-ELM model is feasible and effective.
© 2013 Elsevier B.V. All rights reserved.
1. Introduction
The extreme learning machine (ELM) is an efficient learning algorithm proposed by Huang et al. [1–6]. The novel idea of ELM is that the input weights and biases are assigned randomly, and a simple least-squares solution takes the place of traditional time-consuming optimization algorithms [7]. However, the random input weights and biases introduce a degree of randomness into ELM, which affects its generalization ability [7].
Recently, much work has been done on improving the generalization performance of ELM [8–11,7]. These approaches are mainly based on the statistical result that the average of independent estimators (i.e., models) has lower variance than a typical individual estimator [12], so a combination of models can outperform the best single one [12]. Sun et al. [8] and Lan et al. [9] train multiple ELMs on the whole dataset and take their average as the final predictor; their experimental results demonstrate that a combination of ELMs achieves better generalization performance than the original ELM. Heeswijk et al. [10,11] also obtain better results by (1) adjusting the weight of each ELM iteratively [10], and (2) deciding the weights based on the leave-one-out (LOO) error or a least-squares solution on a validation subset [11]. Zhao et al. [7] set the weights of the ELMs by minimizing
\[
\sum_{j=1}^{J} \; \sum_{(x_i,\, t_i) \in L_j} \left( t_i - \sum_{k=1}^{K} \alpha_k M_k^{(j)}(x_i) \right)^{2}, \tag{1}
\]
where the scalar $t_i$ denotes the expected output of $x_i$ in the $j$-th fold $L_j$. All of the methods mentioned above improve the generalization of the original ELM. However, for large-scale problems, they all suffer from high computational cost due to the repeated training of ELMs on the whole dataset.
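To make the role of the weights $\alpha_k$ in (1) concrete, the following is a minimal Python sketch, our own illustration rather than Zhao et al.'s exact procedure, of fitting such ensemble weights by least squares; it assumes the base models' predictions over all pairs $(x_i, t_i)$ have been stacked column-wise into a matrix (the function name and matrix layout are hypothetical):

```python
import numpy as np

def fit_ensemble_weights(preds, targets):
    """Fit alpha minimizing ||preds @ alpha - targets||^2.

    preds:   (n, K) matrix; column k holds M_k(x_i) for each sample x_i.
    targets: (n,) vector of expected outputs t_i.
    """
    # Ordinary least squares over the base models' predictions.
    alpha, *_ = np.linalg.lstsq(preds, targets, rcond=None)
    return alpha
```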
To overcome this drawback, we propose a meta-learning model for combining ELMs, inspired by Collobert et al. [13] and the mixture-of-experts architecture [14–16]. This model trains each ELM on only part of the dataset. Instead of adjusting the ELMs' weights iteratively, our model determines them analytically, in the same way that ELM determines the output weights of its hidden nodes. In this way, we obtain a hierarchical ELM architecture called "Meta-ELM". Theoretical analysis and experimental results on a few artificial and benchmark regression datasets show that Meta-ELM obtains good performance while decreasing the computational cost of training multiple ELMs.
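As a concrete illustration of this two-stage procedure, here is a minimal Python sketch under our own naming, with sigmoid hidden nodes and scalar outputs assumed; Section 3 presents the actual algorithm:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_elm(X, T, L):
    """Plain ELM: random hidden parameters, analytic output weights."""
    W = np.random.randn(X.shape[1], L)      # random input weights
    b = np.random.randn(L)                  # random biases
    beta = np.linalg.pinv(sigmoid(X @ W + b)) @ T
    return lambda Xn: sigmoid(Xn @ W + b) @ beta

def train_meta_elm(X, T, n_base=5, L=20):
    # Stage 1: train each base ELM on its own subset of the data.
    subsets = np.array_split(np.random.permutation(len(X)), n_base)
    base = [train_elm(X[idx], T[idx], L) for idx in subsets]
    # Stage 2: treat the base ELMs as hidden nodes and solve the
    # top-layer weights analytically, exactly as plain ELM does.
    H_top = np.column_stack([f(X) for f in base])
    alpha = np.linalg.pinv(H_top) @ T
    return lambda Xn: np.column_stack([f(Xn) for f in base]) @ alpha
```

Note that each base ELM sees only about `len(X) / n_base` samples, which is where the computational saving over whole-dataset ensembles comes from.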
The organization of the paper goes as follows: in Section 2, we introduce ELM and its optimization theory. In Section 3, we propose the Meta-ELM model, present the Meta-ELM learning algorithm, and analyze its computational complexity. In Section 4, we apply the Meta-ELM learning algorithm to a few artificial and benchmark datasets. A short conclusion then follows.
2. Extreme learning machine
This section briefly reviews the ELM proposed in [1–6]. One key principle of ELM is that one may randomly choose and fix the hidden node parameters. Thereafter, the single hidden layer feedforward network (SLFN) becomes a linear system, whose output weights can be analytically determined by the generalized inverse of the hidden layer output matrix.
2.1. Single hidden layer feedforward network with random hidden nodes
SLFN functions with L hidden nodes can be represented by
\[
f_L(x) = \sum_{i=1}^{L} \beta_i\, g_i(x) = \sum_{i=1}^{L} \beta_i\, G(w_i, b_i, x), \qquad x \in \mathbb{R}^d,\ \beta_i \in \mathbb{R}^m, \tag{2}
\]
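For a training set $\{(x_j, t_j)\}_{j=1}^{N}$, fixing the random hidden node parameters turns (2) into the linear system below; this is the standard ELM formulation, though our notation for $H$, $\beta$, and $T$ follows common ELM usage and may differ slightly from the cited papers:

\[
H\beta = T, \qquad
H = \begin{pmatrix}
G(w_1, b_1, x_1) & \cdots & G(w_L, b_L, x_1) \\
\vdots & \ddots & \vdots \\
G(w_1, b_1, x_N) & \cdots & G(w_L, b_L, x_N)
\end{pmatrix},
\]
\[
\beta = (\beta_1^{\mathrm{T}}, \ldots, \beta_L^{\mathrm{T}})^{\mathrm{T}}, \qquad
T = (t_1^{\mathrm{T}}, \ldots, t_N^{\mathrm{T}})^{\mathrm{T}},
\]

with the least-squares solution $\hat{\beta} = H^{\dagger} T$, where $H^{\dagger}$ is the Moore–Penrose generalized inverse of $H$.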
* Corresponding author. Tel.: +86 136 421 57549. E-mail address: szliao@tju.edu.cn (S. Liao).