$\{(\mathbf{a}_i, b_i)\}_{i=1}^{L}$ randomly generated from any intervals of $\mathbb{R}^d \times \mathbb{R}$, according to any continuous probability distribution, with probability one, $\|\mathbf{H}_{N \times L} \boldsymbol{\beta}_{L \times m} - \mathbf{T}_{N \times m}\| < \epsilon$.
From the interpolation point of view, the maximum number of hidden nodes required is not larger than the number of training samples. In fact, if $L = N$, the training error can be exactly zero, as the sketch below demonstrates.
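To make the interpolation claim concrete, here is a minimal NumPy sketch (not the authors' code; the sigmoid additive nodes and uniformly drawn data are illustrative assumptions). With $L = N$ random hidden nodes, the hidden layer output matrix $\mathbf{H}$ is square and, with probability one, invertible, so the training error vanishes up to floating-point rounding:

```python
import numpy as np

rng = np.random.default_rng(0)

N, d, m = 50, 3, 2                       # samples, input dim, output dim
X = rng.uniform(-1.0, 1.0, (N, d))       # training inputs x_i
T = rng.uniform(-1.0, 1.0, (N, m))       # training targets t_i

L = N                                    # as many hidden nodes as samples
A = rng.uniform(-1.0, 1.0, (d, L))       # random input weights a_i
b = rng.uniform(-1.0, 1.0, L)            # random biases b_i

H = 1.0 / (1.0 + np.exp(-(X @ A + b)))   # hidden layer output matrix (N x L)
beta = np.linalg.solve(H, T)             # H is square and invertible w.p. 1

print(np.linalg.norm(H @ beta - T))      # ~ 0 up to rounding error
```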
Theorem 2.2 [6] Given any activation function $g: \mathbb{R} \to \mathbb{R}$ which is infinitely differentiable in any interval and $N$ arbitrary distinct samples $(\mathbf{x}_i, \mathbf{t}_i) \in \mathbb{R}^d \times \mathbb{R}^m$, for any $\{(\mathbf{a}_i, b_i)\}_{i=1}^{N}$ randomly generated from any intervals of $\mathbb{R}^d \times \mathbb{R}$, according to any continuous probability distribution, with probability one, $\|\mathbf{H}_{N \times N} \boldsymbol{\beta}_{N \times m} - \mathbf{T}_{N \times m}\| = 0$.
From the interpolation point of view, a wide range of activation functions can be used in ELM, including the sigmoid functions, the radial basis, sine, cosine, exponential, and many other non-regular functions [13]. It may be too strict to require that the activation functions of the hidden nodes be infinitely differentiable. For example, this requirement excludes some important activation functions such as the threshold function $g(x) = \begin{cases} 1, & x \geq 0 \\ 0, & x < 0 \end{cases}$. Threshold networks are very popular in real applications, especially in digital hardware implementations. However, as the threshold function is not differentiable, researchers did not manage to find any efficient direct learning algorithms for threshold networks in the past two decades [27–30]. Interestingly, from the universal approximation point of view, the above-mentioned interpolation theorem can be extended to almost any type of nonlinear piecewise continuous function, including the threshold function; thus an efficient direct learning algorithm (e.g., ELM, as sketched below) can be applied to those cases which could not be handled by other learning techniques in past decades.
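Because ELM computes the output weights analytically rather than by gradient descent, the non-differentiability of the threshold function causes no difficulty. Below is a sketch of a plain pseudoinverse-based ELM with threshold activation (the data, network size, and helper names are assumptions for illustration, not the paper's experiment):

```python
import numpy as np

rng = np.random.default_rng(1)

def threshold(z):
    """g(x) = 1 if x >= 0 else 0 (non-differentiable)."""
    return (z >= 0.0).astype(float)

def elm_fit(X, T, L, g):
    """Random hidden layer; output weights solved as beta = pinv(H) T.
    No derivative of g is needed anywhere, so g may be discontinuous."""
    d = X.shape[1]
    A = rng.uniform(-1.0, 1.0, (d, L))   # random input weights a_i
    b = rng.uniform(-1.0, 1.0, L)        # random biases b_i
    H = g(X @ A + b)                     # hidden layer output matrix
    beta = np.linalg.pinv(H) @ T         # least-squares output weights
    return A, b, beta

X = rng.uniform(-1.0, 1.0, (200, 2))
T = np.sin(3 * X[:, :1]) + X[:, 1:]      # a continuous toy target
A, b, beta = elm_fit(X, T, L=100, g=threshold)
rmse = np.sqrt(np.mean((threshold(X @ A + b) @ beta - T) ** 2))
print("training RMSE:", rmse)
```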
2.2 Universal approximation theorem
Huang et al. [7] proved in theory that SLFNs with randomly generated additive or RBF nodes can universally approximate any continuous target function over any compact subset $X$ of $\mathbb{R}^d$. Let $L^2(X)$ be the space of functions $f$ on a compact subset $X$ of the $d$-dimensional Euclidean space $\mathbb{R}^d$ such that $|f|^2$ is integrable, that is, $\int_X |f(\mathbf{x})|^2 \, d\mathbf{x} < \infty$. Let $L^2(\mathbb{R}^d)$ be denoted by $L^2$. For $u, v \in L^2(X)$, the inner product $\langle u, v \rangle$ is defined by

$$\langle u, v \rangle = \int_X u(\mathbf{x}) \, v(\mathbf{x}) \, d\mathbf{x} \qquad (12)$$
The norm in the $L^2(X)$ space is denoted by $\|\cdot\|$, and the closeness between the network function $f_L$ and the target function $f$ is measured by the $L^2(X)$ distance:

$$\| f_L - f \| = \left[ \int_X | f_L(\mathbf{x}) - f(\mathbf{x}) |^2 \, d\mathbf{x} \right]^{1/2} \qquad (13)$$
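In practice the distance (13) can be estimated numerically. The following sketch (illustrative assumptions: $X = [0, 1]^d$, so $\mathrm{vol}(X) = 1$, and plain Monte Carlo integration with uniform samples) approximates the $L^2(X)$ distance between a candidate network function and a target:

```python
import numpy as np

rng = np.random.default_rng(2)

def l2_distance(f_L, f, d, n_samples=100_000):
    """Monte Carlo estimate of [ integral_X |f_L(x) - f(x)|^2 dx ]^(1/2)
    over X = [0, 1]^d, whose volume is 1."""
    x = rng.uniform(0.0, 1.0, (n_samples, d))
    return np.sqrt(np.mean((f_L(x) - f(x)) ** 2))

f   = lambda x: np.sin(2 * np.pi * x[:, 0])   # target function
f_L = lambda x: 2 * np.pi * x[:, 0] - np.pi   # crude stand-in "network"
print("||f_L - f|| ~", l2_distance(f_L, f, d=1))
```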
Definition 2.1 (p. 334 of [31]) A function $g(x): \mathbb{R} \to \mathbb{R}$ is said to be piecewise continuous if it has only a finite number of discontinuities in any interval, and its left and right limits are defined (not necessarily equal) at each discontinuity.
Definition 2.2 A node is called a random node if its parameters $(\mathbf{a}, b)$ are randomly generated based on a continuous sampling probability distribution.
Different from the randomness mentioned in other learning methods [4, 32, 33], all the hidden node parameters $(\mathbf{a}_i, b_i)$ in ELMs can be independent of the training samples and can be randomly generated before the training samples are observed. (Refer to [34] for the details of the differences between ELM and the methods of Igelnik and Pao [33] and Lowe et al. [4, 32].)
Definition 2.3 The function sequence $\{g_i = G(\mathbf{a}_i, b_i, \mathbf{x})\}$ is said to be randomly generated if the corresponding parameters $(\mathbf{a}_i, b_i)$ are randomly generated from $\mathbb{R}^d \times \mathbb{R}$ or $\mathbb{R}^d \times \mathbb{R}^{+}$ based on a continuous sampling probability distribution.
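As an illustration of Definition 2.3, the sketch below draws $(\mathbf{a}_i, b_i)$ from continuous distributions and evaluates the resulting function sequence for both node types (the sigmoid and Gaussian forms of $G$ and the parameter ranges are assumptions chosen for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)

def additive_node(a, b, x):
    """G(a, b, x) = g(a . x + b) with a sigmoid g."""
    return 1.0 / (1.0 + np.exp(-(x @ a + b)))

def rbf_node(a, b, x):
    """G(a, b, x) = k((x - a) / b) with a Gaussian k; b drawn from R+."""
    return np.exp(-np.sum((x - a) ** 2, axis=-1) / b**2)

d, L = 2, 5
A = rng.normal(size=(L, d))              # a_i drawn from R^d
b_add = rng.normal(size=L)               # b_i drawn from R   (additive)
b_rbf = rng.uniform(0.1, 2.0, size=L)    # b_i drawn from R+  (RBF)

x = rng.uniform(-1.0, 1.0, (10, d))      # evaluation points
G_add = np.stack([additive_node(A[i], b_add[i], x) for i in range(L)], axis=1)
G_rbf = np.stack([rbf_node(A[i], b_rbf[i], x) for i in range(L)], axis=1)
print(G_add.shape, G_rbf.shape)          # (10, 5) each: column i is g_i(x)
```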
Lemma 2.1 (Proposition 1 of [16]) Given $g: \mathbb{R} \to \mathbb{R}$, $\operatorname{span}\{g(\mathbf{a} \cdot \mathbf{x} + b) : (\mathbf{a}, b) \in \mathbb{R}^d \times \mathbb{R}\}$ is dense in $L^p$ for every $p \in [1, \infty)$ if and only if $g$ is not a polynomial (almost everywhere).
Lemma 2.2 [17] Let $k: \mathbb{R}^d \to \mathbb{R}$ be an integrable bounded function such that $k$ is continuous (almost everywhere) and $\int_{\mathbb{R}^d} k(\mathbf{x}) \, d\mathbf{x} \neq 0$. Then $\operatorname{span}\{k(\frac{\mathbf{x} - \mathbf{a}}{b}) : (\mathbf{a}, b) \in \mathbb{R}^d \times \mathbb{R}^{+}\}$ is dense in $L^p$ for every $p \in [1, \infty)$.
Lemmas 2.1 and 2.2 show that feedforward neural networks with additive or RBF hidden nodes can approximate any continuous target function provided that the hidden node parameters $(\mathbf{a}_i, b_i)$ are tuned properly. However, these lemmas only establish the universal approximation capability of such networks; how to find suitable hidden node parameters $(\mathbf{a}_i, b_i)$ remains open, and many tuning-based learning algorithms have been suggested in the past. Huang et al. [7] proved that, given any bounded nonconstant piecewise continuous activation function $g: \mathbb{R} \to \mathbb{R}$ for additive nodes, or any integrable piecewise continuous activation function $g: \mathbb{R} \to \mathbb{R}$ (with $\int_{\mathbb{R}} g(x) \, dx \neq 0$) for RBF nodes, the hidden layer of such an SLFN need not be tuned; in fact, all the hidden nodes can be randomly generated. SLFNs with randomly generated hidden nodes can universally approximate any continuous target function, as illustrated by the sketch below.
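The sketch referenced above is a toy experiment (not from [7]): with randomly generated Gaussian RBF nodes and analytically computed output weights, the training error typically shrinks as the number of hidden nodes $L$ grows, in line with the universal approximation result:

```python
import numpy as np

rng = np.random.default_rng(4)

X = rng.uniform(-1.0, 1.0, (500, 1))     # inputs on a compact set
T = np.sinc(3 * X)                       # a continuous target f

for L in (5, 20, 80, 320):
    A = rng.uniform(-1.0, 1.0, (L, 1))   # random centers a_i
    b = rng.uniform(0.1, 1.0, L)         # random impact widths b_i > 0
    H = np.exp(-((X - A.T) ** 2) / b**2) # RBF hidden layer (N x L)
    beta = np.linalg.pinv(H) @ T         # analytic output weights
    rmse = np.sqrt(np.mean((H @ beta - T) ** 2))
    print(f"L = {L:4d}  training RMSE = {rmse:.4f}")
```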
Let $e_L = f - f_L$ denote the residual error function