2.1 ELM
For $N$ arbitrary distinct training samples $\{(\mathbf{x}_i, \mathbf{t}_i)\}_{i=1}^{N}$, where $\mathbf{x}_i = [x_{i1}, x_{i2}, \ldots, x_{in}]^T \in \mathbb{R}^n$ and $\mathbf{t}_i = [t_{i1}, t_{i2}, \ldots, t_{im}]^T \in \mathbb{R}^m$, the output function of ELM with $L$ hidden neurons and activation function $g(\cdot)$ is mathematically modeled as (1) [23]:

$$\mathbf{t}_i = \sum_{j=1}^{L} \boldsymbol{\beta}_j \, g(\mathbf{w}_j \cdot \mathbf{x}_i + b_j), \qquad (1)$$
where $\mathbf{w}_j = [w_{j1}, w_{j2}, \ldots, w_{jn}]^T$ ($j = 1, 2, \ldots, L$) is the weight vector connecting the $j$-th hidden neuron and the input neurons, $\boldsymbol{\beta}_j = [\beta_{j1}, \beta_{j2}, \ldots, \beta_{jm}]^T$ is the weight vector connecting the $j$-th hidden neuron and the output neurons, and $b_j$ is the threshold of the $j$-th hidden neuron. In addition, $\mathbf{w}_j \cdot \mathbf{x}_i$ denotes the inner product of $\mathbf{w}_j$ and $\mathbf{x}_i$.
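As a concrete illustration, the sketch below evaluates (1) for a single input using randomly chosen hidden-layer parameters; the array names (W, b, beta) and the sigmoid activation are assumptions made for this example, not prescribed by the text.

```python
import numpy as np

def elm_output(x, W, b, beta, g=lambda z: 1.0 / (1.0 + np.exp(-z))):
    """Evaluate Eq. (1): t = sum_j beta_j * g(w_j . x + b_j).

    x    : (n,)   input vector
    W    : (L, n) row j is the input weight vector w_j
    b    : (L,)   hidden-neuron thresholds b_j
    beta : (L, m) row j is the output weight vector beta_j
    """
    h = g(W @ x + b)          # hidden-layer activations, shape (L,)
    return h @ beta           # output vector t, shape (m,)

# toy usage with random parameters (n = 4 inputs, L = 10 hidden neurons, m = 2 outputs)
rng = np.random.default_rng(0)
t = elm_output(rng.normal(size=4),
               rng.normal(size=(10, 4)),
               rng.normal(size=10),
               rng.normal(size=(10, 2)))
```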
The above N equations can be written compactly as:
$$\mathbf{H}\boldsymbol{\beta} = \mathbf{T}, \qquad (2)$$
where
$$\mathbf{H} = \begin{bmatrix} h(\mathbf{x}_1) \\ \vdots \\ h(\mathbf{x}_N) \end{bmatrix}_{N \times L} = \begin{bmatrix} g(\mathbf{w}_1 \cdot \mathbf{x}_1 + b_1) & \cdots & g(\mathbf{w}_L \cdot \mathbf{x}_1 + b_L) \\ \vdots & \ddots & \vdots \\ g(\mathbf{w}_1 \cdot \mathbf{x}_N + b_1) & \cdots & g(\mathbf{w}_L \cdot \mathbf{x}_N + b_L) \end{bmatrix}_{N \times L},$$

$$\boldsymbol{\beta} = \begin{bmatrix} \boldsymbol{\beta}_1^T \\ \vdots \\ \boldsymbol{\beta}_L^T \end{bmatrix}_{L \times m}, \quad \text{and} \quad \mathbf{T} = \begin{bmatrix} \mathbf{t}_1^T \\ \vdots \\ \mathbf{t}_N^T \end{bmatrix}_{N \times m}.$$
Here, $\mathbf{H}$ is called the hidden layer output matrix of the SLFN, and the $j$-th column of $\mathbf{H}$ is the $j$-th hidden node output with respect to the inputs $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_N$, where $j = 1, 2, \ldots, L$. In addition, $h(\cdot)$ is called the hidden layer feature mapping. The $i$-th row of $\mathbf{H}$, i.e., $h(\mathbf{x}_i)$, is the hidden layer feature mapping with respect to the $i$-th input $\mathbf{x}_i$, where $i = 1, 2, \ldots, N$.
According to the analysis in [24], and contrary to the common understanding that all the parameters of an SLFN need to be adjusted, the input weights $\mathbf{w}_j$ and the hidden layer biases $b_j$ of ELM need not be tuned and can be assigned randomly. Moreover, the orthogonal projection method can be used efficiently in ELM: $\mathbf{H}^{\dagger} = (\mathbf{H}^T\mathbf{H})^{-1}\mathbf{H}^T$ if $\mathbf{H}^T\mathbf{H}$ is nonsingular, where $\mathbf{H}^{\dagger}$ is the Moore–Penrose generalized inverse of $\mathbf{H}$. In effect, the matrix $\mathbf{H}$ maps the data $\mathbf{x}_i$ from the input space to the hidden-layer feature space, and this feature mapping matrix $\mathbf{H}$ is independent of the targets $\mathbf{t}_i$.
Therefore, the solution for $\boldsymbol{\beta}$ is:

$$\boldsymbol{\beta} = \mathbf{H}^{\dagger}\mathbf{T} = (\mathbf{H}^T\mathbf{H})^{-1}\mathbf{H}^T\mathbf{T}. \qquad (3)$$
Because the hidden layer matrix $\mathbf{H}$ remains unchanged once random values have been assigned at the beginning of learning, Eq. (2) can be viewed as a linear system, and training the SLFN reduces to solving this linear system. In other words, training the SLFN is simply equivalent to finding a least-squares solution $\boldsymbol{\beta}$ of this linear system, and the minimum-norm least-squares solution given by (3) is unique.
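The following NumPy sketch puts Eqs. (2)–(3) together: it builds the hidden layer output matrix $\mathbf{H}$ from randomly assigned input weights and biases and then solves for the output weights by least squares. The function name, the tanh activation, and the use of np.linalg.lstsq (a numerically safer stand-in for the explicit pseudoinverse in (3)) are assumptions made for illustration.

```python
import numpy as np

def train_elm(X, T, L, seed=0, g=np.tanh):
    """Minimal ELM training sketch following Eqs. (2)-(3).

    X : (N, n) training inputs, T : (N, m) training targets, L : number of hidden neurons.
    Returns (W, b, beta) so that predictions are g(X @ W.T + b) @ beta.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[1]
    W = rng.uniform(-1.0, 1.0, size=(L, n))   # random input weights w_j
    b = rng.uniform(-1.0, 1.0, size=L)        # random hidden thresholds b_j
    H = g(X @ W.T + b)                        # hidden layer output matrix, (N, L)
    # Least-squares solution of H beta = T; equivalent to beta = pinv(H) @ T.
    beta, *_ = np.linalg.lstsq(H, T, rcond=None)
    return W, b, beta

# usage: fit a toy regression problem
X = np.random.default_rng(1).normal(size=(100, 4))
T = np.sin(X[:, :1])                          # (100, 1) targets
W, b, beta = train_elm(X, T, L=30)
pred = np.tanh(X @ W.T + b) @ beta
```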
2.2 OSELM
In ELM, all the samples must be available before training. In some practical applications, however, it is difficult to obtain the whole data set at once. OSELM was therefore proposed to deal with this issue [16]. By dividing the data into several chunks for training, OSELM reduces the computational effort and improves the learning performance. Let $\mathbf{X}^{*}$ denote the newly arrived incremental training data. The effect of the incremental data is captured by a correction $\Delta\boldsymbol{\beta}$, which modifies the historical model $\boldsymbol{\beta}_0$ to form a new model $\boldsymbol{\beta}^{*}$ according to the following equation:
$$\boldsymbol{\beta}^{*} = \boldsymbol{\beta}_0 + \Delta\boldsymbol{\beta}(\mathbf{X}^{*}). \qquad (4)$$
In [16], a solution is provided to this model. Given an initial chunk of training data $\aleph_0 = \{(\mathbf{x}_i, \mathbf{t}_i)\}_{i=1}^{N_0}$ ($N_0 \geq L$), under the ELM scheme we can find:

$$\boldsymbol{\beta}_0 = \mathbf{K}_0^{-1}\mathbf{H}_0^T\mathbf{T}_0, \qquad (5)$$
where $\mathbf{K}_0 = \mathbf{H}_0^T\mathbf{H}_0$ and

$$\mathbf{H}_0 = \begin{bmatrix} h(\mathbf{x}_1) \\ \vdots \\ h(\mathbf{x}_{N_0}) \end{bmatrix}_{N_0 \times L}, \quad \mathbf{T}_0 = \begin{bmatrix} \mathbf{t}_1^T \\ \vdots \\ \mathbf{t}_{N_0}^T \end{bmatrix}_{N_0 \times m},$$

$$h(\mathbf{x}_1) = [g(\mathbf{w}_1 \cdot \mathbf{x}_1 + b_1), \ldots, g(\mathbf{w}_L \cdot \mathbf{x}_1 + b_L)]_{1 \times L}, \;\ldots,\; h(\mathbf{x}_{N_0}) = [g(\mathbf{w}_1 \cdot \mathbf{x}_{N_0} + b_1), \ldots, g(\mathbf{w}_L \cdot \mathbf{x}_{N_0} + b_L)]_{1 \times L}.$$
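As a sketch of this initialization phase, the code below computes $\mathbf{K}_0$ and $\boldsymbol{\beta}_0$ from an initial chunk according to (5); the function name oselm_init and the reuse of the fixed random mapping from the earlier ELM sketch are assumptions made for illustration.

```python
import numpy as np

def oselm_init(X0, T0, W, b, g=np.tanh):
    """Initial OSELM phase, Eq. (5): beta_0 = K_0^{-1} H_0^T T_0.

    X0 : (N0, n) initial chunk of inputs (N0 >= L so that K_0 is invertible)
    T0 : (N0, m) initial chunk of targets
    W, b : fixed random input weights (L, n) and thresholds (L,)
    Returns (K0, beta0), which are carried forward to later chunks.
    """
    H0 = g(X0 @ W.T + b)                      # hidden-layer output for the initial chunk, (N0, L)
    K0 = H0.T @ H0                            # (L, L)
    beta0 = np.linalg.solve(K0, H0.T @ T0)    # avoids forming K_0^{-1} explicitly
    return K0, beta0
```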
Suppose that we are given another chunk of data $\aleph_1 = \{(\mathbf{x}_i, \mathbf{t}_i)\}_{i=N_0+1}^{N_0+N_1}$, where $N_1$ denotes the number of new samples in this chunk. Considering both training data sets $\aleph_0$ and $\aleph_1$, the output weight $\boldsymbol{\beta}_1$ becomes [16]:
$$\boldsymbol{\beta}_1 = \mathbf{K}_1^{-1}\begin{bmatrix}\mathbf{H}_0\\\mathbf{H}_1\end{bmatrix}^T\begin{bmatrix}\mathbf{T}_0\\\mathbf{T}_1\end{bmatrix}, \qquad (6)$$
where

$$\mathbf{K}_1 = \begin{bmatrix}\mathbf{H}_0\\\mathbf{H}_1\end{bmatrix}^T\begin{bmatrix}\mathbf{H}_0\\\mathbf{H}_1\end{bmatrix} = \begin{bmatrix}\mathbf{H}_0^T & \mathbf{H}_1^T\end{bmatrix}\begin{bmatrix}\mathbf{H}_0\\\mathbf{H}_1\end{bmatrix} = \mathbf{K}_0 + \mathbf{H}_1^T\mathbf{H}_1,$$
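A minimal sketch of the chunk update implied by (5)–(6): since $\mathbf{K}_1 = \mathbf{K}_0 + \mathbf{H}_1^T\mathbf{H}_1$ and, by (5), $\mathbf{K}_0\boldsymbol{\beta}_0 = \mathbf{H}_0^T\mathbf{T}_0$, the stacked product in (6) can be accumulated without revisiting the old chunk. The helper name oselm_update and the running accumulator are assumptions made for illustration, not the paper's notation.

```python
import numpy as np

def oselm_update(K_prev, beta_prev, X1, T1, W, b, g=np.tanh):
    """One OSELM chunk update consistent with Eqs. (5)-(6).

    Uses K_1 = K_0 + H_1^T H_1 and K_0 beta_0 = H_0^T T_0, so that
    beta_1 = K_1^{-1} (H_0^T T_0 + H_1^T T_1) = K_1^{-1} (K_0 beta_0 + H_1^T T_1).
    """
    H1 = g(X1 @ W.T + b)                      # hidden-layer output for the new chunk
    K_new = K_prev + H1.T @ H1                # K_1 = K_0 + H_1^T H_1
    rhs = K_prev @ beta_prev + H1.T @ T1      # equals H_0^T T_0 + H_1^T T_1
    beta_new = np.linalg.solve(K_new, rhs)
    return K_new, beta_new
```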