2.2 R-ELM
ELM has attracted much attention for its extremely fast training speed and good generalization performance. However, it is still based on the empirical risk minimization principle [see Eq. (6)] and tends to generate over-fitting models. Consequently, the trained ELM may behave very differently when the test data deviate only slightly from the training data, and the problem becomes more serious when the training set contains corrupted data such as outliers.
According to statistical learning theory, a model with good generalization ability should consider not only the empirical risk but also the structural risk and pursue the best tradeoff between the two. Based on this idea, a regularized ELM (R-ELM) [24, 25] was proposed to seek the $\beta$ that minimizes the following cost function:
$$J_R(\beta) = \| H\beta - T \|^2 + \lambda \| \beta \|^2, \qquad (9)$$
where $\| H\beta - T \|^2$ is the sum of squared training errors, which can be regarded as the empirical risk, $\| \beta \|^2$ is the squared norm of the network output weight vector, which represents the structural risk, and $\lambda$ is a positive real value called the regularization parameter that balances the two risks.
The cost function is minimized by differentiating (9) with respect to $\beta$ and setting the result to zero, which yields the following regularized normal equation:
$$\left( H^T H + \lambda I \right) \beta = H^T T, \qquad (10)$$
where $I$ is an identity matrix with the same dimensions as $H^T H$. The estimator of $\beta$ from Eq. (10) is given by
$$\hat{\beta} = \left( H^T H + \lambda I \right)^{-1} H^T T. \qquad (11)$$
Compared with ELM, R-ELM replaces the LS solution [Eq. (8)] with the generalized ridge regression estimator [Eq. (11)], which provides better stability and generalization ability for noisy data. Moreover, the added regularization term makes the correlation matrix $H^T H$ nonsingular, so the matrix inversion method can be applied directly. A more complete analysis of R-ELM can be found in [26], where the authors extend the study to generalized SLFNs with different feature mappings as well as kernels.
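As an illustration only, the following minimal NumPy sketch computes the ridge estimator of Eq. (11) for an SLFN with a sigmoid activation; the uniform random input weights and biases, the function names, and the default values of n_hidden and lam are assumptions for illustration, not the exact configuration used in this paper.

```python
import numpy as np

def relm_fit(X, T, n_hidden=100, lam=0.1, seed=None):
    """Minimal R-ELM sketch: random hidden layer plus the ridge solution of Eq. (11)."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    A = rng.uniform(-1.0, 1.0, size=(d, n_hidden))    # random input weights a_i
    b = rng.uniform(-1.0, 1.0, size=n_hidden)         # random hidden biases b_i
    H = 1.0 / (1.0 + np.exp(-(X @ A + b)))            # hidden layer output matrix H
    # beta_hat = (H^T H + lam * I)^{-1} H^T T   -- Eq. (11)
    beta = np.linalg.solve(H.T @ H + lam * np.eye(n_hidden), H.T @ T)
    return A, b, beta

def relm_predict(X, A, b, beta):
    """Network output for new inputs using the learned output weights."""
    H = 1.0 / (1.0 + np.exp(-(X @ A + b)))
    return H @ beta
```

Setting lam to zero recovers the ordinary LS solution of ELM, provided $H^T H$ is nonsingular.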
2.3 OSELM and R-OSELM
As a sequential version of the batch ELM algorithm, OSELM adopts a recursive way to compute the LS solution and may also encounter ill-posed problems due to the unavoidable presence of noise or outliers. Similar to R-ELM, an improved version of OSELM called regularized OSELM (R-OSELM) [27] was proposed to improve the stability of OSELM while maintaining the same sequential learning ability.
The R-OSELM algorithm uses the same cost function [Eq. (9)] as R-ELM and seeks the optimal regularized solution in a sequential learning fashion. As in OSELM, the learning process of R-OSELM consists of an initialization phase followed by a sequential learning phase, the only difference being a regularization term added to stabilize the initial output weights. The one-by-one R-OSELM algorithm is summarized below.
In the initialization phase, given an initial training set $X_{k-1} = \{ (x_j, t_j) \mid j = 1, \ldots, k-1 \}$, according to Eq. (11) the initial output weights are given by
$$\beta_{k-1} = P_{k-1} H_{k-1}^T T_{k-1}, \qquad (12)$$
where $P_{k-1} = \left( H_{k-1}^T H_{k-1} + \lambda I \right)^{-1}$, $H_{k-1} = \left[ h_1^T \; h_2^T \; \cdots \; h_{k-1}^T \right]^T$ and $T_{k-1} = \left[ t_1 \; t_2 \; \cdots \; t_{k-1} \right]^T$.
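A minimal sketch of this initialization phase is given below, assuming the initial hidden layer output matrix $H_{k-1}$ and target matrix $T_{k-1}$ have already been formed; the function name roselm_init and the default value of the regularization parameter are assumptions for illustration.

```python
import numpy as np

def roselm_init(H_init, T_init, lam=0.1):
    """Initialization phase of R-OSELM [Eq. (12)]:
    P = (H^T H + lam * I)^{-1},  beta = P H^T T."""
    n_hidden = H_init.shape[1]
    P = np.linalg.inv(H_init.T @ H_init + lam * np.eye(n_hidden))
    beta = P @ H_init.T @ T_init
    return P, beta
```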
In the sequential learning phase, the recursive least-squares algorithm is used to update the output weights continually. Suppose now that we receive another sample $(x_k, t_k)$; the corresponding partial hidden layer output matrix is calculated as $h_k = \left[ G(a_1, b_1, x_k) \; \cdots \; G(a_n, b_n, x_k) \right]$, and the output weight update equations are determined by
$$P_k = P_{k-1} - \frac{P_{k-1} h_k^T h_k P_{k-1}}{1 + h_k P_{k-1} h_k^T},$$
$$\beta_k = \beta_{k-1} + P_k h_k^T \left( t_k - h_k \beta_{k-1} \right). \qquad (13)$$
As seen from Eq. (13), the output weights are updated recursively based only on the newly arrived sample, which is discarded as soon as it has been learnt. The above one-by-one R-OSELM algorithm can easily be extended to the chunk-by-chunk case. In addition, if the regularization parameter $\lambda$ in the initial solution [Eq. (12)] equals zero, R-OSELM reduces to the original OSELM.
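The sequential update of Eq. (13) can be sketched as follows for the one-by-one case, pairing with the initialization sketch above; the row-vector conventions for $h_k$ and $t_k$ and the function name roselm_update are assumptions for illustration.

```python
import numpy as np

def roselm_update(P, beta, h_k, t_k):
    """One-by-one sequential update of R-OSELM [Eq. (13)].
    h_k: hidden layer output row vector (1 x n_hidden) for the new sample x_k,
    t_k: corresponding target row vector (1 x n_outputs)."""
    h_k = np.atleast_2d(h_k)
    t_k = np.atleast_2d(t_k)
    # P_k = P_{k-1} - P_{k-1} h_k^T h_k P_{k-1} / (1 + h_k P_{k-1} h_k^T)
    denom = 1.0 + float(h_k @ P @ h_k.T)
    P = P - (P @ h_k.T @ h_k @ P) / denom
    # beta_k = beta_{k-1} + P_k h_k^T (t_k - h_k beta_{k-1})
    beta = beta + P @ h_k.T @ (t_k - h_k @ beta)
    return P, beta
```

Calling roselm_update once per newly arrived sample, starting from the output of roselm_init, reproduces the recursion in Eq. (13) without storing past data.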
3 Proposed M-OSELM
In this section, we first present a novel M-estimator-based learning model. Next, a recursive solution to the M-estimator model is derived, and a sequential parameter estimation approach is introduced to estimate the threshold parameter of the M-estimator function for online outlier detection. Finally, a robust online sequential learning algorithm named M-OSELM is proposed.
3.1 M-estimator-based learning model
As described in Sect. 2, the learning rules of the ELM and
OSELM are based on the LS criterion, which minimizes