Inequality Optimization Constraints Based ELM. With the parameter setting of $\sigma_1 = 2$, $\sigma_2 = 1$, $p = 2$, $q = 1$, and inequality constraints, the general optimization formula can be written as (), which is common in binary classification. Since this form is widely applied and has good sparsity, we use it as the base model of our extension for classification:
\[
\min_{\boldsymbol{\beta}^{(i)},\,\xi_{i}}\;
\frac{1}{2}\bigl\|\boldsymbol{\beta}^{(i)}\bigr\|^{2}
+ C\sum_{i=1}^{n}\xi_{i},
\qquad
\text{s.t.}\;\;
y_{i}\bigl(\mathbf{x}_{i}\cdot\boldsymbol{\beta}^{(i)T}\bigr)\ge 1-\xi_{i},
\;\;\xi_{i}\ge 0.
\]
()
Applying the KKT conditions, () can be transformed into (); it can then be solved in the dual space:
\[
\max_{\boldsymbol{\alpha}}\;
\sum_{i=1}^{n}\alpha_{i}
- \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}
\alpha_{i}\alpha_{j}\,y_{i}y_{j}\,\bigl\langle \mathbf{x}_{i},\mathbf{x}_{j}\bigr\rangle,
\qquad
\text{s.t.}\;\; 0\le\alpha_{i}\le C,\;\; i = 1, 2, \ldots, n.
\]
()
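As a concrete illustration, the following minimal sketch (ours, not the authors' implementation) maximizes the dual in () with a generic box-constrained solver and recovers the output weights from the multipliers; the random sigmoid hidden layer, the toy data, and all variable names are illustrative assumptions, with the rows of the hidden-layer matrix playing the role of the samples in ().

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

def hidden_layer(X, W, b):
    """ELM random feature map h(x) = sigmoid(XW + b)."""
    return 1.0 / (1.0 + np.exp(-(X @ W + b)))

# Toy binary problem with labels in {-1, +1}.
X = rng.normal(size=(40, 3))
y = np.sign(X[:, 0] + 0.3 * rng.normal(size=40))

L = 20                                        # number of hidden neurons
W, b = rng.normal(size=(3, L)), rng.normal(size=L)
H = hidden_layer(X, W, b)                     # n x L hidden-layer output matrix
C = 1.0                                       # regularization parameter

# Q_ij = y_i y_j <h(x_i), h(x_j)>, the quadratic term of the dual.
Q = (y[:, None] * H) @ (y[:, None] * H).T

def neg_dual(alpha):                          # negate because we maximize the dual
    return 0.5 * alpha @ Q @ alpha - alpha.sum()

res = minimize(neg_dual, x0=np.zeros(len(y)),
               jac=lambda a: Q @ a - 1.0,
               bounds=[(0.0, C)] * len(y),    # box constraints 0 <= alpha_i <= C
               method="L-BFGS-B")
alpha = res.x

beta = H.T @ (alpha * y)                      # beta = sum_i alpha_i y_i h(x_i)^T
pred = np.sign(H @ beta)
print("training accuracy:", (pred == y).mean())
```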
Equality Optimization Constraints Based ELM. With the parameter setting of $\sigma_1 = 2$, $\sigma_2 = 2$, $p = 2$, $q = 2$, and equality constraints, the general ELM optimization formula is equivalent to (), which can be used in both regression and classification:
\[
\min_{\boldsymbol{\beta}^{(i)},\,\xi_{i}}\;
\frac{1}{2}\bigl\|\boldsymbol{\beta}^{(i)}\bigr\|^{2}
+ \frac{C}{2}\sum_{i=1}^{n}\xi_{i}^{2},
\qquad
\text{s.t.}\;\;
\mathbf{x}_{i}\cdot\boldsymbol{\beta}^{(i)T} = y_{i}-\xi_{i},
\;\; i = 1, 2, \ldots, n.
\]
()
The corresponding KKT optimality conditions are shown in ():
\[
\boldsymbol{\beta} = \mathbf{H}^{T}\boldsymbol{\alpha},
\qquad
\alpha_{i} = C\xi_{i},\;\; i = 1, 2, \ldots, n,
\qquad
\mathbf{h}(\mathbf{x}_{i})\boldsymbol{\beta} - \mathbf{y}_{i}^{T} + \boldsymbol{\xi}_{i}^{T} = 0,\;\; i = 1, 2, \ldots, n.
\]
()
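The closed-form solution used below can be recovered from these conditions by eliminating $\boldsymbol{\xi}$ and $\boldsymbol{\alpha}$: in matrix form, $\boldsymbol{\beta} = \mathbf{H}^{T}\boldsymbol{\alpha}$, $\boldsymbol{\alpha} = C\boldsymbol{\xi}$, and $\mathbf{H}\boldsymbol{\beta} = \mathbf{T} - \boldsymbol{\xi}$, so that
\[
\mathbf{H}\mathbf{H}^{T}\boldsymbol{\alpha} = \mathbf{T} - \frac{\boldsymbol{\alpha}}{C}
\;\Longrightarrow\;
\boldsymbol{\alpha} = \left(\frac{\mathbf{I}}{C} + \mathbf{H}\mathbf{H}^{T}\right)^{-1}\mathbf{T},
\qquad
\boldsymbol{\beta} = \mathbf{H}^{T}\left(\frac{\mathbf{I}}{C} + \mathbf{H}\mathbf{H}^{T}\right)^{-1}\mathbf{T}.
\]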
Further, the final output is given in ():
\[
f(\mathbf{x}) = \mathbf{h}(\mathbf{x})\boldsymbol{\beta}
= \mathbf{h}(\mathbf{x})\,\mathbf{H}^{T}\left(\frac{\mathbf{I}}{C} + \mathbf{H}\mathbf{H}^{T}\right)^{-1}\mathbf{T}.
\]
()
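A minimal numerical sketch of this closed form follows (illustrative only; the sigmoid random feature map, the toy data, and all names are our assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

def hidden_layer(X, W, b):
    """ELM random feature map h(x) = sigmoid(XW + b)."""
    return 1.0 / (1.0 + np.exp(-(X @ W + b)))

# Toy regression data: n samples, single output.
X = rng.uniform(-1, 1, size=(50, 2))
T = np.sin(X[:, 0:1]) + 0.1 * rng.normal(size=(50, 1))

L, C = 30, 10.0
W, b = rng.normal(size=(2, L)), rng.normal(size=L)
H = hidden_layer(X, W, b)                      # n x L

# beta = H^T (I/C + H H^T)^{-1} T  (use solve instead of an explicit inverse)
beta = H.T @ np.linalg.solve(np.eye(len(X)) / C + H @ H.T, T)

# Final output f(x) = h(x) beta on new samples.
X_new = rng.uniform(-1, 1, size=(5, 2))
print(hidden_layer(X_new, W, b) @ beta)
```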
2.2. ELM for ε-Insensitive Regression. For regression, ELM provides a general model for the standard setting. It achieves better predictive accuracy than traditional SLFNs []. In addition, many variants and extensions of the ELM regression algorithm have been proposed. Inspired by Vapnik's epsilon-insensitive loss function, [] proposed the ε-insensitive ELM. Its optimization formula is given in ():
\[
\min_{\boldsymbol{\beta}}\;
\frac{1}{2}\bigl\|\boldsymbol{\beta}\bigr\|^{2}
+ \frac{C}{2}\bigl\|\mathbf{h}(\mathbf{x})\boldsymbol{\beta} - \mathbf{y}\bigr\|^{2}_{\epsilon},
\]
()
where $\epsilon$ is the insensitive factor and the error loss function is calculated by
\[
\bigl\|\mathbf{h}(\mathbf{x})\boldsymbol{\beta} - \mathbf{y}\bigr\|^{2}_{\epsilon}
= \sum_{i=1}^{n}\bigl|\mathbf{h}(\mathbf{x}_{i})\boldsymbol{\beta} - y_{i}\bigr|^{2}_{\epsilon},
\qquad\text{with}\quad
\bigl|\mathbf{h}(\mathbf{x}_{i})\boldsymbol{\beta} - y_{i}\bigr|_{\epsilon}
= \max\bigl(0,\;\bigl|\mathbf{h}(\mathbf{x}_{i})\boldsymbol{\beta} - y_{i}\bigr| - \epsilon\bigr).
\]
()
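For clarity, a small sketch of this loss follows (our illustrative code with hypothetical variable names, not the authors'):

```python
import numpy as np

def eps_insensitive_sq_loss(predictions, targets, eps):
    """Sum of squared epsilon-insensitive errors, as in ()."""
    residual = np.abs(predictions - targets)        # |h(x_i)beta - y_i|
    clipped = np.maximum(0.0, residual - eps)       # errors inside the eps-tube cost nothing
    return np.sum(clipped ** 2)

y_pred = np.array([0.2, 1.1, -0.4, 2.0])
y_true = np.array([0.0, 1.0, -0.5, 1.5])
print(eps_insensitive_sq_loss(y_pred, y_true, eps=0.15))
```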
Compared with conventional ELM regression, ELM with the ε-insensitive loss function uses the ε-insensitive margin to measure the empirical risk. It controls the sparsity of the solution [] and is less sensitive to different levels of noise []. In this paper, we extend the ELM regression algorithm based on this variant.
3. Missing Data Problem in ELM Learning
3.1. Missing Data Problem. Nowadays, with ever-increasing data velocity and volume, missing data has become a common phenomenon. Generally, there are two missing patterns, namely, missing features and missing labels. In this paper, we focus on the issue of missing features.
Considering the causes of missing data, there are two circumstances. In the first circumstance, the missing features exist but their values are unobserved because the information is lost or some features are too costly to acquire []. Examples of such cases can be found in many domains. Sensors in a remote sensor network may be damaged and fail to collect data intermittently. Certain regions of a gene microarray may fail to yield measurements of the underlying gene expressions due to scratches, fingerprints, or dust []. The second circumstance is inherently missing features: different samples inherently contain different features. For instance, in packed malware identification, instances contain some unreasonable values. In the web-page task, one useful feature of a given page may be the most common topic of other sites that point to it. If this particular page has no such parents, however, the feature is null and should be considered structurally missing []. Obviously, imputation for this circumstance is meaningless.
3.2. Traditional Approaches for Missing Data Learning. Generally, there are three approaches to dealing with missing features in machine learning. The first approach is omitting, which includes sample deletion and feature filtering. Sample deletion simply omits the samples containing missing features and applies standard learning algorithms to the remaining samples. An example is shown in Figure , where the sample $\mathbf{x}_2$ with two missing features is deleted. Feature filtering omits the features that are missing in most samples. Figure illustrates this approach. Obviously, the advantage of omitting-based approaches is that they are simple and computationally inexpensive. Notably, the key point of omitting is retaining as much useful information as possible, but this is difficult to do: both strategies inevitably omit some useful information. When most of the information is retained after being partly omitted, this approach can be a good choice. Otherwise, when much useful information is omitted while little is retained, this kind of approach seriously affects learning precision. The second approach is imputation. In the data preprocessing phase, missing features are filled with their most probable values []. Simple imputations fill the missing features with some default value, such as zero or the average value of other samples (see the sketch below). Complex imputations use a probabilistic density function or distribution function to estimate the missing features. The computational complexity of imputation varies with the estimation method.
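As an illustration of the two simple strategies mentioned above, the following sketch (ours, not from the original; missing entries are assumed to be encoded as NaN) fills missing features with zeros and with column means:

```python
import numpy as np

# Toy data matrix with two missing features encoded as NaN.
X = np.array([[1.0, 2.0, np.nan],
              [4.0, np.nan, 6.0],
              [7.0, 8.0, 9.0]])

# Zero imputation: replace every missing entry with a default value.
X_zero = np.where(np.isnan(X), 0.0, X)

# Mean imputation: replace missing entries with the column mean of observed values.
col_mean = np.nanmean(X, axis=0)
X_mean = np.where(np.isnan(X), col_mean, X)

print(X_zero)
print(X_mean)
```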
Imputation makes sense when the features are known to