the convex optimization problem shown in Eq. (2).

$$
\min \quad \frac{1}{2}\|w\|^{2} + C\sum_{i=1}^{l}\left(\xi_{i} + \xi_{i}^{*}\right)
\qquad
\text{subject to}
\quad
\begin{cases}
\langle w, \phi(x_{i})\rangle + b - y_{i} \le \varepsilon + \xi_{i}\\
y_{i} - \langle w, \phi(x_{i})\rangle - b \le \varepsilon + \xi_{i}^{*}\\
\xi_{i},\, \xi_{i}^{*} \ge 0
\end{cases}
\tag{2}
$$
where $C$ is a constant known as the penalty factor, and the slack variables $\xi_{i}$, $\xi_{i}^{*}$ measure the deviation between the estimated value and the target value.
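For concreteness, the primal problem in Eq. (2) can be written down almost verbatim with a generic convex solver. The sketch below is illustrative only: it assumes a linear feature map ($\phi(x)=x$), synthetic toy data, and the CVXPY library, and is not the training procedure used in this study.

```python
import numpy as np
import cvxpy as cp

# Hypothetical toy data: 30 samples, 2 features (illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 2))
y = X @ np.array([1.5, -0.7]) + 0.1 * rng.normal(size=30)

l, d = X.shape
C, eps = 1.0, 0.1                        # penalty factor C and tube width epsilon

w = cp.Variable(d)
b = cp.Variable()
xi = cp.Variable(l, nonneg=True)         # xi_i
xi_star = cp.Variable(l, nonneg=True)    # xi_i^*

# Objective and constraints of Eq. (2), with phi(x) = x for simplicity.
objective = cp.Minimize(0.5 * cp.sum_squares(w) + C * cp.sum(xi + xi_star))
constraints = [X @ w + b - y <= eps + xi,
               y - X @ w - b <= eps + xi_star]
cp.Problem(objective, constraints).solve()
```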
It turns out that the optimization problem can be solved more easily in its dual formulation. Substituting the saddle point condition and using $K(x_{i}, x_{j}) = \phi(x_{i})^{T}\phi(x_{j})$ instead of $\phi(\cdot)$ explicitly, the kernel version of the dual optimization problem in Eq. (3) is obtained by eliminating the dual variables $\eta_{i}$, $\eta_{i}^{*}$. Here $K(x, x')$ is called the kernel function and satisfies the Mercer condition (Mercer, 1909).
$$
\begin{aligned}
\min \quad & \frac{1}{2}\sum_{i,j=1}^{l}\left(\alpha_{i}^{*}-\alpha_{i}\right)\left(\alpha_{j}^{*}-\alpha_{j}\right)K\!\left(x_{i}, x_{j}\right)
+ \varepsilon\sum_{i=1}^{l}\left(\alpha_{i}^{*}+\alpha_{i}\right)
- \sum_{i=1}^{l} y_{i}\left(\alpha_{i}^{*}-\alpha_{i}\right)\\
\text{subject to} \quad & \sum_{i=1}^{l}\left(\alpha_{i}^{*}-\alpha_{i}\right)=0,
\qquad 0 \le \alpha_{i},\, \alpha_{i}^{*} \le C
\end{aligned}
\tag{3}
$$
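As a complementary sketch, the dual in Eq. (3) is itself a quadratic program and can also be handed to a generic convex solver. The snippet below is a hypothetical illustration on the same toy data as the primal sketch above; it assumes a linear kernel, so the quadratic term can be expressed as a sum of squares, and it is not meant as a production SVR solver.

```python
import numpy as np
import cvxpy as cp

# Same hypothetical toy data as in the primal sketch above.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 2))
y = X @ np.array([1.5, -0.7]) + 0.1 * rng.normal(size=30)
l, C, eps = X.shape[0], 1.0, 0.1

alpha = cp.Variable(l, nonneg=True)
alpha_star = cp.Variable(l, nonneg=True)
u = alpha_star - alpha

# With a linear kernel K(x_i, x_j) = x_i . x_j, the quadratic term
# sum_ij u_i u_j K(x_i, x_j) equals ||X^T u||^2, which CVXPY accepts directly.
dual_obj = cp.Minimize(0.5 * cp.sum_squares(X.T @ u)
                       + eps * cp.sum(alpha + alpha_star)
                       - y @ u)
dual_cons = [cp.sum(u) == 0, alpha <= C, alpha_star <= C]
cp.Problem(dual_obj, dual_cons).solve()

w_dual = X.T @ u.value   # recovers w as in Eq. (4) below
```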
After solving the dual optimization problem above, $w$ can be computed explicitly in Eq. (4), where $\alpha_{i}$ and $\alpha_{i}^{*}$ are Lagrange multipliers and the samples with non-zero $\alpha_{i}$ or $\alpha_{i}^{*}$ are called the support vectors (SVs). By exploiting the so-called Karush–Kuhn–Tucker (KKT) conditions, which give necessary and sufficient conditions for a global optimum, the product between the dual variables and the constraints has to vanish at the optimal solution, and the parameter $b$ can be computed as in Eq. (5). Then the function $f(x)$ can be rewritten in the so-called support vector expansion shown in Eq. (6). In a sense, the complexity of the function is independent of the dimensionality of the input space and depends only on the number of SVs.
$$
w = \sum_{i=1}^{l}\left(\alpha_{i}^{*}-\alpha_{i}\right) x_{i}
\tag{4}
$$
$$
\begin{aligned}
b &= y_{j} - \langle w, \phi(x_{j})\rangle - \varepsilon \quad \text{for } 0 \le \alpha_{j} \le C,\\
b &= y_{j} - \langle w, \phi(x_{j})\rangle + \varepsilon \quad \text{for } 0 \le \alpha_{j}^{*} \le C
\end{aligned}
\tag{5}
$$
$$
f(x) = \sum_{i=1}^{l}\left(\alpha_{i}-\alpha_{i}^{*}\right) K\!\left(x_{i}, x\right) + b
\tag{6}
$$
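To make Eq. (6) concrete, the following illustrative sketch (not part of the original study) fits an RBF-kernel SVR with scikit-learn on synthetic data and then reproduces its predictions directly from the support vector expansion, i.e. from the stored dual coefficients $(\alpha_{i}-\alpha_{i}^{*})$ (up to libsvm's sign convention), the support vectors, and the intercept $b$.

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.metrics.pairwise import rbf_kernel

# Synthetic data for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=100)

gamma = 0.5                                             # plays the role of 1 / (2 * sigma^2)
svr = SVR(kernel="rbf", C=10.0, epsilon=0.1, gamma=gamma).fit(X, y)

# Eq. (6): f(x) = sum over SVs of (alpha_i - alpha_i^*) K(x_i, x) + b
K = rbf_kernel(svr.support_vectors_, X, gamma=gamma)    # K(x_i, x) for every SV x_i
f_manual = (svr.dual_coef_ @ K).ravel() + svr.intercept_
assert np.allclose(f_manual, svr.predict(X))
```

The check also illustrates the remark above: the expansion runs only over the support vectors, regardless of the dimensionality of the input space.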
The choice of kernel function for specific data patterns, which is another attractive question in the application of SVR, has appeared somewhat arbitrary until now. Some efforts (Mohammadi, Shamshirband, Anisi, Alam, & Petković, 2015; Shamshirband et al., 2014) empirically indicate that the Gaussian RBF kernel is superior to other kernel functions because of its ease of implementation and powerful mapping capability. Therefore, the Gaussian RBF kernel function $K(x, x') = \exp\!\left(-\frac{\|x - x'\|^{2}}{2\sigma^{2}}\right)$ was adopted in this study. There are two parameters involved in the SVR algorithm. The parameter $C$ controls the trade-off between the complexity of the function and the frequency with which errors are allowed. The parameter $\sigma$ affects the mapping transformation of the input data to the feature space and controls the complexity of the model. Thus, it is important to select suitable parameters, and the value of the parameter $\sigma$ should be selected more carefully than $C$, as reported in Schölkopf and Smola (2002).
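As a small illustration (not taken from the original paper), the Gaussian RBF kernel above can be evaluated directly, and the width $\sigma$ maps onto scikit-learn's `gamma` parameterization as $\gamma = 1/(2\sigma^{2})$, which is how this kernel would typically be configured in practice.

```python
import numpy as np

def gaussian_rbf(x, x_prime, sigma):
    """Gaussian RBF kernel K(x, x') = exp(-||x - x'||^2 / (2 * sigma^2))."""
    diff = np.asarray(x) - np.asarray(x_prime)
    return np.exp(-np.sum(diff ** 2) / (2.0 * sigma ** 2))

# scikit-learn parameterizes the same kernel as exp(-gamma * ||x - x'||^2),
# so a chosen sigma corresponds to gamma = 1 / (2 * sigma**2).
sigma = 2.0
gamma = 1.0 / (2.0 * sigma ** 2)

x, x_prime = np.array([0.0, 1.0]), np.array([1.0, 3.0])
assert np.isclose(gaussian_rbf(x, x_prime, sigma),
                  np.exp(-gamma * np.sum((x - x_prime) ** 2)))
```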
3.2. Sine cosine algorithm
The sine cosine algorithm is able to explore different regions of a search space, avoid local optima, converge towards the global optimum, and exploit promising regions of a search space effectively during optimization, as stated in Mirjalili (2016). Based on the sine and cosine functions, SCA searches for the best global solution starting from a set of random candidate solutions and updates their positions towards or away from the best solution. Different regions of the search space are explored when the sine and cosine functions return a value greater than 1 or less than −1. Promising regions of the search space are exploited when the sine and cosine return values between −1 and 1.
In SCA, the dimension of the search space is determined by the number of parameters that need to be optimized, while the number of search agents is determined by the user. The positions ($X_{i}$) of the current solutions are randomly initialized in the search space and then updated from the old positions as in Eq. (7), which ensures that the solutions always update their positions around the best solution obtained.
$$
X_{i}^{t+1} =
\begin{cases}
X_{i}^{t} + r_{1}\sin(r_{2})\left| r_{3} P_{i}^{t} - X_{i}^{t} \right|, & r_{4} < 0.5\\[2pt]
X_{i}^{t} + r_{1}\cos(r_{2})\left| r_{3} P_{i}^{t} - X_{i}^{t} \right|, & r_{4} \ge 0.5
\end{cases}
\tag{7}
$$
where $X_{i}^{t}$ is the position of the current solution in the $i$th dimension at the $t$th iteration, and $P_{i}$ is the position of the destination point in the $i$th dimension.
Four random and adaptive variables are also integrated to facilitate divergence and convergence of the search agents. In order to balance exploration and exploitation, so as to find the promising regions of the search space and eventually converge to the global optimum, the range of sine and cosine is changed adaptively by defining the parameter $r_{1}$ in Eq. (8). The parameter $r_{1}$ thus dictates the region of the next position (or the movement direction), which can lie either in the space between the solution and the destination or outside it. The cyclic pattern of the sine and cosine functions allows a solution to be repositioned around another solution, which guarantees exploitation of the space defined between two solutions. By changing the range of the sine and cosine functions, the solutions are also able to explore the search space outside the region between their corresponding destinations. A random location either inside or outside is achieved by drawing the random number $r_{2}$ in $[0, 2\pi]$ in Eq. (7), so the random parameter $r_{2}$ defines how far the movement should be towards or away from the destination. Therefore, this mechanism guarantees exploration and exploitation of the search space, respectively. The random parameter $r_{3}$ brings a random weight for the destination in order to stochastically emphasize ($r_{3} > 1$) or de-emphasize ($r_{3} < 1$) the effect of the destination in defining the distance. Finally, the random parameter $r_{4}$, ranging from 0 to 1, switches with equal probability between the sine and cosine components.
$$
r_{1} = a - t\,\frac{a}{T}
\tag{8}
$$
where t is the current iteration, T is the maximum number of iter-
ations, and a is a constant.
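The position update of Eq. (7), together with the schedule of Eq. (8), can be condensed into a short routine. The sketch below is an illustrative NumPy implementation; the function name and the sampling range $r_{3}\in[0,2]$ are conventional choices from the SCA literature rather than prescriptions of this paper.

```python
import numpy as np

def sca_update(X, P, r1, rng):
    """One SCA position update following Eq. (7).

    X  : (n_agents, dim) array of current positions X_i^t
    P  : (dim,) destination point (best solution found so far)
    r1 : amplitude from Eq. (8), decreasing from a towards 0 over the run
    """
    n_agents, dim = X.shape
    r2 = rng.uniform(0.0, 2.0 * np.pi, size=(n_agents, dim))  # how far towards/away from P
    r3 = rng.uniform(0.0, 2.0, size=(n_agents, dim))          # random weight on the destination
    r4 = rng.uniform(0.0, 1.0, size=(n_agents, dim))          # sine/cosine switch
    dist = np.abs(r3 * P - X)
    sine_move = X + r1 * np.sin(r2) * dist
    cosine_move = X + r1 * np.cos(r2) * dist
    return np.where(r4 < 0.5, sine_move, cosine_move)
```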
In addition, the pseudo code of the SCA algorithm (Algorithm 1) is presented below. The SCA algorithm starts the optimization process with a set of random solutions. The algorithm then saves the best solution obtained so far, assigns it as the destination point, and updates the other solutions with respect to it. Meanwhile, the ranges of the sine and cosine functions are updated to emphasize exploitation of the search space as the iteration counter increases. By default, the SCA algorithm terminates the optimization process when the iteration counter exceeds the maximum number of iterations.
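A minimal sketch of that loop is given below, reusing the `sca_update` routine from the previous snippet; the function name, argument list, and bound handling are illustrative assumptions rather than the exact procedure of Algorithm 1.

```python
import numpy as np

def sca_minimize(objective, dim, lower, upper,
                 n_agents=30, max_iter=1000, a=2.0, seed=0):
    """Minimal SCA loop: random initialization, Eq. (8) schedule, Eq. (7) updates."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(lower, upper, size=(n_agents, dim))        # random initial solutions
    fitness = np.apply_along_axis(objective, 1, X)
    best = np.argmin(fitness)
    P, best_fit = X[best].copy(), fitness[best]                # destination point

    for t in range(max_iter):
        r1 = a - t * (a / max_iter)                            # Eq. (8)
        X = np.clip(sca_update(X, P, r1, rng), lower, upper)   # Eq. (7), see sketch above
        fitness = np.apply_along_axis(objective, 1, X)
        if fitness.min() < best_fit:
            best = np.argmin(fitness)
            P, best_fit = X[best].copy(), fitness[best]
    return P, best_fit

# Example use on the sphere function (illustration only):
best_x, best_f = sca_minimize(lambda v: np.sum(v ** 2), dim=5, lower=-10.0, upper=10.0)
```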
As can be seen, SCA is more efficient than many other population-based optimization algorithms in determining the global