significantly. However, such an approach not only reduces the effectiveness of the features, since many more redundant features are obtained, but also further degrades the performance of the conventional SOM algorithm because more training iterations are required.
2.2 Dynamic neighborhood radius and learning rate
Within the t-th training epoch of the conventional SOM, the values of α(t) and r(t) are fixed for all winners of the competition; that is, every winner is treated equally within a training epoch. In our algorithm, by contrast, the learning rate and neighborhood size for a winner of the competition are dynamic and can be calculated by
$$\alpha_c(t) = \alpha(t)\left[1 - \varepsilon \exp\left(-K_c \, d_{c,j}\right)\right] \qquad (6)$$

and

$$r_c(t) = r(t)\left[1 - \varepsilon \exp\left(-K_c \, d_{c,j}\right)\right], \qquad (7)$$
where α(t) and r(t) can be calculated by Eqs. (4) and (5), respectively; ε (ε ∈ [0, 1)) and K_c are user-defined constants used to tune the learning rate and neighborhood radius for a winner; and d_{c,j} is the distance between x_j and w_c, which can be represented as
$$d_{c,j} = \frac{\left\| w_c - x_j \right\|^2}{d}. \qquad (8)$$
It can be seen from Eqs. (6) and (7) that the learning rates α_c(t) and neighborhood radii r_c(t) for winners of the competition are dynamic within each training epoch and depend in part on the value of d_{c,j}. If d_{c,j} → 0, then r_c(t) ≈ r(t)(1 − ε) and α_c(t) ≈ α(t)(1 − ε), and a weakened learning for the input x_j occurs. On the contrary, if d_{c,j} → ∞ and K_c is a large constant, then r_c(t) ≈ r(t) and α_c(t) ≈ α(t), and the learning for x_j is normal. Compared with weakened learning, normal learning within a training epoch gives the winner of the competition a larger neighborhood radius and a greater learning rate.
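Eqs. (6)–(8) above can be sketched in a few lines of code. This is only an illustration; the function and variable names (`dynamic_rates`, `alpha_t`, `r_t`, `eps`, `K_c`) are our own, and `alpha_t` and `r_t` stand for the epoch-level values α(t) and r(t) that the paper computes with Eqs. (4) and (5).

```python
import numpy as np

def dynamic_rates(w_c, x_j, alpha_t, r_t, eps, K_c):
    """Per-winner learning rate and neighborhood radius, Eqs. (6)-(8)."""
    d = x_j.size                             # input dimension
    d_cj = np.sum((w_c - x_j) ** 2) / d      # Eq. (8)
    # Common factor of Eqs. (6) and (7): close to 1 - eps when the
    # winner already matches x_j well, close to 1 when the error is large.
    scale = 1.0 - eps * np.exp(-K_c * d_cj)
    return alpha_t * scale, r_t * scale      # (alpha_c(t), r_c(t))
```

Note how the two limiting cases discussed above fall out directly: d_{c,j} → 0 gives weakened learning with factor (1 − ε), while a large d_{c,j} gives the normal rates α(t) and r(t).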
Since features of the principal components occupy greater ratios of the training data, they are trained well ahead of those of the non-principal components. Thus, normal learning is mainly triggered by the non-principal components in the training data. Compared with the conventional SOM, features of the non-principal components are given more chances to compete, and thereby better representations of their features can be achieved by the proposed algorithm. Moreover, as the training error decreases, the number of neurons that need to be updated also decreases, which helps improve the performance of the proposed algorithm.
Figure 1 demonstrates the different competitive behaviors of the two algorithms. In Fig. 1a, the coordinates (x_i, y_i) of the eleven red points are used to train both networks. Figure 1b presents a possible distribution of the points (x_i, y_i) trained by the conventional SOM. It can be seen that there exists a large distance between the sixth sample point (marked with a red circle in Fig. 1b) and the closest trained point (the sixth blue point in Fig. 1b). As to our algorithm, when it learns the position vector of the sixth sample point (red circle in Fig. 1b), the large training error makes the learning normal. Thus, a possible intermediate result trained by our algorithm is shown in Fig. 1c. When the training finishes, a possible distribution of the points is shown in Fig. 1d, from which it can be seen that the maximum distance, or training error, is much smaller than that achieved by the conventional SOM.
2.3 New way to update weight vectors of neurons
To reduce the redundancy of features extracted from the principal components and to avoid the problem of overfitting, we present a new way to update the weight vectors of neurons.
As shown in Fig. 2a, in the traditional way, not only the winner (marked with a red ball) but also its neighbors (marked with blue balls) are directly affected by the training vector (marked with a purple triangle). In the new way, as illustrated in Fig. 2b, each neighbor of a winner is only directly affected by the neuron closest to both the winner and itself (not including itself) in the lattice.
The weight vector w_c of the current winner c is directly updated by

$$w_c(t+1) = w_c(t) + \alpha_c(t)\left[x_j - w_c(t)\right]. \qquad (9)$$
For each neighbor i of the winner c, its learning rate α_i(t) is determined by

$$\alpha_i(t) = \alpha(t)\left[1 - \varepsilon \exp\left(-K_c \, d_{i,t}\right)\right], \qquad (10)$$
where d_{i,t} is the Euclidean distance between the weight vector of the neighbor and the training vector.
Then, the new weight vector for each neighbor of the winner can be represented as

$$w_i(t+1) = w_i(t) + \alpha_i(t)\left[w_t(t) - w_c(t)\right], \qquad (11)$$

where w_t(t) is the weight vector of the training data, and it should be updated before its output. By tuning the learning rate α_i(t), the distance between the weight vectors of the training data and its output will be neither too small nor too large. Thus, features extracted from the principal components in the training data by our algorithm will have lower redundancy.
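The update rules of Eqs. (9)–(11) can be sketched as follows. All names are our own, and two points are assumptions on our part: `w_t` plays the role of w_t(t), which Eq. (11) requires to have been updated before use, and d_{i,t} in Eq. (10) is taken as the distance between the neighbor's weight vector and the training vector x_j.

```python
import numpy as np

def update_winner(w_c, x_j, alpha_c):
    """Eq. (9): move the winner's weight vector toward the training vector x_j."""
    return w_c + alpha_c * (x_j - w_c)

def update_neighbor(w_i, x_j, w_t, w_c, alpha, eps, K_c):
    """Eqs. (10)-(11): update one neighbor i of the winner c."""
    # Eq. (10): per-neighbor learning rate from the Euclidean distance
    # between the neighbor's weights and the training vector (our reading).
    d_it = np.linalg.norm(w_i - x_j)
    alpha_i = alpha * (1.0 - eps * np.exp(-K_c * d_it))
    # Eq. (11): the neighbor moves along the direction w_t(t) - w_c(t),
    # rather than being pulled directly by x_j as in the conventional SOM.
    return w_i + alpha_i * (w_t - w_c)
```

Note that the neighbor never sees x_j directly in Eq. (11); it is moved only through the difference w_t(t) − w_c(t), which is what distinguishes the new update scheme of Fig. 2b from the traditional one of Fig. 2a.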
Neural Comput & Applic (2014) 24:1759–1770 1761