Stochastic Weight Update in the Backpropagation Algorithm on Feed-Forward Neural Networks
Juraj Koščak, Rudolf Jakša, Member, IEEE, and Peter Sinčák, Member, IEEE
Juraj Koščak, Rudolf Jakša, and Peter Sinčák are with the Department of Cybernetics and Artificial Intelligence, Technical University of Košice, Letná 9, Košice, Slovakia (email: jurajkoscak@gmail.com, jaksa@neuron.tuke.sk, peter.sincak@tuke.sk).
Abstract—We examine stochastic weight update in the backpropagation algorithm on feed-forward neural networks. It was introduced by Salvetti and Wilamowski in 1994 in order to improve the probability and the speed of convergence. However, this update method has another quality as well: its implementation is simple for an arbitrary network topology. In the stochastic weight update scenario, a constant number of weights is randomly selected and updated. This is in contrast to the classical ordered update, where all weights are always updated. We describe the exact implementation and present example results on toy-task data with a feed-forward neural network topology. Stochastic weight update is suitable to replace the classical ordered update without any penalty on implementation complexity and, with good chance, without penalty on the quality of convergence.
I. INTRODUCTION
The Stochastic Weight Update was introduced by Salvetti and Wilamowski in 1994 [1] in order to improve the probability of convergence and the speed of convergence of the backpropagation algorithm. Besides the Stochastic Weight Update, they examined two other stochastic methods: random pattern selection and randomized learning rate. On the XOR problem they demonstrated a significant improvement in learning speed and probability of convergence for each of these methods, especially for the randomized learning rate.
The backpropagation algorithm (Werbos, 1974; Rumelhart, McClelland, 1986) is one of the most used learning algorithms today. The plain vanilla implementation and the momentum implementation (Silva, Almeida, 1990) [2] are the predominant implementations on feed-forward topologies. The multilayer perceptron (MLP) is the most used feed-forward topology; it is a layered architecture with fully interconnected layers.
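As a brief reminder of standard notation (not taken from this paper), the momentum rule adds a fraction of the previous weight change to the plain gradient step:

\Delta w_{ij}(t) = -\eta \, \frac{\partial E}{\partial w_{ij}} + \alpha \, \Delta w_{ij}(t-1),

where \eta is the learning rate and \alpha the momentum coefficient; \alpha = 0 recovers the plain vanilla rule.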
The Backpropagation Through Time (BPTT) [3] is the most used backpropagation variant for recurrent topologies. The BPTT was originally described, and is well suited, for recurrent topologies with full connectivity. Although the BPTT implementation is not much more complex compared to feed-forward vanilla backpropagation, feed-forward time-delay networks are preferably used for time-related problems. Time-delay networks are of the MLP design, so the layered topology is the most used topology with the backpropagation algorithm.
Sparse topologies have recently become popular with recurrent echo state networks (ESN) (Jaeger, 2001) [4]. However, these are usually not based on the backpropagation algorithm. ESN networks are based on a so-called reservoir of randomly interconnected neurons, so they have a fully random sparse topology.
Another area where sparse topologies are employed is neuro-fuzzy architectures; a typical example is the backpropagation-based ANFIS (Jang, 1993) [5]. ANFIS-like neuro-fuzzy topologies are characterized by a layered, but not fully interconnected, structure. In contrast to the randomly connected ESN architectures, the topologies of ANFIS-like structures are deterministic.
The last popular group of backpropagation-related topologies are the neocognitron (Fukushima, 1980) [6] and LeNet (LeCun et al., 1989) [7]. These so-called convolutional neural networks use deep layered topologies with five or more layers. The difference from the classical MLP network is, however, weight reuse, where several links in the network share the same weight.
In modular neural networks, several basic topologies are connected into a bigger meta-topology. Sometimes it might be useful to process this meta-network as a single instance with a complex topology. This happens, for instance, with so-called actor-critic architectures for reinforcement learning, where two or more networks are back-propagated by the backpropagation algorithm in a single run. Besides a couple of actor-critic topologies, the most famous modular architectures are mixtures of experts (Jacobs, Jordan, Nowlan, & Hinton, 1991) [8] and NARA (Takagi, 1992) [9], which are general-use neural networks.
Pruning methods and cascade correlation (Fahlman, 1991) [10] style algorithms for incremental building of the topology can lead to pseudo-random sparse topologies too.
Although layered topologies are the most frequent among backpropagation-trained neural networks, many interesting structured or semi-random topologies are used too. Signal propagation through structured topologies might be non-trivial for more demanding backpropagation-group algorithms like the BPTT with its time-unrolled signal propagation. Stochastic Weight Update may simplify this “propagation” task.
II. MOTIVATION
The implementation of the classical ordered weight update in the backpropagation algorithm relies on the order of weight computation in the network. In classical MLP networks this is a simple layer-by-layer computation from the input to the output layer. A less common, but still convenient, implementation is for a fully interconnected network, like in the common BPTT case. The necessity to update weights from every neuron to every neuron can actually give us a compact implementation: we do not need to care about any exceptions.
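As an illustration only (the function names, the flat weight array, and the parameter n_updates below are our assumptions for the sketch, not code from [1]), the following Python fragment contrasts the classical ordered update, which walks all weights in a fixed order, with the stochastic update, which touches only a constant number of randomly chosen weights per step:

import random

def ordered_update(weights, gradients, learning_rate):
    # Classical ordered update: every weight is updated, in a fixed order.
    for i in range(len(weights)):
        weights[i] -= learning_rate * gradients[i]

def stochastic_update(weights, gradients, learning_rate, n_updates):
    # Stochastic weight update: a constant number of weights is selected
    # at random and only those are updated in this step.
    for i in random.sample(range(len(weights)), n_updates):
        weights[i] -= learning_rate * gradients[i]

Because both loops address weights by index only, the same code works for an arbitrary topology (layered, sparse, or modular) once its weights are gathered into one list, which is the implementation-simplicity argument made above.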