Multi-Layer Unsupervised Learning in a Spiking
Convolutional Neural Network
Amirhossein Tavanaei
The Center for Advanced Computer Studies
University of Louisiana at Lafayette, LA 70504, USA
Email: tavanaei@louisiana.edu
Anthony S. Maida
The Center for Advanced Computer Studies
University of Louisiana at Lafayette, LA 70504, USA
Email: maida@louisiana.edu
Abstract—Spiking neural networks (SNNs) have advantages
over traditional, non-spiking networks with respect to bio-
realism, potential for low-power hardware implementations, and
theoretical computing power. However, in practice, spiking net-
works with multi-layer learning have proven difficult to train.
This paper explores a novel, bio-inspired spiking convolutional
neural network (CNN) that is trained in a greedy, layer-wise
fashion. The spiking CNN consists of a convolutional/pooling
layer followed by a feature discovery layer, both of which
undergo bio-inspired learning. Kernels for the convolutional layer
are trained using a sparse, spiking auto-encoder representing
primary visual features. The feature discovery layer uses a
probabilistic spike-timing-dependent plasticity (STDP) learning
rule. This layer represents complex visual features using winner-take-all (WTA) thresholded, leaky integrate-and-fire (LIF) neurons. The new
model is evaluated on the MNIST digit dataset using clean and
noisy images. Intermediate results show that the convolutional
layer is stack-admissible, enabling it to support a multi-layer
learning architecture. The recognition performance for clean
images is above 98%. This performance is accounted for by
the independent and informative visual features extracted in
a hierarchy of convolutional and feature discovery layers. The
performance loss for recognizing the noisy images is in the range
0.1% to 8.5%. This level of performance loss indicates that the
network is robust to additive noise.
I. INTRODUCTION
Hierarchical feature discovery using convolutional neural
networks (CNNs) has attracted much recent interest in ma-
chine learning and computer vision [1], [2]. CNNs have
outperformed previous models in several areas such as im-
age [3] and speech recognition [4]. Most CNNs are trained
by backpropagation, which cannot be computed locally, and
thus seems biologically implausible. This paper addresses
the challenge of training a spiking CNN with biologically
plausible local learning at the synapses (weights). In contrast
to conventional CNNs, spiking CNNs are amenable to low-
power hardware implementations.
One challenge to using spiking CNNs is that they are
difficult to train. To illustrate this difficulty, we consider
some designs used in low-power neuromorphic hardware. To
avoid the difficulties of directly training a spiking CNN, these
networks are usually trained as a conventional (non-spiking)
CNN and then, after training, are converted to a spiking
network [5], [6], [7]. For instance, Cao et al. (2014) developed
a spiking CNN by converting an already trained, rate-based
CNN to a spike-based implementation [6]. Diehl et al. (2015)
extended the conversion method introduced in [6], using a weight adjustment approach to reduce the performance loss incurred during conversion [7]. However, a number of spiking CNNs
[8], [9], [10], [11] trained by spike-timing-dependent plasticity
(STDP) [12] currently exist. One of their limitations is that they utilize only one trainable layer of unsupervised learning.
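For reference, STDP in its canonical pair-based form modifies a synaptic weight according to the relative spike timing Δt = t_post − t_pre. The generic exponential window below is given only as an illustration; the constants A±, τ± are placeholders, and it is not the probabilistic STDP rule used in the present model.

```latex
% Canonical pair-based STDP window (illustrative only; generic constants
% A_+, A_-, \tau_+, \tau_-), with \Delta t = t_{\mathrm{post}} - t_{\mathrm{pre}}.
\Delta w =
\begin{cases}
  A_{+}\, e^{-\Delta t/\tau_{+}},  & \Delta t > 0 \quad \text{(pre before post: potentiation)}\\[4pt]
  -A_{-}\, e^{\,\Delta t/\tau_{-}}, & \Delta t < 0 \quad \text{(post before pre: depression)}
\end{cases}
```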
The network of Masquelier and Thorpe (2007), which is possibly the earliest spiking CNN, has only one such trainable layer [8]. It
consists of a convolutional/pooling layer followed by a feature
discovery layer and a classification layer. Only the feature
discovery layer uses unsupervised learning. Wysoski et al.
(2008) used a similar design which extracted initial features
using difference-of-Gaussian (DoG) filters in different orientations [9]. This network also had only one trainable layer for
unsupervised learning. Furthermore, neither of these networks
trained the earlier feature extraction layer, but instead used
handcrafted Gabor or DoG filters. Recent extensions of [8]
providing multi-layer STDP-based networks still utilize the
handcrafted filters for primary visual feature extraction [13],
[14]. Recent work [15] developed a backpropagation-trained
spiking auto-encoder for a multi-layer spiking CNN. However,
the backpropagation algorithm is not biologically plausible.
Our interest is in developing spiking CNNs which directly
use multi-layer learning such that the convolutional (feature
extraction) and feature discovery layers are trained locally
using layer-wise, unsupervised learning.
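To make the intended training regime concrete, the sketch below outlines greedy, layer-wise unsupervised training driven only by local updates. The class and method names (objects exposing local_update and forward) are hypothetical placeholders for illustration, not the implementation described in this paper.

```python
# Minimal sketch of greedy, layer-wise unsupervised training with local updates.
# conv_layer and feature_layer are hypothetical stand-ins for the two trainable stages.
def train_layerwise(images, conv_layer, feature_layer, epochs=3):
    # Stage 1: learn convolutional kernels with local (non-backpropagated) updates,
    # e.g., a sparse, spiking auto-encoder rule applied to image patches.
    for _ in range(epochs):
        for img in images:
            conv_layer.local_update(img)

    # Stage 2: freeze the kernels; train the feature discovery layer on the spike
    # trains emitted by the fixed convolution/pooling stage (e.g., STDP on LIF units).
    for _ in range(epochs):
        for img in images:
            spike_trains = conv_layer.forward(img)
            feature_layer.local_update(spike_trains)

    return conv_layer, feature_layer
```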
Our first contribution replaces handcrafted convolutional
filters with learned detectors acquired by a biologically plau-
sible, state-of-the-art, sparse coding model [16]. The acquired
detectors represent the model's receptive fields, whose shapes
resemble those found in primate visual cortex (area V1).
Sparse representations, where each input state is coded by
a few active units, are a compromise between extremely
localized representations and fully distributed representations,
while being easy to analyze [17]. The construction of sparse
representations that resemble V1 receptive fields has been
achieved by different methods such as simple Hebbian units
connected by anti-Hebbian feedback synapses [17], minimiz-
ing reconstruction error combined with a sparsity regular-
izer [18], and independent components analysis (ICA) [19].
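For concreteness, the reconstruction-error-plus-sparsity approach is commonly formalized by an objective of the following form, where x is the input, D the dictionary of receptive fields, s the sparse coefficient vector, and λ the sparsity weight; the L1 penalty shown is one common choice of regularizer, and the notation is ours rather than that of [18].

```latex
% Generic sparse coding objective: reconstruct x from dictionary D and codes s,
% with an L1 penalty (weight \lambda) encouraging few active coefficients.
\min_{D,\,s} \; \tfrac{1}{2}\,\lVert x - D s \rVert_2^2 \;+\; \lambda\,\lVert s \rVert_1
```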
In terms of spike code formation, recent work has used
spiking networks to study the acquisition of visual sparse
code representations comparable to visual features found in