where $r_i^l(t)$ denotes the firing rate of neuron $i$ at layer $l$ and $r_{\max}$ denotes the maximum firing rate, which is determined by the time step size. $a_i^1$ is the activation value of ANN neuron $i$ at the first layer, $V_i^1(t)$ is the membrane potential of the corresponding spiking neuron, and $\vartheta$ is the neuronal firing threshold. $M^{l-1}$ is the total number of neurons in layer $l-1$, and $b_i^l$ is the bias term of ANN neuron $i$ at layer $l$. Ideally, the firing rate of spiking neurons should be proportional to the activation value of their ANN counterparts, as per the first term of Eq. (1). However, the surplus membrane potential that has not been discharged by the end of the simulation causes an approximation error, as shown by the second term of Eq. (1); this error can be counteracted with a large firing threshold or a long encoding time window. Since increasing the firing threshold inevitably prolongs the evidence accumulation time, a firing threshold that prevents spiking neurons from either under- or over-activating is usually preferred, and the encoding time window is instead extended to minimize the firing rate approximation error.
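To make this error concrete, the following minimal Python sketch (our illustration, not code from the paper) simulates a single IF neuron driven by a constant input current with reset-by-subtraction. The product of the firing rate and the threshold approaches the input value as the window length $T$ grows, and the undischarged residual potential bounds the error by roughly $\vartheta/T$.

```python
# Minimal sketch (illustrative, not the paper's code): rate coding of a
# single IF neuron driven by a constant input current z, with
# reset-by-subtraction. The rate-threshold product approaches z as T grows;
# the residual membrane potential bounds the error by about threshold/T.
import numpy as np

def firing_rate(z, threshold=1.0, T=100):
    """Simulate T steps of an IF neuron with constant input current z."""
    v, spikes = 0.0, 0
    for _ in range(T):
        v += z                  # integrate the input current
        if v >= threshold:
            spikes += 1
            v -= threshold      # reset by subtraction
    return spikes / T

z, threshold = 0.37, 1.0
for T in (10, 100, 1000):
    r = firing_rate(z, threshold, T)
    print(f"T={T:5d}  rate*threshold={r * threshold:.4f}  target={z:.4f}")
```

With $T=10$ the recovered value is off by up to $0.1$, while $T=1000$ reduces the worst-case error to $0.001$, which is exactly the latency cost of rate-based approximation.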
Moreover, this approximation error accumulates as it propagates over layers, as shown in Eq. (2), so a further extension of the encoding time window is required to compensate. As such, a few thousand time steps are typically required to achieve competitive accuracy for deep SNNs with more than 10 layers [28], [29]. From these formulations, it is clear that approximating the continuous input-output representation of ANNs with the firing rate of spiking neurons inevitably leads to an accuracy-latency trade-off. To overcome this issue, as will be introduced in the following sections, we propose a novel conversion method grounded in a discrete neural representation, whereby the spike count, upper bounded by the encoding time window size, is taken to approximate the discrete input-output representation of ANNs. To make efficient use of the spike count for information representation, we propose a novel firing threshold determination strategy such that rapid and efficient pattern recognition can be achieved with SNNs. To counteract the conversion errors and hence ensure high accuracy in pattern recognition tasks, a layer-wise learning method is further proposed to fine-tune the network.
3 RETHINKING ANN-TO-SNN CONVERSION
Over the years, many spiking neuron models have been developed to describe the rich dynamical behaviors of biological neurons. Most of them, however, are too complex for real-world pattern recognition tasks. As discussed in Section 2, for computational simplicity and ease of conversion, the IF neuron model is commonly used in ANN-to-SNN conversion works [26], [27], [28]. Although this simplified spiking neuron model does not emulate the rich sub-threshold dynamics of biological neurons, it preserves the attractive properties of discrete and sparse communication and therefore allows for efficient hardware implementation. In this section, we reinvestigate how the input-output representation of a ReLU ANN neuron can be approximated by an integrate-and-fire spiking neuron.
3.1 Spiking Neuron Versus ANN Neuron
Let us consider a discrete-time simulation of spiking neurons with an encoding time window of $N_s$ that determines the inference speed of an SNN. At each time step $t$, the incoming spikes to neuron $i$ at layer $l$ are transduced into the synaptic current $z_i^l[t]$ according to
$$z_i^l[t] = \sum_j w_{ij}^{l-1} s_j^{l-1}[t] + b_i^l, \tag{3}$$
where $s_j^{l-1}[t]$ indicates the occurrence of an input spike at time step $t$, and $w_{ij}^{l-1}$ is the synaptic weight between pre-synaptic neuron $j$ and post-synaptic neuron $i$ at layer $l$. The bias $b_i^l$ can be interpreted as a constant injecting current.
The synaptic current $z_i^l[t]$ is further integrated into the membrane potential $V_i^l[t]$ as per Eq. (4). Without loss of generality, a unitary membrane resistance is assumed in this work. The membrane potential is reset by subtracting the firing threshold after each firing, as described by the last term of Eq. (4):
$$V_i^l[t] = V_i^l[t-1] + z_i^l[t] - \vartheta^l s_i^l[t-1]. \tag{4}$$
An output spike is generated whenever $V_i^l[t]$ rises above the firing threshold $\vartheta^l$ (determined layer-wise) as follows:
$$s_i^l[t] = \Theta\left(V_i^l[t] - \vartheta^l\right) \quad \text{with} \quad \Theta(x) = \begin{cases} 1, & \text{if } x \geq 0 \\ 0, & \text{otherwise.} \end{cases} \tag{5}$$
The spike train $s_i^l$ and spike count $c_i^l$ for a time window of $N_s$ can thus be determined and represented as follows:
$$s_i^l = \left\{ s_i^l[1], \ldots, s_i^l[N_s] \right\}, \qquad c_i^l = \sum_{t=1}^{N_s} s_i^l[t]. \tag{6}$$
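Putting Eqs. (3), (4), (5), and (6) together, the following Python sketch (a minimal illustration with placeholder weights, biases, and input spike trains, not the authors' released code) simulates one layer of IF neurons over an encoding time window of $N_s$ time steps.

```python
# Illustrative sketch of Eqs. (3)-(6): discrete-time simulation of one layer
# of IF neurons with reset-by-subtraction. Variable names follow the text;
# the weights, biases, and input spike trains are placeholders.
import numpy as np

def simulate_if_layer(W, b, s_in, threshold):
    """W: (n_out, n_in) weights, b: (n_out,) biases,
    s_in: (N_s, n_in) binary input spike trains, threshold: theta^l.
    Returns output spike trains (N_s, n_out) and spike counts (n_out,)."""
    N_s, n_out = s_in.shape[0], W.shape[0]
    V = np.zeros(n_out)                 # membrane potentials V_i^l
    s_prev = np.zeros(n_out)            # s_i^l[t-1], feeds the reset term
    s_out = np.zeros((N_s, n_out))
    for t in range(N_s):
        z = W @ s_in[t] + b             # Eq. (3): synaptic current z_i^l[t]
        V = V + z - threshold * s_prev  # Eq. (4): integrate, subtractive reset
        s_prev = (V >= threshold).astype(float)  # Eq. (5): spike generation
        s_out[t] = s_prev
    return s_out, s_out.sum(axis=0)     # Eq. (6): spike train and spike count

rng = np.random.default_rng(0)
W = rng.normal(0.0, 0.5, (4, 8))
b = np.zeros(4)
s_in = (rng.random((16, 8)) < 0.3).astype(float)  # N_s = 16 time steps
spikes, counts = simulate_if_layer(W, b, s_in, threshold=1.0)
print("spike counts:", counts)
```

Note that the reset term uses the spike from the previous time step, $s_i^l[t-1]$, exactly as written in Eq. (4), so a spike at step $t$ lowers the membrane potential at step $t+1$.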
For non-spiking ANN neurons, let us describe the neuronal function of neuron $i$ at layer $l$ as
$$a_i^l = f\left( \sum_j w_{ij}^{l-1} x_j^{l-1} + b_i^l \right), \tag{7}$$
where $w_{ij}^{l-1}$ and $b_i^l$ are the weight and bias, and $x_j^{l-1}$ and $a_i^l$ denote the input and output of the ANN neuron, respectively. $f(\cdot)$ denotes the activation function, for which we use the ReLU in this work. For ANN-to-SNN conversion, an ANN with ReLU neurons is first trained before the conversion; this step is called pre-training.
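For comparison, the ANN counterpart of Eq. (7) reduces to a single affine transformation followed by the ReLU. The short sketch below (again our illustration, with placeholder inputs) mirrors the SNN layer above and reuses the same weight and bias shapes.

```python
# Sketch of Eq. (7): the ReLU ANN neuron whose weights and biases are
# copied into the SNN during conversion. The inputs x are placeholders.
import numpy as np

def ann_layer(W, b, x):
    """a_i^l = f(sum_j w_ij^{l-1} x_j^{l-1} + b_i^l), with f = ReLU."""
    return np.maximum(0.0, W @ x + b)

rng = np.random.default_rng(0)
W = rng.normal(0.0, 0.5, (4, 8))
b = np.zeros(4)
x = rng.random(8)
print("ANN activations:", ann_layer(W, b, x))
```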
3.2 Neural Discretization Versus Activation
Quantization
In conventional ANN-to-SNN conversion studies, the firing rate of spiking neurons is usually taken to approximate the continuous input-output representation of the pre-trained ANN. As discussed in Section 2, a spiking neuron takes a notoriously long time window to reliably approximate a continuous value. Recent studies, however, suggest that such a continuous neural representation may not be necessary for ANNs [34]. In fact, there could be little impact on the network performance when the activation values of ANN neurons are properly quantized to a low-precision