An Interpretation of Forward-Propagation and Back-Propagation of DNN
Fig. 1. (a) The normal DNN training procedure contains two steps, forward (blue) and backward (red), which form two networks sharing weights. (b) The network fp-DNN represents the forward pass, extracting features. (c) The network bp-DNN has the inverted structure of fp-DNN but shares the same parameters; it transports the gradients (or label information) from the top to the bottom. (Color figure online)
2 Formulation of Deep Neural Networks
Classification is a basic task in machine learning. In this paper, we use a DNN to model the classification task and analyze how the DNN is trained. We assume a classification task with $C$ classes and a training data set $\{x_i, y_i\}_{i=1}^{N}$ that contains $N$ training samples, where $x \in \mathbb{R}^S$ is the input signal and $y \in \{0,1\}^C$ is the class label of $x$, with $y_c = 1$ if $x$ belongs to the $c$-th class and $y_i = 0$ for $i \neq c$ otherwise. The classification task for this data set is to train a DNN to predict the conditional distribution $p(y|x) = f_\Theta(x)$, where $f_\Theta(x)$ is the function of the DNN. For convenience, we denote $p = [p_1, p_2, \dots, p_C]^T \in \mathbb{R}^C$ as the output of $f_\Theta(x)$, with $p_i = p(y_i|x)$.
To solve this classification task, we construct a deep neural network with $L$ hidden layers and formulate it as [1] (Fig. 1(a)):

$$\mathrm{DNN} = \begin{cases} \ell_\Theta(x, y) = -\sum_{i=1}^{C} y_i \log p_i \\[4pt] p_i = \dfrac{e^{z_{L,i}}}{\sum_{j=1}^{C} e^{z_{L,j}}} \\[4pt] z_1 = W_1 x + b_1 \\ z_l = W_l\,\sigma(z_{l-1}) + b_l, \quad 2 \le l \le L \end{cases} \tag{1}$$
where $W_l \in \mathbb{R}^{C_l \times C_{l-1}}$ and $b_l \in \mathbb{R}^{C_l}$ are the parameters of the $l$-th layer of the DNN, and $\ell_\Theta(x, y)$ is the softmax loss function. $\Theta$ denotes all the parameters of the DNN. $z_L = [z_{L,1}, z_{L,2}, \dots, z_{L,C}]^T$ is the linear output of the DNN, $z_l \in \mathbb{R}^{C_l}$ is the linear output of the $l$-th hidden layer, and $p = [p_1, p_2, \dots, p_C]^T$ is the final prediction of this DNN.
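A single forward pass under Eq. (1) computes the hidden activations layer by layer and then the softmax loss. The following NumPy sketch is illustrative only: the layer sizes, the random initialization, and the choice of a sigmoid for $\sigma$ are assumptions, since the paper leaves $\sigma$ generic.

```python
import numpy as np

def sigma(z):
    # Elementwise nonlinearity; a sigmoid is assumed here for illustration.
    return 1.0 / (1.0 + np.exp(-z))

def dnn_forward(x, Ws, bs):
    # z_1 = W_1 x + b_1;  z_l = W_l sigma(z_{l-1}) + b_l for 2 <= l <= L.
    z = Ws[0] @ x + bs[0]
    for W, b in zip(Ws[1:], bs[1:]):
        z = W @ sigma(z) + b
    return z  # z_L, the linear output of the DNN

def softmax_loss(z_L, y):
    # p_i = exp(z_{L,i}) / sum_j exp(z_{L,j});  loss = -sum_i y_i log p_i.
    e = np.exp(z_L - z_L.max())          # shift for numerical stability
    p = e / e.sum()
    return p, -np.sum(y * np.log(p + 1e-12))

# Toy shapes (assumed): S = 4 inputs, one hidden layer of 5 units, C = 3 classes.
rng = np.random.default_rng(0)
Ws = [rng.standard_normal((5, 4)), rng.standard_normal((3, 5))]
bs = [np.zeros(5), np.zeros(3)]
x = rng.standard_normal(4)
y = np.array([0.0, 1.0, 0.0])            # one-hot label, c-th entry set to 1
p, loss = softmax_loss(dnn_forward(x, Ws, bs), y)
```

Note that the prediction $p$ always sums to one by construction of the softmax, and the loss is the cross-entropy between $p$ and the one-hot label $y$.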
We define two network structures corresponding to the training process with forward and back-propagation:

$$\text{fp-DNN} = \begin{cases} z_0 = x \\ z_1 = W_1 z_0 + b_1 \\ z_l = W_l\,\sigma(z_{l-1}) + b_l, \quad 2 \le l \le L \end{cases} \tag{2}$$
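The fp-DNN of Eq. (2) is the same forward chain viewed as a feature extractor: it retains every intermediate $z_l$ rather than only the final output. A minimal sketch, again with an assumed sigmoid $\sigma$ and toy layer shapes not taken from the paper:

```python
import numpy as np

def sigma(z):
    # Assumed sigmoid nonlinearity; the paper leaves sigma generic.
    return 1.0 / (1.0 + np.exp(-z))

def fp_dnn(x, Ws, bs):
    # Eq. (2): z_0 = x;  z_1 = W_1 z_0 + b_1;
    # z_l = W_l sigma(z_{l-1}) + b_l for 2 <= l <= L.
    zs = [x]                              # z_0 = x
    z = Ws[0] @ x + bs[0]
    zs.append(z)
    for W, b in zip(Ws[1:], bs[1:]):
        z = W @ sigma(z) + b
        zs.append(z)
    return zs                             # [z_0, z_1, ..., z_L]

rng = np.random.default_rng(1)
Ws = [rng.standard_normal((5, 4)), rng.standard_normal((3, 5))]
bs = [np.zeros(5), np.zeros(3)]
zs = fp_dnn(rng.standard_normal(4), Ws, bs)
```

Collecting the list of activations is what distinguishes this view from the plain forward pass: each $z_l$ is a feature representation at depth $l$.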