an additional condition must also be imposed on the input bias,

    f(\hat{P}_{in} b) = \hat{P}_{in} x_\partial .   (2.8)
This condition is satisfied for a feedforward neural network, but need not be satisfied for
more general learning systems. After a finite number of steps t the state vector x(t) may
converge to a fixed state x(t) = \bar{x} defined by a fixed point equation

    \bar{x} = f(\hat{w} \bar{x} + b) .   (2.9)

For example, in a deep feedforward neural network with L layers the fixed state would be
reached after L − 1 steps, i.e. x(L − 1) = \bar{x}, given that the condition on the input bias (2.8)
is satisfied. For more general systems the state may or may not converge to a fixed point,
depending on the activation transformation (2.5) and initial conditions (2.4).
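The relaxation to a fixed state can be illustrated with a small numerical sketch. The network, weights, and clamped input value below are made up for illustration and are not taken from the text; the point is only that a feedforward chain whose input bias satisfies (2.8) stops evolving after L − 1 update steps.

```python
import numpy as np

# Illustrative three-neuron chain (all numbers are made up for this sketch):
# neuron 0 = input, neuron 1 = hidden, neuron 2 = output, so L = 3 layers.
f = np.tanh                          # activation map, applied elementwise

w = np.array([[0.0, 0.0, 0.0],       # input neuron receives no signal
              [0.5, 0.0, 0.0],       # hidden neuron is fed by the input
              [0.0, 0.5, 0.0]])      # output neuron is fed by the hidden
b = np.array([np.arctanh(0.7), 0.1, 0.1])  # chosen so f(P_in b) = P_in x_d, cf. (2.8)
P_in = np.diag([1.0, 0.0, 0.0])      # projector onto the input subspace

x = np.array([0.7, 0.0, 0.0])        # initial state with the input clamped to 0.7
for t in range(5):
    x = f(w @ x + b)                 # activation dynamics

# The input neuron keeps its clamped value at every step ...
assert np.isclose(x[0], 0.7)
# ... and the state is a fixed point of x = f(w x + b), reached
# after L - 1 = 2 steps for this feedforward chain.
x_bar = f(w @ x + b)
assert np.allclose(x, x_bar)
```

The input neuron retains its value because its row of the weight matrix vanishes, so its update depends only on the bias, which is exactly what condition (2.8) fixes.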
The final ingredient of a neural septuple is a loss function. In a feedforward neural
network the loss function is usually defined by projecting the fixed state \bar{x} to the output
subspace, \hat{P}_{out} \bar{x} ∈ V_{out}, and then by comparing the result with a desired output state
\hat{P}_{out} x_\partial ∈ V_{out}. For example, one can define a loss function as a squared error of the output neurons,
    H_\partial(\bar{x}, b, \hat{w}) = \frac{1}{2} (\hat{P}_{out} \bar{x} − \hat{P}_{out} x_\partial)^T (\hat{P}_{out} \bar{x} − \hat{P}_{out} x_\partial)   (2.10)
                     = \frac{1}{2} (\bar{x} − x_\partial)^T \hat{P}_{out}^T \hat{P}_{out} (\bar{x} − x_\partial)
                     = \frac{1}{2} (\bar{x} − x_\partial)^T \hat{P}_{out} (\bar{x} − x_\partial) ,

where the last equality uses that \hat{P}_{out} is a projection operator, \hat{P}_{out}^T \hat{P}_{out} = \hat{P}_{out}.
Since there is no error on the input neurons (2.7) we can also rewrite it as a squared error
on all boundary (i.e. input and output) neurons
    H_\partial(\bar{x}, b, \hat{w}) = \frac{1}{2} (\bar{x} − x_\partial)^T (\hat{P}_{in} + \hat{P}_{out}) (\bar{x} − x_\partial) .   (2.11)
For this reason, we shall refer to H_\partial as a boundary loss function.
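The equality of the output-only form (2.10) and the boundary form (2.11) can be checked numerically. The states and projectors below are made-up illustrations: as long as the input components of \bar{x} and x_\partial agree, the two expressions coincide.

```python
import numpy as np

# Made-up states and projectors for a 3-neuron system
# (neuron 0 input, neuron 1 hidden, neuron 2 output).
P_in  = np.diag([1.0, 0.0, 0.0])     # projector onto the input subspace
P_out = np.diag([0.0, 0.0, 1.0])     # projector onto the output subspace

x_bar = np.array([0.7, 0.3, 0.9])    # fixed state
x_d   = np.array([0.7, 0.0, 0.4])    # desired state x_d; the inputs agree, so
                                     # there is no error on the input neurons
e = x_bar - x_d
H_out      = 0.5 * e @ P_out @ e             # eq. (2.10), using P_out^T P_out = P_out
H_boundary = 0.5 * e @ (P_in + P_out) @ e    # eq. (2.11)
assert np.isclose(H_out, H_boundary)         # the two forms agree
```

Note that the hidden-neuron mismatch (the 0.3 component of e) contributes to neither form, since both projectors annihilate the hidden subspace.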
3 Supervised vs. unsupervised
In the previous section we defined a neural network as a neural septuple

    (x, \hat{P}_{in}, \hat{P}_{out}, \hat{w}, b, f, H),

where x is a state vector of all (input, output and hidden) neurons, \hat{P}_{in} x is a state of only
input neurons, \hat{P}_{out} x is a state of only output neurons, \hat{w} is a weight matrix between all pairs
of neurons, b is a bias vector for all neurons, f(y) is an activation map and H(x, b, \hat{w}) is a
loss function. A simple example of a loss function is the boundary loss (2.11), which is known
to work very well in supervised learning. Unfortunately, the boundary loss cannot be used
in unsupervised systems where the output subspace is empty, V_{out} = ∅, and thus the boundary loss is always zero, H = H_\partial = 0.² For this reason, in unsupervised systems (beyond
auto-encoders) we must consider other loss functions which are, perhaps, more general than
the boundary loss.
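The vanishing of the boundary loss in the unsupervised case is immediate to verify with a made-up two-neuron example: with \hat{P}_{out} = 0 and the input clamped to its desired value, every term of (2.11) vanishes.

```python
import numpy as np

# Made-up two-neuron system with no output neurons: V_out is empty,
# so the output projector is the zero matrix.
P_in  = np.diag([1.0, 0.0])          # neuron 0 is the input
P_out = np.zeros((2, 2))             # empty output subspace

x_bar = np.array([0.7, 0.3])         # fixed state (input, hidden)
x_d   = np.array([0.7, 0.0])         # desired state; only its input part is used

e = x_bar - x_d                      # input component vanishes (input is clamped)
H = 0.5 * e @ (P_in + P_out) @ e     # boundary loss, eq. (2.11)
assert H == 0.0                      # identically zero: no signal to learn from
```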
A key observation is that in equation (2.11) the boundary loss was due to a mismatch
in the output conditions or (together with input conditions) in the boundary conditions, i.e.
² In our description an auto-encoder is viewed as a supervised system with periodic boundary conditions,
i.e. the input and output states are set equal to each other.