\begin{align}
h_1 &= a[\theta_{10} + \theta_{11}x] \nonumber \\
h_2 &= a[\theta_{20} + \theta_{21}x] \nonumber \\
h_3 &= a[\theta_{30} + \theta_{31}x], \tag{3.3}
\end{align}
where we refer to $h_1$, $h_2$, and $h_3$ as hidden units. Second, we compute the output by combining these hidden units with a linear function:¹

\begin{equation}
y = \phi_0 + \phi_1 h_1 + \phi_2 h_2 + \phi_3 h_3. \tag{3.4}
\end{equation}
Figure 3.3 shows the flow of computation that creates the function in figure 3.2a. Each hidden unit contains a linear function $\theta_{\bullet 0} + \theta_{\bullet 1}x$ of the input, and that line is clipped by the ReLU function $a[\bullet]$ below zero. The positions where the three lines cross zero become the three “joints” in the final output. The three clipped lines are then weighted by $\phi_1$, $\phi_2$, and $\phi_3$, respectively. Finally, the offset $\phi_0$ is added, which controls the overall height of the final function.
Problems 3.2-3.8
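As a concrete illustration of equations 3.3 and 3.4, here is a minimal NumPy sketch of this computation. The parameter values in theta and phi are arbitrary choices for illustration only; they are not the values used to generate figure 3.3.

```python
import numpy as np

def relu(z):
    """ReLU activation a[z] = max(0, z) (equation 3.2)."""
    return np.maximum(0.0, z)

def shallow_net(x, theta, phi):
    """Evaluate equations 3.3 and 3.4 at input x.

    theta: shape (3, 2), row k holds [theta_k0, theta_k1].
    phi:   shape (4,), holds [phi_0, phi_1, phi_2, phi_3].
    """
    # Hidden units: clipped linear functions of the input (equation 3.3)
    h = [relu(theta[k, 0] + theta[k, 1] * x) for k in range(3)]
    # Output: linear combination of the hidden units (equation 3.4)
    return phi[0] + phi[1] * h[0] + phi[2] * h[1] + phi[3] * h[2]

# Arbitrary illustrative parameters (not those behind figure 3.3):
theta = np.array([[0.3, -1.0], [-1.0, 2.0], [-0.5, 0.65]])
phi = np.array([-0.3, 2.0, -1.0, 7.0])
print(shallow_net(0.5, theta, phi))
```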
Each linear region in figure 3.3j corresponds to a different activation pattern in the hidden units. When a unit is clipped, we refer to it as inactive, and when it is not clipped, we refer to it as active. For example, the shaded region receives contributions from $h_1$ and $h_3$ (which are active) but not from $h_2$ (which is inactive). The slope of each linear region is determined by (i) the original slopes $\theta_{\bullet 1}$ of the active inputs for this region, and (ii) the weights $\phi_{\bullet}$ that were subsequently applied. For example, the slope in the shaded region is $\theta_{11}\phi_1 + \theta_{31}\phi_3$.
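Continuing the sketch above (reusing shallow_net, theta, and phi), the activation pattern at a point can be read off from the signs of the pre-activations $\theta_{\bullet 0} + \theta_{\bullet 1}x$, and the slope of the surrounding region is the sum of $\phi_k\theta_{k1}$ over the active units. A finite-difference check confirms this away from the joints:

```python
def activation_pattern(x, theta):
    """Boolean mask: which hidden units are active (pre-activation > 0) at x."""
    return theta[:, 0] + theta[:, 1] * x > 0

def region_slope(x, theta, phi):
    """Slope of the linear region containing x: sum of phi_k * theta_k1 over active units."""
    active = activation_pattern(x, theta)
    return np.sum(phi[1:][active] * theta[active, 1])

# Compare against a numerical derivative (x0 chosen inside a region, not at a joint):
x0, eps = 0.1, 1e-6
numeric = (shallow_net(x0 + eps, theta, phi) - shallow_net(x0, theta, phi)) / eps
print(activation_pattern(x0, theta), region_slope(x0, theta, phi), numeric)
```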
Each hidden unit contributes one ‘joint’ to the function, so with three hidden units, there can be four linear regions. However, only three of the slopes of these regions are independent; the fourth is either zero (if all the hidden units are inactive in this region) or is a sum of slopes from the other regions.
Problem 3.9
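To make the counting of joints and regions concrete, the sketch below (again reusing the definitions above) locates each unit's joint at $x = -\theta_{k0}/\theta_{k1}$, where its line crosses zero (assuming $\theta_{k1} \neq 0$), then samples one point in each of the four resulting regions to report its activation pattern and slope:

```python
# Each hidden unit k creates a joint where its line crosses zero:
# theta_k0 + theta_k1 * x = 0  =>  x = -theta_k0 / theta_k1 (theta_k1 != 0 assumed).
joints = np.sort(-theta[:, 0] / theta[:, 1])
print("joints:", joints)  # three joints partition the input into four linear regions

# Sample one point in each of the four regions and report its pattern and slope:
midpoints = (joints[:-1] + joints[1:]) / 2
samples = [joints[0] - 1.0, *midpoints, joints[-1] + 1.0]
for x0 in samples:
    print(f"x = {x0:+.2f}  active = {activation_pattern(x0, theta)}  "
          f"slope = {region_slope(x0, theta, phi):+.3f}")
```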
3.1.2 Depicting neural networks
We have been discussing a neural network with one input, one output, and three hidden units. We visualize this network in figure 3.4a. The input is on the left, the hidden units are in the middle, and the output is on the right. Viewed in this way, each connection represents one of the ten parameters. To simplify this representation, we do not typically draw the intercept parameters, and so this network would usually be depicted as in figure 3.4b.
¹A linear function has the form $z' = \phi_0 + \sum_i \phi_i z_i$. Any other type of function is non-linear. For instance, the ReLU function (equation 3.2) and the example neural network that contains it (equation 3.1) are both non-linear.