1 Recursive Generalised Linear Models

Deep learning and the use of deep neural networks [1] are now established as a key tool for practical machine learning. Neural networks have an equivalence with many existing statistical and machine learning approaches and I would like to explore one of these views in this post. In particular, I'll look at the view of deep neural networks as recursive generalised linear models (RGLMs). Generalised linear models form one of the cornerstones of probabilistic modelling and are used in almost every field of experimental science, so this connection is an extremely useful one to have in mind. I'll focus here on what are called feed-forward neural networks and leave a discussion of the statistical connections to recurrent networks to another post.

1.1 generalised linear models

The basic linear regression model is a linear mapping from P-dimensional input features (or covariates) x to a set of targets (or responses) y, using a set of weights (or regression coefficients) β and a bias (offset) β₀. The outputs can also be multivariate, but I'll assume they are scalar here. The full probabilistic model assumes that the outputs are corrupted by Gaussian noise of unknown variance σ².

η = β⊤x + β₀

y = η + ε,  ε ∼ N(0, σ²)
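This probabilistic linear model can be simulated and fitted in a few lines. The sketch below (with illustrative values for β, β₀, and σ) folds the bias into the weight vector by appending a constant-1 feature, then recovers the coefficients by ordinary least squares:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate y = beta^T x + beta_0 + eps, with eps ~ N(0, sigma^2).
# All parameter values here are illustrative, not from the post.
P = 3
beta_true = np.array([2.0, -1.0, 0.5])  # regression coefficients
beta0_true, sigma = 4.0, 0.1            # bias and noise std. deviation

X = rng.normal(size=(100, P))
eta = X @ beta_true + beta0_true                 # systematic component
y = eta + rng.normal(scale=sigma, size=100)      # add the random component

# Append a constant-1 column so the bias becomes an ordinary weight,
# then estimate [beta, beta_0] jointly by least squares.
X1 = np.hstack([X, np.ones((100, 1))])
beta_hat, *_ = np.linalg.lstsq(X1, y, rcond=None)
print(beta_hat)  # close to [2.0, -1.0, 0.5, 4.0]
```

The same trick of absorbing the bias into the weights is exactly the compact notation β = [β̂, β₀], x = [x̂, 1] used in the generalised regression problem below.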

In this formulation, η is the systematic component of the model and ε is the random component. Generalised linear models (GLMs) [2] allow us to extend this formulation to problems where the distribution on the targets is not Gaussian but some other distribution (typically a distribution in the exponential family). In this case, we can write the generalised regression problem, combining the coefficients and bias for more compact notation, as:

η = β⊤x,  β = [β̂, β₀],  x = [x̂, 1]

E[y] = µ = g⁻¹(η)

where g(·) is the link function that allows us to move from natural parameters η to mean parameters µ. If the inverse link function used in the definition of µ above were the logistic sigmoid, then the mean parameters correspond to the probabilities of y being a 1 or 0 under the Bernoulli distribution.
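As a minimal sketch of this logistic-sigmoid case (the weights and input below are made up for illustration), the inverse link maps the unbounded natural parameter η to a mean µ in (0, 1), which is then read as P(y = 1 | x):

```python
import numpy as np

def sigmoid(eta):
    # Inverse of the logit link: maps natural parameters to (0, 1).
    return 1.0 / (1.0 + np.exp(-eta))

beta = np.array([1.5, -2.0, 0.5])  # illustrative weights, [beta_hat, beta_0]
x = np.array([0.2, 0.4, 1.0])      # input with the constant 1 appended

eta = beta @ x      # systematic component (natural parameter)
mu = sigmoid(eta)   # mean parameter: the probability that y = 1
print(eta, mu)      # here eta is about 0, so mu is about 0.5
```

Swapping the sigmoid for another inverse link (exponential, identity, etc.) changes the target distribution while keeping the linear systematic component untouched, which is the essential flexibility of GLMs.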

