The equivalence of logistic regression and maximum entropy
models
John Mount∗
September 23, 2011
Abstract
As our colleague so aptly demonstrated (http://www.win-vector.com/blog/2011/09/the-simpler-derivation-of-logistic-regression/), there is one derivation of logistic regression that is particularly
beautiful. It is not as general as that found in Agresti[Agresti, 1990] (which deals with generalized
linear models in their full generality), but gets to the important balance equations very quickly. We
will pursue this further to re-derive multi-category logistic regression in both its standard (sigmoid)
phrasing and also in its equivalent maximum entropy clothing.
It is well known that logistic regression and maximum entropy modeling are equivalent (see, for example, [Klein and Manning, 2003]), but we will show that the simpler derivation already given is a very good way to demonstrate the equivalence (and it points out that logistic regression is actually special,
not just one of many equivalent GLMs).
1 Overview
We will proceed as follows:
1. This outline.
2. Introduce a simplified machine learning problem and some notation.
3. Re-discover logistic regression by a simplified version of the standard derivations.
4. Re-invent logistic regression by using the maximum entropy method.
5. Draw some conclusions.
2 Notation
Suppose our machine learning input is a sequence of real vectors of dimension n (where we have already
applied the standard machine learning tricks of converting categorical variables into indicator variables
over their levels and adding a constant variable to our representation).
Our notation will be as follows:
1. x(1) · · · x(m) will denote our input data. Each x(i) is a vector in R^n. We use the function-of-i
notation to denote which specific example we are working with, and the subscript j to denote
which of the n coordinates or parameters we are interested in (as in x(i)_j).
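As a concrete illustration of the preprocessing mentioned above (converting a categorical variable into indicators over its levels and appending a constant variable), here is a minimal sketch; the example data, level names, and helper function are hypothetical, not taken from the paper:

```python
# Sketch: encode a categorical variable as indicators over its levels and
# append a constant column, producing the x(i) vectors in R^n.
# The levels and example rows below are hypothetical illustrations.

levels = ["red", "green", "blue"]                      # levels of a categorical variable
raw = [("red", 1.5), ("blue", -0.2), ("green", 3.0)]   # (category, numeric feature)

def to_vector(category, value):
    # one indicator per level, then the numeric feature, then the constant 1
    indicators = [1.0 if category == lvl else 0.0 for lvl in levels]
    return indicators + [value, 1.0]

X = [to_vector(c, v) for c, v in raw]  # the x(1) ... x(m) input vectors
n = len(X[0])                          # dimension n = 3 indicators + 1 numeric + 1 constant = 5
```

Here x(1) = [1.0, 0.0, 0.0, 1.5, 1.0], so x(1)_4 = 1.5 in the subscript notation above.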
∗ email: jmount@win-vector.com, web: http://www.win-vector.com/