1.2 Why sampling?
The trouble with integrals, of course, is that they can be very difficult to calculate. The methods we learned
in calculus class are fine for classroom exercises, but often cannot be applied to interesting problems in the
real world. Indeed, analytical solutions to (8) and the denominator of (9) might be impossible to obtain, so
we might not be able to determine the exact form of P(π|X). Gibbs sampling allows us to sample from a
distribution that asymptotically follows P(π|X) without having to explicitly calculate the integrals.
1.2.1 Monte Carlo: a circle, a square, and a bag of rice
Gibbs Sampling is an instance of a Markov Chain Monte Carlo technique. Let’s start with the “Monte
Carlo” part. You can think of Monte Carlo methods as algorithms that help you obtain a desired value by
performing simulations involving probabilistic choices. As a simple example, here’s a cute, low-tech Monte
Carlo technique for estimating the value of π (the ratio of a circle's circumference to its diameter).^6
Draw a perfect square on the ground. Inscribe a circle in it — i.e. the circle and the square are centered
in exactly the same place, and the circle’s diameter has length identical to the side of the square. Now take
a bag of rice, and scatter the grains uniformly at random inside the square. Finally, count the total number
of grains of rice inside the circle (call that C), and inside the square (call that S).
You scattered rice at random. Assuming you managed to do this pretty uniformly, the ratio between the
circle’s grains and the square’s grains (which include the circle’s) should approximate the ratio between the
area of the circle and the area of the square, so
\frac{C}{S} \approx \frac{\pi (d/2)^2}{d^2}. (10)
The right-hand side simplifies to π/4, so solving for π we get π ≈ 4C/S.
You may not have realized it, but we just solved a problem by approximating the values of integrals. The
true area of the circle, π(d/2)^2, is the result of summing up an infinite number of infinitesimally small points;
similarly for the true area d^2 of the square. The more grains of rice we use, the better our approximation
will be.
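To make the rice-scattering experiment concrete, here is a minimal Python sketch (the function name estimate_pi, the unit-square setup with d = 1, and the sample sizes are our own illustrative choices, not from the text): it scatters points uniformly in a square, counts how many fall inside the inscribed circle, and applies π ≈ 4C/S.

```python
import random

def estimate_pi(num_grains=1_000_000, seed=0):
    """Estimate π by 'scattering rice' uniformly over the unit square
    [0, 1) x [0, 1) and counting grains inside the inscribed circle."""
    rng = random.Random(seed)
    in_circle = 0
    for _ in range(num_grains):
        x, y = rng.random(), rng.random()
        # A grain lands inside the circle if its distance from the
        # center (0.5, 0.5) is at most the radius d/2 = 0.5.
        if (x - 0.5) ** 2 + (y - 0.5) ** 2 <= 0.25:
            in_circle += 1
    # C/S ≈ π/4, so π ≈ 4C/S.
    return 4 * in_circle / num_grains

if __name__ == "__main__":
    for n in (1_000, 100_000, 1_000_000):
        print(f"{n:>9} grains: π ≈ {estimate_pi(num_grains=n):.5f}")
```

Running it with increasing num_grains should show the estimate tightening around 3.14159, mirroring the observation above that more rice gives a better approximation.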
1.2.2 Markov Chains: walking the right walk
In the circle-and-square example, we saw the value of sampling from a uniform distribution, since the
grains of rice were distributed uniformly within the square. Returning to the problem of computing expected
values, recall that we're interested in E_{p(x)}[f(x)] (equation 7), where we'll assume that the distribution p(x)
is not uniform and, in fact, not easy to work with analytically.
Figure 2 provides an example f(z) and p(z) for illustration. Conceptually, the integral in equation (7)
sums up f(z)p(z) over infinitely many values of z. But rather than touching each point in the sum exactly
once, here's another way to think about it: if you sample N points z^{(0)}, z^{(1)}, z^{(2)}, \ldots, z^{(N)} at random from
the probability density p(z), then

E_{p(z)}[f(z)] = \lim_{N \to \infty} \frac{1}{N} \sum_{t=1}^{N} f(z^{(t)}). (11)
That looks a lot like a kind of averaged value for f, which makes a lot of sense since in the discrete case
(equation 6) the expected value is nothing but a weighted average, where the weight for each value of z is
its probability.
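To see equation (11) at work, here is a small Python sketch (the Gaussian p(z) and the choice f(z) = z^2 are our own illustrative stand-ins, not the f and p of Figure 2): it approximates E_{p(z)}[f(z)] with a finite N by drawing samples from p and averaging f over them.

```python
import random

def expectation_by_sampling(f, sample_from_p, num_samples=100_000, seed=0):
    """Approximate E_{p(z)}[f(z)] as the average of f(z^(t)) over samples
    z^(1), ..., z^(N) drawn from p(z) -- a finite-N version of equation (11)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_samples):
        z = sample_from_p(rng)   # draw z ~ p(z); p(z) is never evaluated directly
        total += f(z)            # note: just f(z), not f(z) * p(z)
    return total / num_samples

if __name__ == "__main__":
    # Illustrative non-uniform p(z): Normal(mean=1, std=2); f(z) = z^2.
    # Analytically, E[z^2] = mean^2 + variance = 1 + 4 = 5.
    approx = expectation_by_sampling(lambda z: z ** 2,
                                     lambda rng: rng.gauss(1.0, 2.0))
    print(approx)  # should come out close to 5.0
```

Note that the loop adds only f(z), never f(z)p(z); the next paragraph explains why.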
Notice, though, that the value in the sum is just f(z^{(t)}), not f(z^{(t)})p(z^{(t)}) as in the integral in equation (7).
Where did the p(z) part go? Intuitively, if we're sampling according to p(z), and count(z) is the number of
^6 We're elaborating on the introductory example at http://en.wikipedia.org/wiki/Monte_Carlo_method.