The symmetric Gaussian distribution is a common choice for a proposal distribution q(·), and this is
the one used in the original Metropolis algorithm.
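With a symmetric proposal, q(θ∗|θ_{t−1}) = q(θ_{t−1}|θ∗), so the proposal densities cancel from the
MH acceptance ratio; writing the target posterior density as p(θ|y), the acceptance probability reduces
to min{1, p(θ∗|y)/p(θ_{t−1}|y)}, and a candidate with higher posterior density than the current state is
always accepted.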
Another important MCMC method that can be viewed as a special case of MH is Gibbs sampling
(Gelfand et al. 1990), where the updates are the full conditional distributions of each parameter
given the rest of the parameters. Gibbs updates are always accepted. If θ = (θ_1, . . . , θ_d) and, for
j = 1, . . . , d, q_j is the conditional distribution of θ_j given the rest θ_{−j}, then the Gibbs algorithm
is the following. For t = 1, . . . , T − 1 and for j = 1, . . . , d: θ_j^t ∼ q_j(·|θ_{−j}^{t−1}). This step is referred
to as a Gibbs update.
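As a concrete illustration, the following Python sketch runs a Gibbs sampler for a bivariate normal
target with correlation r, for which both full conditionals are normal in closed form; the target, seed,
and chain length are assumptions made for this example, not part of the algorithm above.

    import numpy as np

    # Gibbs sampler for a bivariate normal target with zero means, unit
    # variances, and correlation r: each full conditional is again normal,
    #   theta_1 | theta_2 ~ N(r*theta_2, 1 - r^2), and symmetrically.
    rng = np.random.default_rng(12345)
    r = 0.8               # illustrative correlation
    T = 10_000            # illustrative chain length
    theta = np.zeros((T, 2))
    sd = np.sqrt(1 - r**2)

    for t in range(1, T):
        # Update each component from its full conditional given the most
        # recent values of the other; every Gibbs update is accepted.
        theta[t, 0] = rng.normal(r * theta[t - 1, 1], sd)
        theta[t, 1] = rng.normal(r * theta[t, 0], sd)

    print(np.corrcoef(theta[1000:].T)[0, 1])   # close to r = 0.8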
All MCMC methods share some limitations and potential problems. First, any simulated chain is
influenced by its starting values, especially for short MCMC runs. It is required that the starting point
has a positive posterior probability, but even when this condition is satisfied, if we start somewhere
in a remote tail of the target distribution, it may take many iterations to reach a region of appreciable
probability. Second, because there is no obvious stopping criterion, it is not easy to decide how long
to run the MCMC algorithm to achieve convergence to the target distribution. Third, the observations
in MCMC samples are strongly dependent and this must be taken into account in any subsequent
statistical inference. For example, the errors associated with the Monte Carlo integration should be
calculated according to (7), which accounts for autocorrelation.
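For example, one standard way to account for this autocorrelation when computing the Monte Carlo
standard error is the method of batch means; the Python sketch below illustrates that general idea (it
is not a transcription of formula (7), and the number of batches is an arbitrary choice).

    import numpy as np

    def batch_means_mcse(draws, n_batches=50):
        # Autocorrelation-aware Monte Carlo standard error via batch means:
        # split the chain into contiguous batches, average each batch, and
        # estimate the error of the overall mean from the batch means.
        draws = np.asarray(draws)
        m = len(draws) // n_batches
        batch_means = draws[:n_batches * m].reshape(n_batches, m).mean(axis=1)
        return batch_means.std(ddof=1) / np.sqrt(n_batches)

For a positively autocorrelated chain, this estimate is larger than the naive i.i.d. formula s/√T, which
would overstate the precision of the Monte Carlo estimate.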
Adaptive random-walk Metropolis–Hastings
The choice of a proposal distribution q(·) in the MH algorithm is crucial for the mixing properties
of the resulting Markov chain. The problem of determining an optimal proposal for a particular target
posterior distribution is difficult and is still being researched actively. All proposed solutions are based
on some form of an adaptation of the proposal distribution as the Markov chain progresses, which is
carefully designed to preserve the ergodicity of the chain, that is, its tendency to converge to the target
distribution. These methods are known as adaptive MCMC methods (Haario, Saksman, and Tamminen
[2001]; Giordani and Kohn [2010]; and Roberts and Rosenthal [2009]).
The majority of adaptive MCMC methods are random-walk MH algorithms with updates of the
form: θ∗ = θ_{t−1} + Z_t, where Z_t follows some symmetric distribution. Specifically, we consider a
Gaussian random-walk MH algorithm with Z_t ∼ N(0, ρ²Σ), where ρ is a scalar controlling the scale
of random jumps for generating updates and Σ is a d-dimensional covariance matrix. One of the first
important results regarding adaptation is from Gelman, Gilks, and Roberts (1997), where the authors
derive the optimal scaling factor ρ = 2.38/√d and note that the optimal Σ is the true covariance
matrix of the target distribution.
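To make this concrete, here is a minimal Python sketch of a Gaussian random-walk MH sampler using
the fixed scaling ρ = 2.38/√d; the standard normal target, seed, and run length are assumptions chosen
for the example.

    import numpy as np

    def log_target(theta):
        # Illustrative target: standard d-dimensional normal, up to a constant.
        return -0.5 * theta @ theta

    d = 5
    rho = 2.38 / np.sqrt(d)     # scaling of Gelman, Gilks, and Roberts (1997)
    Sigma = np.eye(d)           # here equal to the true target covariance
    L = np.linalg.cholesky(rho**2 * Sigma)
    rng = np.random.default_rng(12345)

    theta = np.zeros(d)
    accepted = 0
    T = 20_000
    for t in range(T):
        proposal = theta + L @ rng.standard_normal(d)   # theta* = theta_{t-1} + Z_t
        # The proposal is symmetric, so the MH ratio is the target ratio alone.
        if np.log(rng.uniform()) < log_target(proposal) - log_target(theta):
            theta, accepted = proposal, accepted + 1

    print(accepted / T)         # acceptance rate, roughly 0.25-0.3 here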
Haario, Saksman, and Tamminen (2001) propose that Σ be estimated by the empirical covariance
matrix plus a small diagonal matrix ε × I_d to prevent zero covariance matrices. Alternatively, Roberts
and Rosenthal (2009) propose a mixture of the two covariance matrices,
Σ_t = β Σ̂ + (1 − β)Σ_0
for some fixed covariance matrix Σ_0 and β ∈ [0, 1].
Because the proposal distribution of an adaptive MH algorithm changes at each step, the ergodicity
of the chain is not necessarily preserved. However, under certain assumptions about the adaptation
procedure, the ergodicity does hold; see Roberts and Rosenthal (2007), Andrieu and Moulines (2006),
Atchadé and Rosenthal (2005), and Giordani and Kohn (2010) for details.