Stat Comput (2008) 18: 343–373
DOI 10.1007/s11222-008-9110-y
A tutorial on adaptive MCMC
Christophe Andrieu ·Johannes Thoms
Received: 23 January 2008 / Accepted: 19 November 2008 / Published online: 3 December 2008
© Springer Science+Business Media, LLC 2008
Abstract We review adaptive Markov chain Monte Carlo algorithms (MCMC) as a means to optimise their performance. Using simple toy examples we review their theoretical underpinnings, and in particular show why adaptive MCMC algorithms might fail when some fundamental properties are not satisfied. This leads to guidelines concerning the design of correct algorithms. We then review criteria and the useful framework of stochastic approximation, which allows one both to systematically optimise generally used criteria and to analyse the properties of adaptive MCMC algorithms. We then propose a series of novel adaptive algorithms which prove to be robust and reliable in practice. These algorithms are applied to artificial and high dimensional scenarios, but also to the classic mine disaster dataset inference problem.
Keywords MCMC · Adaptive MCMC · Controlled
Markov chain · Stochastic approximation
C. Andrieu
School of Mathematics, University of Bristol,
Bristol BS8 1TW, UK
e-mail: c.andrieu@bristol.ac.uk
url: http://www.stats.bris.ac.uk/~maxca

J. Thoms
Chairs of Statistics, École Polytechnique Fédérale de Lausanne,
1015 Lausanne, Switzerland

1 Introduction

Markov chain Monte Carlo (MCMC) is a general strategy for generating samples {X_i, i = 0, 1, ...} from complex high-dimensional distributions, say π defined on a space X ⊂ R^{n_x} (assumed for simplicity to have a density with respect to the Lebesgue measure, also denoted π), from which integrals of the type

  I(f) := ∫_X f(x) π(x) dx,

for some π-integrable functions f : X → R^{n_f}, can be approximated using the estimator

  Î_N(f) := (1/N) Σ_{i=1}^N f(X_i),   (1)
provided that the Markov chain generated with, say, transition P is ergodic, i.e. it is guaranteed to eventually produce samples {X_i} distributed according to π. Throughout this review we will refer, in broad terms, to the consistency of such estimates and the convergence of the distribution of X_i to π as π-ergodicity. The main building block of this class of algorithms is the Metropolis-Hastings (MH) algorithm. It requires the definition of a family of proposal distributions {q(x, ·), x ∈ X} whose role is to generate possible transitions for the Markov chain, say from X to Y, which are then accepted or rejected according to the probability

  α(X, Y) = min( 1, [π(Y) q(Y, X)] / [π(X) q(X, Y)] ).
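As a concrete illustration (not from the paper), a single MH transition can be sketched as follows; the Gaussian random-walk proposal and the standard normal target are assumptions chosen for simplicity, and since this q is symmetric the ratio q(Y, X)/q(X, Y) cancels in α.

```python
import math
import random

def mh_step(rng, log_pi, x, scale):
    """One Metropolis-Hastings transition with a symmetric Gaussian proposal.

    For a symmetric proposal the acceptance probability reduces to
    min(1, pi(y) / pi(x)); we work with log densities for numerical stability.
    """
    y = x + rng.gauss(0.0, scale)                    # propose Y ~ N(x, scale^2)
    log_alpha = min(0.0, log_pi(y) - log_pi(x))      # log acceptance probability
    # accept with probability exp(log_alpha); 1 - random() lies in (0, 1]
    return y if math.log(1.0 - rng.random()) < log_alpha else x

# illustrative target: standard normal, log density up to an additive constant
log_pi = lambda x: -0.5 * x * x
rng = random.Random(0)
x, chain = 0.0, []
for _ in range(20000):
    x = mh_step(rng, log_pi, x, scale=2.4)
    chain.append(x)
```

Run long enough, the chain's empirical moments approach those of π, which is all that estimator (1) requires.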
The simplicity and universality of this algorithm are both its strength and weakness. Indeed, the choice of the proposal distribution is crucial: the statistical properties of the Markov chain heavily depend upon this choice, an inadequate choice resulting in possibly poor performance of the Monte Carlo estimators. For example, in the toy case where n_x = 1 and the normal symmetric random walk Metropolis algorithm (N-SRWM) is used to produce transitions, the density of the proposal distribution is of the form

  q_θ(x, y) = (1/√(2πθ²)) exp(−(y − x)²/(2θ²)),

where θ² is the variance of the proposed increments, hence defining a Markov transition probability P_θ. The variance of the corresponding estimator Î_N^θ(f), which we wish to be as small as possible for the purpose of efficiency, is well known to be typically unsatisfactory for values of θ² that are either "too small or too large" in comparison to optimal
or suboptimal value(s). In more realistic scenarios, MCMC algorithms are in general combinations of several MH updates {P_{k,θ}, k = 1, ..., n, θ ∈ Θ} for some set Θ, with each having its own parametrised proposal distribution q_{k,θ} for k = 1, ..., n and sharing π as common invariant distribution. These transition probabilities are usually designed in order to capture various features of the target distribution π and in general chosen to complement one another. Such a combination can for example take the form of a mixture of different strategies, i.e.

  P_θ(x, dy) = Σ_{k=1}^n w_k(θ) P_{k,θ}(x, dy),   (2)

where for any θ ∈ Θ, Σ_{k=1}^n w_k(θ) = 1 and w_k(θ) ≥ 0, but can also, for example, take the form of combinations (i.e. products of transition matrices in the discrete case) such as

  P_θ(x, dy) = P_{1,θ} P_{2,θ} ··· P_{n,θ}(x, dy).
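A mixture of MH kernels as in (2) can be sketched as follows (illustrative Python, not from the paper): at each iteration a component index k is drawn according to the weights w_k and the corresponding kernel is applied. The two components here, random-walk updates with a small and a large scale, and the particular weights, are assumptions for the example.

```python
import math
import random

def rwm_step(rng, log_pi, x, scale):
    """One symmetric random-walk Metropolis update with increment std `scale`."""
    y = x + rng.gauss(0.0, scale)
    return y if math.log(1.0 - rng.random()) < log_pi(y) - log_pi(x) else x

def mixture_step(rng, log_pi, x, weights, scales):
    """One draw from P_theta = sum_k w_k P_{k,theta}:
    pick component k with probability w_k, then apply that kernel once."""
    k = rng.choices(range(len(scales)), weights=weights)[0]
    return rwm_step(rng, log_pi, x, scales[k])

rng = random.Random(1)
log_pi = lambda x: -0.5 * x * x          # illustrative target: N(0, 1)
x, chain = 0.0, []
for _ in range(20000):
    x = mixture_step(rng, log_pi, x, weights=[0.7, 0.3], scales=[0.5, 5.0])
    chain.append(x)
```

Since each component kernel leaves π invariant, so does the mixture; the product form corresponds instead to applying the kernels one after another within a single iteration.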
Both examples are particular cases of the class of Markov transition probabilities P_θ on which we shall focus in this paper: they are characterised by the fact that they (a) belong to a family of parametrised transition probabilities {P_θ, θ ∈ Θ} (for some problem dependent set Θ; Θ = (0, +∞) in the toy example above), (b) for all θ ∈ Θ, π is an invariant distribution for P_θ, which is assumed to be ergodic, and (c) the performance of P_θ, for example the variance of Î_N^θ(f) above, is sensitive to the choice of θ.
Our aim in this paper is to review the theoretical underpinnings and recent methodological advances in the area of computer algorithms that aim to "optimise" such parametrised MCMC transition probabilities in order to lead to computationally efficient and reliable procedures. As we shall see, we also suggest new algorithms. One should note at this point that in some situations of interest, such as tempering type algorithms (Geyer and Thompson 1995), property (b) above might be violated and instead the invariant distribution of P_θ might depend on θ ∈ Θ (although only a non θ-dependent feature of this distribution π_θ might be of interest to us for practical purposes). We will not consider this case in depth here, but simply note that most of the arguments and ideas presented hereafter carry over to this slightly more complex scenario, e.g. (Benveniste et al. 1990; Atchadé and Rosenthal 2005).
The choice of a criterion to optimise is clearly the first decision that needs to be made in practice. We discuss this issue in Sect. 4.1, where we point out that most sensible optimality or suboptimality criteria can be expressed in terms of expectations with respect to the steady-state distributions of Markov chains generated by P_θ for θ ∈ Θ fixed, and make new suggestions in Sect. 5 which are subsequently illustrated on examples in Sect. 6. We will denote by θ∗ a generic optimal value for our criteria, which is always assumed to exist hereafter.
In order to optimise such criteria, or even simply find suboptimal values for θ, one could suggest to sequentially run a standard MCMC algorithm with transition P_θ for a set of values of θ (either predefined or defined sequentially) and compute the criterion of interest (or its derivative, etc.) once we have evidence that equilibrium has been reached. This can naturally be wasteful and we will rather focus here on a technique which belongs to the well known class of processes called controlled Markov chains (Borkar 1990) in the engineering literature, which we will refer to as controlled MCMC (Andrieu and Robert 2001), due to their natural filiation. More precisely we will assume that the algorithm proceeds as follows. Given a family of transition probabilities {P_θ, θ ∈ Θ} defined on X such that for any θ ∈ Θ, πP_θ = π (meaning that if X_i ∼ π, then X_{i+1} ∼ π, X_{i+2} ∼ π, ...) and given a family of (possibly random) mappings {θ_i : Θ × X^{i+1} → Θ, i = 1, ...}, which encode what is meant by optimality by the user, the most general form of a controlled MCMC proceeds as follows:
Algorithm 1 Controlled Markov chain Monte Carlo
• Sample initial values (θ_0, X_0) ∈ Θ × X.
• Iteration i + 1 (i ≥ 0), given θ_i = θ_i(θ_0, X_0, ..., X_i) from iteration i:
  1. Sample X_{i+1} | (θ_0, X_0, ..., X_i) ∼ P_{θ_i}(X_i, ·).
  2. Compute θ_{i+1} = θ_{i+1}(θ_0, X_0, ..., X_{i+1}).
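In code, Algorithm 1 is simply a sampling step followed by a parameter update. The sketch below (illustrative Python, not the paper's own implementation) instantiates the update mapping with a Robbins-Monro-type recursion that steers the log proposal scale of an N-SRWM kernel towards a target acceptance rate; the target rate of 0.44, the step sizes γ_i = 1/(i + 1), and the standard normal target are all assumptions for this one-dimensional example.

```python
import math
import random

def controlled_mcmc(log_pi, x0, theta0, n_iter, target_acc=0.44, seed=0):
    """Algorithm 1 with a Robbins-Monro update of the log proposal scale."""
    rng = random.Random(seed)
    x, log_theta = x0, math.log(theta0)
    chain, thetas = [], []
    for i in range(n_iter):
        # Step 1: sample X_{i+1} ~ P_{theta_i}(X_i, .), here an N-SRWM kernel
        theta = math.exp(log_theta)
        y = x + rng.gauss(0.0, theta)
        alpha = min(1.0, math.exp(log_pi(y) - log_pi(x)))
        if rng.random() < alpha:
            x = y
        # Step 2: update theta_{i+1} from the history, with vanishing step size
        gamma = 1.0 / (i + 1)
        log_theta += gamma * (alpha - target_acc)
        chain.append(x)
        thetas.append(math.exp(log_theta))
    return chain, thetas

# illustrative run on a standard normal target, starting from a poor scale
chain, thetas = controlled_mcmc(lambda x: -0.5 * x * x, 0.0, 0.1, 20000)
```

Updating log θ rather than θ keeps the scale positive without any projection step; the vanishing step sizes foreshadow the "vanishing adaptation" principle discussed below.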
In Sect. 4.2 we will focus on particular mappings well suited to our purpose of computationally efficient sequential updating of {θ_i} for MCMC algorithms, which rely on the Robbins-Monro update and more generally on the stochastic approximation framework (Benveniste et al. 1990). However, before embarking on the description of practical procedures to optimise MCMC transition probabilities we will first investigate, using mostly elementary undergraduate level tools, some of the theoretical ergodicity properties of controlled MCMC algorithms.
Indeed, as we shall see, despite the assumption that for any θ ∈ Θ, πP_θ = π, adaptation in the context of MCMC using the controlled approach leads to complications. In fact, this type of adaptation can easily perturb the ergodicity properties of MCMC algorithms. In particular, algorithms of this type will in most cases lead to the loss of π as an invariant distribution of the process {X_i}, which intuitively should be the minimum requirement to produce samples from π and lead to consistent estimators. Note also that when not carefully designed such controlled MCMC can lead to transient processes or processes such that Î_N(f) is not consistent. Studying the convergence properties of such processes naturally raises the question of the relevance of such developments in the present context. Indeed it is often argued that one might simply stop adaptation once we have enough evidence that {θ_i} has reached a satisfactory optimal or suboptimal value of θ, and then simply use samples produced by a standard MCMC algorithm using such a fixed good value θ̃. No new theory should then be required. While apparently valid, this remark ignores the fact that most criteria of interest depend explicitly on features of π, which can only be evaluated with... MCMC algorithms. For example, as mentioned above, most known and useful criteria can be formulated as expectations with respect to distributions which usually explicitly involve π.
Optimising such criteria, or finding suboptimal values of θ∗, thus requires one to be able to sample, perhaps approximately or asymptotically, from π, which in the context of controlled MCMC requires one to ensure that the process described above can, in principle, achieve this aim. This, in our opinion, motivates and justifies the need for such theoretical developments as they establish whether or not controlled MCMC can, again in principle, optimise such π-dependent criteria. Note that convergence of {θ_i} should itself not be overlooked since, in light of our earlier discussion of the univariate N-SRWM, optimisation of {P_θ} is our primary goal and should be part of our theoretical developments. Note that users wary of the perturbation to ergodicity brought by adaptation might naturally choose to "freeze" {θ_i} to a value θ_τ beyond an iteration τ and consider only samples produced by the induced Markov chain for their inference problem. A stopping rule is described in Sect. 4.2.2. In fact, as we shall see, it is possible to run the two procedures simultaneously.
Finally, whereas optimising an MCMC algorithm seems a legitimate thing to do, one might wonder if it is computationally worth adapting. This is a very difficult question for which there is probably no straight answer. The view we adopt here is that such optimisation schemes are very useful tools to design, or help the design of, efficient MCMC algorithms which, while leading to some additional computation, have the potential to spare the MCMC user significant implementation time.
The paper is organised as follows. In Sect. 2 we provide toy examples that illustrate the difficulties introduced by the adaptation of MCMC algorithms. In Sect. 3 we discuss why one might expect vanishing adaptation to lead to processes such that {X_i} can be used in order to estimate expectations with respect to π. This section might be skipped on a first reading. In Sect. 4 we first discuss various natural criteria which are motivated by theory, but to some extent simplified in order to lead to useful and implementable algorithms. We then go on to describe how the standard framework of stochastic approximation, of which the Robbins-Monro recursion is the cornerstone, provides us with a systematic framework to design families of mappings {θ_i} in a recursive manner and understand their properties. In Sect. 5 we present a series of novel adaptive algorithms which circumvent some of the caveats of existing procedures. These algorithms are applied to various examples in Sect. 6.
2 The trouble with adaptation
In this section we first illustrate the loss of π-ergodicity of controlled MCMC with the help of two simple toy examples. The level of technicality required for these two examples is that of a basic undergraduate course on Markov chains. Despite their simplicity, these examples suggest that vanishing adaptation (a term made more precise later) might preserve asymptotic π-ergodicity. We then finish this section by formulating more precisely the fundamental difference between standard MCMC algorithms and their controlled counterparts which affects the invariant distribution of the algorithm. This requires the introduction of some additional notation used in Sect. 4 and a basic understanding of expectations to justify vanishing adaptation, but does not significantly raise the level of technicality.
Consider the following toy example, suggested in Andrieu and Moulines (2006), where X = {1, 2} and π = (1/2, 1/2) (it is understood here that for such a case we will abuse notation and use π for the vector of values of π and P_θ for the transition matrix) and where the family of transition probabilities under consideration is of the form, for any θ ∈ Θ := (0, 1),

  P_θ = [ P_θ(X_i = 1, X_{i+1} = 1)   P_θ(X_i = 1, X_{i+1} = 2) ]
        [ P_θ(X_i = 2, X_{i+1} = 1)   P_θ(X_i = 2, X_{i+1} = 2) ]

      = [ θ       1 − θ ]
        [ 1 − θ   θ     ].   (3)
It is clear that for any θ ∈ Θ, π is a left eigenvector of P_θ with eigenvalue 1,

  πP_θ = π,

i.e. π is an invariant distribution of P_θ. For any θ ∈ Θ the Markov chain is obviously irreducible and aperiodic, and by standard theory is therefore ergodic, i.e. for any starting probability distribution μ,

  lim_{i→∞} μP_θ^i = π

(with P_θ^i the i-th power of P_θ), and for any finite real valued function f,

  lim_{N→∞} (1/N) Σ_{i=1}^N f(X_i) = E_π(f(X)),

almost surely, where for any probability distribution ν, E_ν represents the expectation operator with respect to ν. Now assume that θ is adapted to the current state in order to sample the next state of the chain, and assume for now that this adaptation is a time invariant function of the previous state of the MC. More precisely assume that for any i ≥ 1 the transition from X_i to X_{i+1} is parametrised by θ(X_i), where θ : X → Θ. The remarkable property, specific to this purely pedagogical example, is that {X_i} is still in this case a time homogeneous Markov chain with transition probability

  P̌(X_i = a, X_{i+1} = b) := P_{θ(a)}(X_i = a, X_{i+1} = b)

for a, b ∈ X, resulting in the time homogeneous transition matrix

  P̌ := [ θ(1)       1 − θ(1) ]
        [ 1 − θ(2)   θ(2)     ].   (4)
Naturally the symmetry of P_θ above is lost and one can check that the invariant distribution of P̌ is

  π̌ = ( (1 − θ(2)) / (2 − θ(1) − θ(2)),  (1 − θ(1)) / (2 − θ(1) − θ(2)) ) ≠ π,

in general. For θ(1), θ(2) ∈ Θ the time homogeneous Markov chain will be ergodic, but will fail to converge to π as soon as θ(1) ≠ θ(2), that is as soon as there is dependence on the current state. As we shall see, the principle of vanishing adaptation consists, in the present toy example, of making both θ(1) and θ(2) time dependent (deterministically for simplicity here), denoted θ_i(1) and θ_i(2) at iteration i, and ensuring that as i → ∞, |θ_i(1) − θ_i(2)| vanishes. Indeed, while {θ_i(1)} and {θ_i(2)} are allowed to evolve forever (and maybe not converge) the corresponding transition probabilities {P̌_i := P_{θ_i(X_i)}} have invariant distributions {π̌_i} convergent to π. We might hence expect to recover π-ergodicity. In fact in the present case standard theory for non-homogeneous Markov chains can be used in order to find conditions on {θ_i} that ensure ergodicity, but we do not pursue this in depth here.
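The loss of invariance in (4) is easy to verify numerically. The following sketch (illustrative Python; the particular values θ(1) = 0.9 and θ(2) = 0.1 are assumptions) computes the stationary distribution of P̌ from the closed form above and checks that it is indeed invariant, yet far from π = (1/2, 1/2).

```python
def stationary_2state(p_stay_1, p_stay_2):
    """Stationary distribution of [[a, 1-a], [1-b, b]] for a 2-state chain."""
    a, b = p_stay_1, p_stay_2
    z = 2.0 - a - b
    return ((1.0 - b) / z, (1.0 - a) / z)

theta1, theta2 = 0.9, 0.1                 # assumed state-dependent parameters
pi_check = stationary_2state(theta1, theta2)

# verify invariance: pi_check * P_check == pi_check
row1 = (theta1, 1.0 - theta1)
row2 = (1.0 - theta2, theta2)
v = (pi_check[0] * row1[0] + pi_check[1] * row2[0],
     pi_check[0] * row1[1] + pi_check[1] * row2[1])
assert all(abs(v[k] - pi_check[k]) < 1e-9 for k in range(2))
```

With θ(1) = 0.9 and θ(2) = 0.1 this gives π̌ = (0.9, 0.1): the state the chain adapts to linger in ends up grossly over-represented relative to the intended π = (1/2, 1/2).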
It could be argued, and this is sometimes suggested, that the problem with the example above is that in order to preserve π as a marginal distribution, θ should not depend on X_i for the transition to X_{i+1}, but on X_0, ..., X_{i−1} only. For simplicity assume that the dependence is on X_{i−1} only. Then it is sometimes argued that since

  (π(X_i = 1), π(X_i = 2))
    × [ P_{θ(X_{i−1})}(X_i = 1, X_{i+1} = 1)   P_{θ(X_{i−1})}(X_i = 1, X_{i+1} = 2) ]
      [ P_{θ(X_{i−1})}(X_i = 2, X_{i+1} = 1)   P_{θ(X_{i−1})}(X_i = 2, X_{i+1} = 2) ]

  = (π(X_i = 1), π(X_i = 2)) [ θ(X_{i−1})       1 − θ(X_{i−1}) ]
                             [ 1 − θ(X_{i−1})   θ(X_{i−1})     ]

  = (π(X_{i+1} = 1), π(X_{i+1} = 2)),
then X_{i+1}, X_{i+2}, ... are all marginally distributed according to π. Although this calculation is correct, the underlying reasoning is naturally incorrect in general. This can be checked in two ways. First through a counterexample which only requires elementary arguments. Indeed in the situation just outlined, the law of X_{i+1} given θ_0, X_0, ..., X_{i−1}, X_i is P_{θ(X_{i−1})}(X_i, X_{i+1} ∈ ·), from which we deduce that Z_i = (Z_i(1), Z_i(2)) = (X_i, X_{i−1}) is a time homogeneous Markov chain with transition

  P_{θ(Z_i(2))}(Z_i(1), Z_{i+1}(1)) I{Z_{i+1}(2) = Z_i(1)},

where for a set A, I_A denotes its indicator function. Denoting the states 1̄ := (1, 1), 2̄ := (1, 2), 3̄ := (2, 1) and 4̄ := (2, 2), the transition matrix of the time homogeneous Markov chain is

  P̌ = [ θ(1)   0          1 − θ(1)   0    ]
       [ θ(2)   0          1 − θ(2)   0    ]
       [ 0      1 − θ(1)   0          θ(1) ]
       [ 0      1 − θ(2)   0          θ(2) ]

and it can be directly checked that the marginal invariant distribution of Z_i(1) is

  π̌ = ( 2 + θ(2)/(1 − θ(1)) + θ(1)/(1 − θ(2)) )^{−1}
       × ( (1 + θ(2) − θ(1))/(1 − θ(1)),  (1 + θ(1) − θ(2))/(1 − θ(2)) )
     ≠ (1/2, 1/2),
in general. The second and more informative approach consists of considering the actual distribution of the process generated by a controlled MCMC. Let us denote Ě_∗ the expectation for the process started at some arbitrary (θ, x) ∈ Θ × X. This operator is particularly useful to describe the expectation of ψ(X_i, X_{i+1}, ...) for any i ≥ 1 and any function ψ : X^{k_ψ} → R, namely Ě_∗(ψ(X_i, X_{i+1}, ..., X_{i+k_ψ−1})). More precisely it allows one to clearly express the dependence of θ_i(θ_0, X_0, ..., X_i) on the past θ_0, X_0, ..., X_i of the process. Indeed for any f : X → R, using the tower property of expectations and the definition of controlled MCMC given in the introduction, we find that

  Ě_∗(f(X_{i+1})) = Ě_∗[ Ě_∗(f(X_{i+1}) | θ_0, X_0, ..., X_i) ]
                  = Ě_∗[ ∫_X P_{θ_i(X_0, ..., X_i)}(X_i, dx) f(x) ],   (5)
which is another way of saying that the distribution of X_{i+1} is that of a random variable sampled, conditional upon θ_0, X_0, ..., X_i, according to the random transition P_{θ_i(X_0, ..., X_i)}(X_i, X_{i+1} ∈ ·), where the pair (θ_i(θ_0, X_0, ..., X_i), X_i) is randomly drawn from a distribution completely determined by the possible histories θ_0, X_0, ..., X_i. In the case where X is a finite discrete set, writing this relation concisely as the familiar product of a row vector and a transition matrix as above would require one to determine the (possibly very large) set of values for the pair (θ_i(θ_0, X_0, ..., X_i), X_i) (say W_i), the vector representing the probability distribution of all these pairs, as well as the transition matrix from W_i to X. The introduction of the expectation allows one to bypass these conceptual and notational difficulties. We will hereafter denote

  ϕ(θ_0, X_0, ..., X_i) := ∫_X P_{θ_i(θ_0, X_0, ..., X_i)}(X_i, dx) f(x),

and whenever possible will drop unnecessary arguments, i.e. arguments of ϕ which do not affect its values.
The possibly complex dependence of the transition of the process to X_{i+1} on (θ_i(θ_0, X_0, ..., X_i), X_i) needs to be contrasted with the case of standard MCMC algorithms. Indeed, in this situation the randomness of the transition probability only stems from X_i. This turns out to be a major advantage when it comes to invariant distributions. Let us assume that for some i ≥ 1, Ě_∗(g(X_i)) = E_π(g(X)) for all π-integrable functions g. Then according to the identity in (5), for any given θ ∈ Θ and θ_i = θ for all i ≥ 0, a standard MCMC algorithm has the well known and fundamental property

  Ě_∗(f(X_{i+1})) = Ě_∗(ϕ(θ, X_i)) = E_π(ϕ(θ, X))
                  = ∫_{X×X} π(dx) P_θ(x, dy) f(y) = E_π(f(X)),
where the second equality stems from the assumption Ě_∗(g(X_i)) = E_π(g(X)) and the last equality is obtained by the assumed invariance of π for P_θ for any θ ∈ Θ. Now we turn to the controlled MCMC process and focus for simplicity on the case θ_i(θ_0, X_0, ..., X_i) = θ(X_{i−1}), corresponding to our counterexample. Assume that for some i ≥ 1, X_i is marginally distributed according to π, i.e. for any g : X → R, Ě_∗(g(X_i)) = E_π(g(X)); then we would like to check if Ě_∗(g(X_j)) = E_π(g(X)) for all j ≥ i. However, using the tower property of expectations in order to exploit the property Ě_∗(g(X_i)) = E_π(g(X)),

  Ě_∗(f(X_{i+1})) = Ě_∗(ϕ(X_{i−1}, X_i))
                  = Ě_∗[ Ě_∗(ϕ(X_{i−1}, X_i) | X_i) ]
                  = E_π[ Ě_∗(ϕ(X_{i−1}, X) | X) ].
Now it would be tempting to use the stationarity assumption in the last expression,

  E_π(ϕ(X_{i−1}, X)) = ∫_{X×X} π(dx) P_{θ(X_{i−1})}(x, dy) f(y) = E_π(f(X)).

This is however not possible due to the presence of the conditional expectation Ě_∗(· | X) (which crucially depends on X), and we conclude that in general

  Ě_∗(f(X_{i+1})) ≠ Ě_∗[ E_π( ∫_X P_{θ(θ_0, X_0, X_{i−1})}(X, dx_{i+1}) f(x_{i+1}) ) ].

The misconception that this inequality might be an equality is at the root of the incorrect reasoning outlined earlier. This problem naturally extends to more general situations.
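The four-state counterexample above can also be checked numerically. The sketch below (illustrative Python; the values θ(1) = 0.8 and θ(2) = 0.4 are assumptions) builds P̌ for Z_i = (X_i, X_{i−1}), finds its stationary distribution by power iteration, and compares the Z_i(1)-marginal with the closed form, confirming it differs from (1/2, 1/2).

```python
def stationary(P, n_iter=10000):
    """Stationary row vector of a stochastic matrix via power iteration."""
    n = len(P)
    v = [1.0 / n] * n
    for _ in range(n_iter):
        v = [sum(v[j] * P[j][k] for j in range(n)) for k in range(n)]
    return v

t1, t2 = 0.8, 0.4                      # assumed theta(1), theta(2)
# states ordered (1,1), (1,2), (2,1), (2,2) for Z_i = (X_i, X_{i-1})
P = [[t1,  0.0,      1.0 - t1, 0.0],
     [t2,  0.0,      1.0 - t2, 0.0],
     [0.0, 1.0 - t1, 0.0,      t1 ],
     [0.0, 1.0 - t2, 0.0,      t2 ]]
v = stationary(P)
marginal = (v[0] + v[1], v[2] + v[3])  # distribution of Z_i(1) = X_i

# closed-form marginal from the text
z = 2.0 + t2 / (1.0 - t1) + t1 / (1.0 - t2)
closed = ((1.0 + t2 - t1) / (1.0 - t1) / z, (1.0 + t1 - t2) / (1.0 - t2) / z)
```

For these values both computations give roughly (0.56, 0.44), so conditioning the adapted parameter on X_{i−1} instead of X_i still destroys π as the marginal invariant distribution.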
Vanishing adaptation seems, intuitively, to offer the possibility of circumventing the problem of the loss of π as invariant distribution. However, as illustrated by the following toy example, vanishing adaptation might come with its own shortcomings. Consider a (deterministic) sequence {θ_i} ⊂ (0, 1)^N and for simplicity first consider the non-homogeneous, and non-adaptive, Markov chain {X_i} with transition P_{θ_i} at iteration i ≥ 1, where P_θ is given by (3), and initial distribution (μ, 1 − μ) for μ ∈ [0, 1]. One can easily check that for any n ≥ 1 the product of matrices P_{θ_1} × ··· × P_{θ_n} has the simple expression

  P_{θ_1} × ··· × P_{θ_n}
    = (1/2) [ 1 + ∏_{i=1}^n (2θ_i − 1)   1 − ∏_{i=1}^n (2θ_i − 1) ]
            [ 1 − ∏_{i=1}^n (2θ_i − 1)   1 + ∏_{i=1}^n (2θ_i − 1) ].

As a result one deduces that the distribution of X_n is

  (1/2) ( 1 + (2μ − 1) ∏_{i=1}^n (2θ_i − 1),  1 − (2μ − 1) ∏_{i=1}^n (2θ_i − 1) ).
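The effect of adapting "too fast" can be seen directly from the product formula above. In this sketch (illustrative Python; the schedules are assumptions), a fixed θ makes the product of (2θ_i − 1) terms vanish geometrically, so the chain forgets its initial distribution, whereas a summable schedule θ_i = 4^{−i} keeps the product bounded away from zero, so X_n started from μ = 1 never reaches π.

```python
import functools
import operator

def x_n_distribution(mu, thetas):
    """Distribution of X_n: (1/2)(1 ± (2 mu - 1) prod_i (2 theta_i - 1))."""
    prod = functools.reduce(operator.mul, (2.0 * t - 1.0 for t in thetas), 1.0)
    return (0.5 * (1.0 + (2.0 * mu - 1.0) * prod),
            0.5 * (1.0 - (2.0 * mu - 1.0) * prod))

# fixed theta: the homogeneous chain forgets its initial distribution
fixed = x_n_distribution(mu=1.0, thetas=[0.9] * 200)

# "too fast" schedule: theta_i = 4^{-i} -> 0 with sum_i theta_i < +infinity,
# so |prod_i (2 theta_i - 1)| stays bounded away from 0 for all n
too_fast = x_n_distribution(mu=1.0, thetas=[4.0 ** (-i) for i in range(1, 201)])
```

Here `fixed` is numerically indistinguishable from (1/2, 1/2), while `too_fast` remains visibly biased towards the starting state, which is exactly the failure mode described next.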
Now if θ_i → 0 (resp. θ_i → 1) and Σ_{i=1}^∞ θ_i < +∞ (resp. Σ_{i=1}^∞ (1 − θ_i) < +∞), that is if convergence of {θ_i} to either 0 or 1 is "too fast", then lim_{n→∞} ∏_{i=1}^n (2θ_i − 1) ≠ 0 and as a consequence, whenever μ ≠ 1/2, the distribution of X_n does not converge to π = (1/2, 1/2). Similar developments are possible for the toy adaptive MCMC algorithm given by