Statistics and Computing (2000) 10, 197–208
On sequential Monte Carlo sampling methods for Bayesian filtering
ARNAUD DOUCET, SIMON GODSILL and CHRISTOPHE ANDRIEU
Signal Processing Group, Department of Engineering, University of Cambridge,
Trumpington Street, CB2 1PZ Cambridge, UK
ad2@eng.cam.ac.uk, sjg@eng.cam.ac.uk, ca226@eng.cam.ac.uk
Received July 1998 and accepted August 1999
In this article, we present an overview of methods for sequential simulation from posterior distribu-
tions. These methods are of particular interest in Bayesian filtering for discrete time dynamic models
that are typically nonlinear and non-Gaussian. A general importance sampling framework is devel-
oped that unifies many of the methods which have been proposed over the last few decades in several
different scientific disciplines. Novel extensions to the existing methods are also proposed. We show in
particular how to incorporate local linearisation methods similar to those which have previously been
employed in the deterministic filtering literature; these lead to very effective importance distributions.
Furthermore we describe a method which uses Rao-Blackwellisation in order to take advantage of
the analytic structure present in some important classes of state-space models. In a final section we
develop algorithms for prediction, smoothing and evaluation of the likelihood in dynamic models.
Keywords: Bayesian filtering, nonlinear non-Gaussian state space models, sequential Monte Carlo
methods, particle filtering, importance sampling, Rao-Blackwellised estimates
I. Introduction
Many problems in applied statistics, statistical signal process-
ing, time series analysis and econometrics can be stated in a
state space form as follows. A transition equation describes the
prior distribution of a hidden Markov process {x_k; k ∈ N}, the so-called hidden state process, and an observation equation describes the likelihood of the observations {y_k; k ∈ N}, k being a
discrete time index. Within a Bayesian framework, all relevant
information about {x_0, x_1, ..., x_k} given observations up to and including time k can be obtained from the posterior distribution p(x_0, x_1, ..., x_k | y_0, y_1, ..., y_k). In many applications we are interested in estimating recursively in time this distribution, and particularly one of its marginals, the so-called filtering distribution p(x_k | y_0, y_1, ..., y_k). Given the filtering distribution
one can then routinely proceed to filtered point estimates such
as the posterior mode or mean of the state. This problem is
known as the Bayesian filtering problem or the optimal filtering
problem. Practical applications include target tracking (Gordon,
Salmond and Smith 1993), blind deconvolution of digital com-
munications channels (Clapp and Godsill 1999, Liu and Chen
1995), estimation of stochastic volatility (Pitt and Shephard
1999) and digital enhancement of speech and audio signals
(Godsill and Rayner 1998).
Except in a few special cases, including linear Gaussian
state space models (Kalman filter) and hidden finite-state space
Markov chains, it is impossible to evaluate these distributions
analytically. From the mid 1960’s, a great deal of attention has
been devoted to approximating these filtering distributions, see
for example Jazwinski (1970). The most popular algorithms,
the extended Kalman filter and the Gaussian sum filter, rely on
analytical approximations (Anderson and Moore 1979). Inter-
esting work in the automatic control field was carried out during
the 1960’s and 70’s using sequential Monte Carlo (MC) inte-
gration methods, see Akashi and Kumamoto (1975), Handschin
and Mayne (1969), Handschin (1970), and Zaritskii, Svetnik and
Shimelevich (1975). Possibly owing to the severe computational
limitations of the time, these Monte Carlo algorithms have been
largely neglected until recent years. In the late 80’s, massive
increases in computational power allowed the rebirth of numeri-
cal integration methods for Bayesian filtering (Kitagawa 1987).
Current research has now focused on MC integration methods,
which have the great advantage of not being subject to the as-
sumption of linearity or Gaussianity in the model, and relevant
0960-3174 © 2000 Kluwer Academic Publishers

work includes Müller (1992), West (1993), Gordon, Salmond
and Smith (1993), Kong, Liu and Wong (1994) and Liu and
Chen (1998).
The main objective of this article is to include in a unified
framework many old and more recent algorithms proposed in-
dependently in a number of applied science areas. Both Liu
and Chen (1998) and Doucet (1997, 1998) underline the central
rôle of sequential importance sampling in Bayesian filtering.
However, contrary to Liu and Chen (1998) which emphasizes
the use of hybrid schemes combining elements of importance
sampling with Markov Chain Monte Carlo (MCMC), we focus
here on computationally cheaper alternatives. We describe also
how it is possible to improve current existing methods via Rao-
Blackwellisation for a useful class of dynamic models. Finally,
we show how to extend these methods to compute the predic-
tion and fixed-interval smoothing distributions as well as the
likelihood.
The paper is organised as follows. In Section 2, we briefly re-
view the Bayesian filtering problem and classical Bayesian im-
portance sampling is proposed for its solution. We then present
a sequential version of this method which allows us to obtain a
general recursive MC filter: the sequential importance sampling
(SIS) filter. Under a criterion of minimum conditional variance of
the importance weights, we obtain the optimal importance func-
tion for this method. Unfortunately, for most models of applied
interest the optimal importance function leads to non-analytic
importance weights, and hence we propose several suboptimal
distributions and show how to obtain as special cases many of
the algorithms presented in the literature. Firstly we consider
local linearisation methods of either the state space model or
the optimal importance function, giving some important exam-
ples. These linearisation methods seem to be a very promising
way to proceed in problems of this type. Secondly we consider
some simple importance functions which lead to algorithms cur-
rently known in the literature. In Section 3, a resampling scheme
is used to limit practically the degeneracy of the algorithm. In
Section 4, we apply the Rao-Blackwellisation method to SIS
and obtain efficient hybrid analytical/MC filters. In Section 5,
we show how to use the MC filter to compute the prediction and
fixed-interval smoothing distributions as well as the likelihood.
Finally, simulations are presented in Section 6.
II. Filtering via Sequential Importance Sampling
A. Preliminaries: Filtering for the state space model
The state sequence {x_k; k ∈ N}, x_k ∈ R^{n_x}, is assumed to be an unobserved (hidden) Markov process with initial distribution p(x_0) (which we subsequently denote as p(x_0 | x_{−1}) for notational convenience) and transition distribution p(x_k | x_{k−1}), where n_x is the dimension of the state vector. The observations {y_k; k ∈ N}, y_k ∈ R^{n_y}, are conditionally independent given the process {x_k; k ∈ N} with distribution p(y_k | x_k), where n_y is the dimension of the observation vector. To sum up, the model is a hidden Markov (or state space) model (HMM) described by

p(x_k | x_{k−1})   for k ≥ 0    (1)
p(y_k | x_k)       for k ≥ 0    (2)
We denote by x_{0:n} ≜ {x_0, ..., x_n} and y_{0:n} ≜ {y_0, ..., y_n}, respectively, the state sequence and the observations up to time n. Our aim is to estimate recursively in time the distribution p(x_{0:n} | y_{0:n}) and its associated features, including p(x_n | y_{0:n}) and expectations of the form

I(f_n) = ∫ f_n(x_{0:n}) p(x_{0:n} | y_{0:n}) dx_{0:n}    (3)

for any p(x_{0:n} | y_{0:n})-integrable f_n : R^{(n+1)×n_x} → R. A recursive formula for p(x_{0:n} | y_{0:n}) is given by:
p(x_{0:n+1} | y_{0:n+1}) = p(x_{0:n} | y_{0:n}) p(y_{n+1} | x_{n+1}) p(x_{n+1} | x_n) / p(y_{n+1} | y_{0:n})    (4)

The denominator of this expression cannot typically be computed analytically, thus rendering an analytic approach infeasible except in the special cases mentioned above. It will later be assumed that samples can easily be drawn from p(x_k | x_{k−1}) and that we can evaluate p(x_k | x_{k−1}) and p(y_k | x_k) pointwise.
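These assumptions — the ability to draw from p(x_k | x_{k−1}) and to evaluate the transition and observation densities pointwise — can be made concrete with a small simulation. The sketch below assumes a hypothetical scalar model with transition p(x_k | x_{k−1}) = N(0.9 x_{k−1}, 1) and observation p(y_k | x_k) = N(x_k, 0.25); the specific densities are illustrative choices, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical scalar state space model used only for illustration:
# p(x_k | x_{k-1}) = N(0.9 x_{k-1}, 1), p(y_k | x_k) = N(x_k, 0.5^2).
def sample_transition(x_prev):
    return 0.9 * x_prev + rng.normal()

def sample_observation(x):
    return x + 0.5 * rng.normal()

def simulate(T):
    """Draw a state trajectory x_{0:T-1} and observations y_{0:T-1} from the HMM."""
    x = rng.normal()  # draw x_0 from the initial distribution p(x_0) = N(0, 1)
    xs, ys = [], []
    for _ in range(T):
        xs.append(x)
        ys.append(sample_observation(x))
        x = sample_transition(x)
    return np.array(xs), np.array(ys)

xs, ys = simulate(100)
```

The filtering problem is then to recover the distribution of the unobserved xs given only the ys.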
B. Bayesian Sequential Importance Sampling (SIS)
Since it is generally impossible to sample from the state posterior p(x_{0:n} | y_{0:n}) directly, we adopt an importance sampling (IS) approach. Suppose that samples {x_{0:n}^{(i)}; i = 1, ..., N} are drawn independently from a normalised importance function π(x_{0:n} | y_{0:n}) which has a support including that of the state posterior. Then an estimate Î_N(f_n) of the posterior expectation I(f_n) is obtained using Bayesian IS (Geweke 1989):

Î_N(f_n) = Σ_{i=1}^{N} f_n(x_{0:n}^{(i)}) w̃_n^{(i)},   w̃_n^{(i)} = w_n^{*(i)} / Σ_{j=1}^{N} w_n^{*(j)}    (5)

where w_n^{*(i)} = p(y_{0:n} | x_{0:n}^{(i)}) p(x_{0:n}^{(i)}) / π(x_{0:n}^{(i)} | y_{0:n}) is the unnormalised importance weight. Under weak assumptions Î_N(f_n) converges to I(f_n), see for example Geweke (1989). However, this method is not recursive. We now show how to obtain a sequential MC filter using Bayesian IS.
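Before the sequential version, the static estimate (5) can be sketched on a toy one-dimensional problem. The target and importance densities below (a N(1, 1) target and a N(0, 4) importance function) are hypothetical choices for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)

# A minimal sketch of the self-normalised Bayesian IS estimate (5).
# Toy setup (an assumption, not from the paper): target posterior N(1, 1),
# importance function pi = N(0, 2^2), whose support covers the target's.
def bayesian_is_estimate(f, n_samples=100_000):
    x = rng.normal(0.0, 2.0, size=n_samples)  # i.i.d. draws from pi
    # Unnormalised log weights: log target density minus log pi density
    # (additive constants cancel after normalisation).
    log_w = -0.5 * (x - 1.0) ** 2 - (-0.5 * (x / 2.0) ** 2 - np.log(2.0))
    w = np.exp(log_w - log_w.max())           # stabilise before normalising
    w_tilde = w / w.sum()                     # normalised weights, as in (5)
    return np.sum(f(x) * w_tilde)             # the estimate I_N(f)

posterior_mean = bayesian_is_estimate(lambda x: x)  # converges to the target mean 1
```

Subtracting the maximum log weight before exponentiating is a standard numerical precaution; it leaves the normalised weights unchanged.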
Suppose one chooses an importance function of the form

π(x_{0:n} | y_{0:n}) = π(x_0 | y_0) ∏_{k=1}^{n} π(x_k | x_{0:k−1}, y_{0:k})    (6)

Such an importance function allows recursive evaluation in time of the importance weights as successive observations y_k become

available. We obtain directly the sequential importance sampling
filter.
Sequential Importance Sampling (SIS)

For times k = 0, 1, 2, ...

• For i = 1, ..., N, sample x_k^{(i)} ∼ π(x_k | x_{0:k−1}^{(i)}, y_{0:k}) and set x_{0:k}^{(i)} ≜ (x_{0:k−1}^{(i)}, x_k^{(i)}).

• For i = 1, ..., N, evaluate the importance weights up to a normalising constant:

  w_k^{*(i)} = w_{k−1}^{*(i)} p(y_k | x_k^{(i)}) p(x_k^{(i)} | x_{k−1}^{(i)}) / π(x_k^{(i)} | x_{0:k−1}^{(i)}, y_{0:k})    (7)

• For i = 1, ..., N, normalise the importance weights:

  w̃_k^{(i)} = w_k^{*(i)} / Σ_{j=1}^{N} w_k^{*(j)}    (8)

A special case of this algorithm was introduced in 1969 by Handschin and Mayne (1969) and Handschin (1970). Many of the other algorithms proposed in the literature are later shown also to be special cases of this general (and simple) algorithm. Choice of importance function is of course crucial and one obtains poor performance when the importance function is not well chosen. This issue forms the topic of the following subsection.
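The recursion (7)-(8) can be sketched in a few lines of Python. The scalar model below, and the choice of the prior p(x_k | x_{k−1}) as importance function (so that (7) reduces to multiplication by the likelihood p(y_k | x_k^{(i)})), are assumptions made for this sketch, not prescriptions from the paper.

```python
import numpy as np

rng = np.random.default_rng(2)

# SIS sketch on a hypothetical scalar model:
# p(x_k | x_{k-1}) = N(0.9 x_{k-1}, 1), p(y_k | x_k) = N(x_k, 1).
# With pi taken as the prior, (7) becomes w_k = w_{k-1} * p(y_k | x_k).
def sis_filter(ys, n_particles=1000):
    x = rng.normal(size=n_particles)  # particles drawn from p(x_0) = N(0, 1)
    log_w = np.zeros(n_particles)     # cumulative log unnormalised weights
    means = []
    for y in ys:
        log_w += -0.5 * (y - x) ** 2              # weight update (7), up to a constant
        w_tilde = np.exp(log_w - log_w.max())
        w_tilde /= w_tilde.sum()                   # normalisation (8)
        means.append(np.sum(w_tilde * x))          # estimate of the filtered mean
        x = 0.9 * x + rng.normal(size=n_particles) # propagate particles via the prior
    return np.array(means)

ys = np.array([0.5, 1.0, 0.8, 1.2, 0.9])
filtered_means = sis_filter(ys)
```

Note that the weights are never reset: each particle carries the product of its likelihood terms along its whole trajectory, which is exactly what drives the degeneracy discussed next.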
C. Degeneracy of the algorithm
If Bayesian IS is interpreted as a Monte Carlo sampling method rather than as a Monte Carlo integration method, the best possible choice of importance function is of course the posterior distribution itself, p(x_{0:k} | y_{0:k}). We would ideally like to be close to this case. However, for importance functions of the form (6), the variance of the importance weights can only increase (stochastically) over time.

Proposition 1. The unconditional variance of the importance weights, i.e. with the observations y_{0:k} being interpreted as random variables, increases over time.

The proof of this proposition is a straightforward extension of a Kong-Liu-Wong theorem (Kong, Liu and Wong 1994) to the case of an importance function of the form (6). Thus, it is impossible to avoid a degeneracy phenomenon. In practice, after a few iterations of the algorithm, all but one of the normalised importance weights are very close to zero and a large computational effort is devoted to updating trajectories whose contribution to the final estimate is almost zero.
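This degeneracy is easy to reproduce numerically. The sketch below tracks the largest normalised weight under assumed dynamics (a N(0, 1) random-walk prior used as importance function, and a fixed observation y_k = 0 with likelihood N(x_k, 1)); it is an illustration, not an experiment from the paper.

```python
import numpy as np

rng = np.random.default_rng(3)

n_particles, n_steps = 500, 50
x = rng.normal(size=n_particles)       # particles at time 0
log_w = np.zeros(n_particles)          # cumulative log unnormalised weights
max_weight = []
for _ in range(n_steps):
    log_w += -0.5 * x ** 2             # likelihood of y_k = 0 under N(x_k, 1)
    w = np.exp(log_w - log_w.max())
    max_weight.append((w / w.sum()).max())  # largest normalised weight
    x = x + rng.normal(size=n_particles)    # random-walk propagation

# The largest normalised weight grows towards 1: a handful of trajectories
# dominate the estimate while the rest carry negligible weight.
```

After a few dozen steps, essentially all of the probability mass sits on one trajectory, which is the degeneracy phenomenon of Proposition 1 in action.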
D. Selection of the importance function
To limit degeneracy of the algorithm, a natural strategy consists of selecting the importance function which minimises the variance of the importance weights conditional upon the simulated trajectory x_{0:k−1}^{(i)} and the observations y_{0:k}.

Proposition 2. π(x_k | x_{0:k−1}^{(i)}, y_{0:k}) = p(x_k | x_{k−1}^{(i)}, y_k) is the importance function which minimises the variance of the importance weight w_k^{*(i)} conditional upon x_{0:k−1}^{(i)} and y_{0:k}.

Proof: Straightforward calculations yield

var_{π(x_k | x_{0:k−1}^{(i)}, y_{0:k})}[w_k^{*(i)}] = (w_{k−1}^{*(i)})² [ ∫ (p(y_k | x_k) p(x_k | x_{k−1}^{(i)}))² / π(x_k | x_{0:k−1}^{(i)}, y_{0:k}) dx_k − p²(y_k | x_{k−1}^{(i)}) ]

This variance is zero for π(x_k | x_{0:k−1}^{(i)}, y_{0:k}) = p(x_k | x_{k−1}^{(i)}, y_k). □
1. Optimal importance function
The optimal importance function p(x_k | x_{k−1}^{(i)}, y_k) was introduced by Zaritskii, Svetnik and Shimelevich (1975), then by Akashi and Kumamoto (1977) for a particular case. More recently, this importance function has been used in Chen and Liu (1996), Kong, Liu and Wong (1994) and Liu and Chen (1995). For this distribution, we obtain from (7) the importance weight w_k^{*(i)} = w_{k−1}^{*(i)} p(y_k | x_{k−1}^{(i)}). The optimal importance function suffers from two major drawbacks. It requires the ability to sample from p(x_k | x_{k−1}^{(i)}, y_k) and to evaluate p(y_k | x_{k−1}^{(i)}) = ∫ p(y_k | x_k) p(x_k | x_{k−1}^{(i)}) dx_k. This integral will have no analytic form in the general case. Nevertheless, analytic evaluation is possible for the important class of models presented below, the Gaussian state space model with non-linear transition equation.
Example 3. Nonlinear Gaussian State Space Models. Let us consider the following model:

x_k = f(x_{k−1}) + v_k,   v_k ∼ N(0_{n_v×1}, Σ_v)    (9)
y_k = C x_k + w_k,        w_k ∼ N(0_{n_w×1}, Σ_w)    (10)

where f : R^{n_x} → R^{n_x} is a non-linear function, C ∈ R^{n_y×n_x} is an observation matrix, and v_k and w_k are mutually independent i.i.d. Gaussian sequences with Σ_v > 0 and Σ_w > 0; C, Σ_v and Σ_w are assumed known. Defining

Σ^{−1} = Σ_v^{−1} + C^t Σ_w^{−1} C    (11)
m_k = Σ (Σ_v^{−1} f(x_{k−1}) + C^t Σ_w^{−1} y_k)    (12)

one obtains

x_k | (x_{k−1}, y_k) ∼ N(m_k, Σ)    (13)
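For concreteness, (11)-(13) can be implemented directly. The dimensions, the transition function f and the covariances below are illustrative assumptions, not values from the paper; the predictive likelihood p(y_k | x_{k−1}) used for the importance weight follows from standard Gaussian marginalisation of (9)-(10).

```python
import numpy as np

rng = np.random.default_rng(4)

# Illustrative choices (assumptions): a 2-dimensional state, scalar observation.
n_x, n_y = 2, 1
C = np.array([[1.0, 0.0]])
Sigma_v = np.eye(n_x)
Sigma_w = np.array([[0.25]])

def f(x):
    # hypothetical non-linear transition function
    return np.tanh(x)

# (11): Sigma^{-1} = Sigma_v^{-1} + C^t Sigma_w^{-1} C
Sigma_inv = np.linalg.inv(Sigma_v) + C.T @ np.linalg.inv(Sigma_w) @ C
Sigma = np.linalg.inv(Sigma_inv)

def optimal_proposal(x_prev, y):
    # (12): m_k = Sigma (Sigma_v^{-1} f(x_{k-1}) + C^t Sigma_w^{-1} y_k)
    m = Sigma @ (np.linalg.inv(Sigma_v) @ f(x_prev) + C.T @ np.linalg.inv(Sigma_w) @ y)
    # (13): x_k | (x_{k-1}, y_k) ~ N(m_k, Sigma)
    return rng.multivariate_normal(m, Sigma)

def predictive_likelihood(x_prev, y):
    # p(y_k | x_{k-1}) = N(C f(x_{k-1}), C Sigma_v C^t + Sigma_w): the analytic
    # weight update when the optimal importance function is used.
    mean = C @ f(x_prev)
    cov = C @ Sigma_v @ C.T + Sigma_w
    d = y - mean
    return float(np.exp(-0.5 * d @ np.linalg.solve(cov, d))
                 / np.sqrt((2 * np.pi) ** n_y * np.linalg.det(cov)))

x_k = optimal_proposal(np.zeros(n_x), np.array([0.5]))
```

Note that Σ and Σ^{−1} depend only on the known model matrices, so they can be computed once, outside the particle loop.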