Introduction
AN IMPORTANT class of theoretical and practical
problems in communication and control is of a statistical nature.
Such problems are: (i) Prediction of random signals; (ii) separa-
tion of random signals from random noise; (iii) detection of
signals of known form (pulses, sinusoids) in the presence of
random noise.
In his pioneering work, Wiener [1]³ showed that problems (i)
and (ii) lead to the so-called Wiener-Hopf integral equation; he
also gave a method (spectral factorization) for the solution of this
integral equation in the practically important special case of
stationary statistics and rational spectra.
Many extensions and generalizations followed Wiener’s basic
work. Zadeh and Ragazzini solved the finite-memory case [2].
Concurrently and independently of Bode and Shannon [3], they
also gave a simplified method [2] of solution. Booton discussed
the nonstationary Wiener-Hopf equation [4]. These results are
now in standard texts [5-6]. A somewhat different approach along
these main lines has been given recently by Darlington [7]. For
extensions to sampled signals, see, e.g., Franklin [8], Lees [9].
Another approach based on the eigenfunctions of the Wiener-
Hopf equation (which applies also to nonstationary problems
whereas the preceding methods in general don’t), has been
pioneered by Davis [10] and applied by many others, e.g.,
Shinbrot [11], Blum [12], Pugachev [13], Solodovnikov [14].
In all these works, the objective is to obtain the specification of
a linear dynamic system (Wiener filter) which accomplishes the
prediction, separation, or detection of a random signal.⁴
———
¹ This research was supported in part by the U. S. Air Force Office of
Scientific Research under Contract AF 49(638)-382.
² 7212 Bellona Ave.
³ Numbers in brackets designate References at end of paper.
⁴ Of course, in general these tasks may be done better by nonlinear
filters. At present, however, little or nothing is known about how to obtain
(both theoretically and practically) these nonlinear filters.
Contributed by the Instruments and Regulators Division and presented
at the Instruments and Regulators Conference, March 29–April 2, 1959,
of THE AMERICAN SOCIETY OF MECHANICAL ENGINEERS.
NOTE: Statements and opinions advanced in papers are to be understood
as individual expressions of their authors and not those of the Society.
Manuscript received at ASME Headquarters, February 24, 1959. Paper
No. 59—IRD-11.
Present methods for solving the Wiener problem are subject to
a number of limitations which seriously curtail their practical
usefulness:
(1) The optimal filter is specified by its impulse response. It is
not a simple task to synthesize the filter from such data.
(2) Numerical determination of the optimal impulse response is
often quite involved and poorly suited to machine computation.
The situation gets rapidly worse with increasing complexity of
the problem.
(3) Important generalizations (e.g., growing-memory filters,
nonstationary prediction) require new derivations, frequently of
considerable difficulty to the nonspecialist.
(4) The mathematics of the derivations are not transparent.
Fundamental assumptions and their consequences tend to be
obscured.
This paper introduces a new look at this whole assemblage of
problems, sidestepping the difficulties just mentioned. The
following are the highlights of the paper:
(5) Optimal Estimates and Orthogonal Projections. The
Wiener problem is approached from the point of view of condi-
tional distributions and expectations. In this way, basic facts of
the Wiener theory are quickly obtained; the scope of the results
and the fundamental assumptions appear clearly. It is seen that all
statistical calculations and results are based on first and second
order averages; no other statistical data are needed. Thus
difficulty (4) is eliminated. This method is well known in
probability theory (see pp. 75–78 and 148–155 of Doob [15] and
pp. 455–464 of Loève [16]) but has not yet been used extensively
in engineering.
(6) Models for Random Processes. Following, in particular,
Bode and Shannon [3], arbitrary random signals are represented
(up to second order average statistical properties) as the output of
a linear dynamic system excited by independent or uncorrelated
random signals (“white noise”). This is a standard trick in the
engineering applications of the Wiener theory [2–7]. The
approach taken here differs from the conventional one only in the
way in which linear dynamic systems are described. We shall
emphasize the concepts of state and state transition; in other
words, linear systems will be specified by systems of first-order
difference (or differential) equations. This point of view is
natural and also necessary in order to take advantage of the
simplifications mentioned under (5).

A New Approach to Linear Filtering and Prediction Problems¹

R. E. KALMAN
Research Institute for Advanced Study,² Baltimore, Md.

The classical filtering and prediction problem is re-examined using the Bode-
Shannon representation of random processes and the “state transition” method of
analysis of dynamic systems. New results are:
(1) The formulation and methods of solution of the problem apply without modifica-
tion to stationary and nonstationary statistics and to growing-memory and infinite-
memory filters.
(2) A nonlinear difference (or differential) equation is derived for the covariance
matrix of the optimal estimation error. From the solution of this equation the co-
efficients of the difference (or differential) equation of the optimal linear filter are ob-
tained without further calculations.
(3) The filtering problem is shown to be the dual of the noise-free regulator problem.
The new method developed here is applied to two well-known problems, confirming
and extending earlier results.
The discussion is largely self-contained and proceeds from first principles; basic
concepts of the theory of random processes are reviewed in the Appendix.

Transactions of the ASME–Journal of Basic Engineering, 82 (Series D): 35-45. Copyright © 1960 by ASME
(7) Solution of the Wiener Problem. With the state-transition
method, a single derivation covers a large variety of problems:
growing and infinite memory filters, stationary and nonstationary
statistics, etc.; difficulty (3) disappears. Having guessed the
“state” of the estimation (i.e., filtering or prediction) problem
correctly, one is led to a nonlinear difference (or differential)
equation for the covariance matrix of the optimal estimation error.
This is vaguely analogous to the Wiener-Hopf equation. Solution
of the equation for the covariance matrix starts at the time t₀ when
the first observation is taken; at each later time t the solution of
the equation represents the covariance of the optimal prediction
error given observations in the interval (t₀, t). From the covariance
matrix at time t we obtain at once, without further calculations,
the coefficients (in general, time-varying) characterizing the
optimal linear filter.
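As a rough numerical illustration only (the actual derivation comes later in the paper), the following sketch iterates the scalar, time-invariant case of such a covariance equation in its now-standard textbook form, for a hypothetical model x(t + 1) = φx(t) + w(t), y(t) = x(t) + v(t) with Var w = q and Var v = r. The additive measurement noise v is an assumption of this sketch, not the model used in the paper:

```python
# Sketch under the assumed scalar model x(t+1) = phi*x(t) + w(t),
# y(t) = x(t) + v(t), with Var w(t) = q and Var v(t) = r. P is the
# prediction-error covariance; the nonlinear difference equation below
# propagates it from one sampling instant to the next, and the gain
# K = P/(P + r) of the optimal linear filter falls out of its solution
# with no further calculation.

def covariance_step(P, phi, q, r):
    """One step of the scalar covariance equation."""
    return phi * phi * (P * r / (P + r)) + q

def solve_covariance(P0, phi, q, r, steps):
    """Iterate from the covariance P0 at the time of the first observation."""
    P = P0
    for _ in range(steps):
        P = covariance_step(P, phi, q, r)
    return P

# With phi = q = r = 1 the recursion settles at the positive root of
# P**2 - P - 1 = 0, i.e. the golden ratio.
P_limit = solve_covariance(10.0, 1.0, 1.0, 1.0, 200)
```

For time-varying φ(t), q(t), r(t) the same one-step recursion applies with the coefficients changing at each sampling instant, which is why stationary and nonstationary statistics are handled by a single derivation.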
(8) The Dual Problem. The new formulation of the Wiener
problem brings it into contact with the growing new theory of
control systems based on the “state” point of view [17–24]. It
turns out, surprisingly, that the Wiener problem is the dual of the
noise-free optimal regulator problem, which has been solved
previously by the author, using the state-transition method to great
advantage [18, 23, 24]. The mathematical background of the two
problems is identical—this has been suspected all along, but until
now the analogies have never been made explicit.
(9) Applications. The power of the new method is most ap-
parent in theoretical investigations and in numerical answers to
complex practical problems. In the latter case, it is best to resort to
machine computation. Examples of this type will be discussed
later. To provide some feel for applications, two standard
examples from nonstationary prediction are included; in these
cases the solution of the nonlinear difference equation mentioned
under (7) above can be obtained even in closed form.
For easy reference, the main results are displayed in the form of
theorems. Only Theorems 3 and 4 are original. The next section
and the Appendix serve mainly to review well-known material in
a form suitable for the present purposes.
Notation Conventions
Throughout the paper, we shall deal mainly with discrete (or
sampled) dynamic systems; in other words, signals will be ob-
served at equally spaced points in time (sampling instants). By
suitable choice of the time scale, the constant intervals between
successive sampling instants (sampling periods) may be chosen as
unity. Thus variables referring to time, such as t, t₀, τ, T, will
always be integers. The restriction to discrete dynamic systems is
not at all essential (at least from the engineering point of view);
by using the discreteness, however, we can keep the mathematics
rigorous and yet elementary. Vectors will be denoted by small
bold-face letters: a, b, …, u, x, y, … A vector or more precisely an
n-vector is a set of n numbers x₁, …, xₙ; the xᵢ are the co-ordinates
or components of the vector x.
Matrices will be denoted by capital bold-face letters: A, B, Q,
Φ, Ψ, …; they are m × n arrays of elements aᵢⱼ, bᵢⱼ, qᵢⱼ, … The
transpose (interchanging rows and columns) of a matrix will be
denoted by the prime. In manipulating formulas, it will be
convenient to regard a vector as a matrix with a single column.
Using the conventional definition of matrix multiplication, we
write the scalar product of two n-vectors x, y as
x'y = Σᵢ₌₁ⁿ xᵢyᵢ = y'x
The scalar product is clearly a scalar, i.e., not a vector, quantity.
Similarly, the quadratic form associated with the n × n matrix Q
is,
x'Qx = Σᵢ,ⱼ₌₁ⁿ xᵢqᵢⱼxⱼ

We define the expression xy', where x is an m-vector and y is an
n-vector, to be the m × n matrix with elements xᵢyⱼ.
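These conventions can be checked mechanically. The following sketch (using NumPy purely for illustration) treats an n-vector as a single-column matrix, so that x'y, x'Qx, and xy' become literal matrix products:

```python
import numpy as np

x = np.array([[1.0], [2.0], [3.0]])    # a 3-vector as a single-column matrix
y = np.array([[4.0], [5.0], [6.0]])
Q = np.diag([1.0, 2.0, 3.0])           # a 3 x 3 matrix

scalar_product = (x.T @ y).item()      # x'y = sum over i of x_i y_i; equals y'x
quadratic_form = (x.T @ Q @ x).item()  # x'Qx = sum over i, j of x_i q_ij x_j
outer = x @ y.T                        # xy': 3 x 3 matrix with elements x_i y_j
```

Regarding a vector as a matrix with a single column, as the text suggests, is exactly what makes the prime (transpose) notation compose this cleanly.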
We write E(x) = Ex for the expected value of the random vec-
tor x (see Appendix). It is usually convenient to omit the brackets
after E. This does not result in confusion in simple cases since
constants and the operator E commute. Thus Exy' = matrix with
elements E(xᵢyⱼ); ExEy' = matrix with elements E(xᵢ)E(yⱼ).
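As a numerical illustration (sample averages standing in for the operator E, over an invented toy distribution), the matrices Exy' and ExEy' can be estimated and compared; their difference is the covariance matrix of x and y:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
x = rng.normal(size=(n, 2))                       # draws of a 2-vector x
B = np.array([[1.0, 0.0], [1.0, 1.0]])
y = x @ B + rng.normal(size=(n, 2))               # y correlated with x

Exy = x.T @ y / n                                 # elements approximate E(x_i y_j)
ExEy = np.outer(x.mean(axis=0), y.mean(axis=0))   # elements approximate E(x_i)E(y_j)
cov_xy = Exy - ExEy                               # cov(x_i, y_j); here approximately B
```

Since x here has zero mean and unit covariance, cov(xᵢ, yⱼ) works out to the mixing matrix B, which the sample estimate recovers to within Monte Carlo error.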
For ease of reference, a list of the principal symbols used is
given below.
Optimal Estimates
t    time in general, present time.
t₀    time at which observations start.
x₁(t), x₂(t)    basic random variables.
y(t)    observed random variable.
x₁*(t₁|t)    optimal estimate of x₁(t₁) given y(t₀), …, y(t).
L    loss function (nonrandom function of its argument).
ε    estimation error (random variable).

Orthogonal Projections

Y(t)    linear manifold generated by the random variables y(t₀), …, y(t).
x̄(t₁|t)    orthogonal projection of x(t₁) on Y(t).
x̃(t₁|t)    component of x(t₁) orthogonal to Y(t).

Models for Random Processes

Φ(t + 1; t)    transition matrix.
Q(t)    covariance of random excitation.

Solution of the Wiener Problem

x(t)    basic random variable.
y(t)    observed random variable.
Y(t)    linear manifold generated by y(t₀), …, y(t).
Z(t)    linear manifold generated by ỹ(t|t – 1).
x*(t₁|t)    optimal estimate of x(t₁) given Y(t).
x̃(t₁|t)    error in optimal estimate of x(t₁) given Y(t).
Optimal Estimates
To have a concrete description of the type of problems to be
studied, consider the following situation. We are given signal
x₁(t) and noise x₂(t). Only the sum y(t) = x₁(t) + x₂(t) can be ob-
served. Suppose we have observed and know exactly the values
of y(t₀), …, y(t). What can we infer from this knowledge in regard
to the (unobservable) value of the signal at t = t₁, where t₁ may be
less than, equal to, or greater than t? If t₁ < t, this is a data-
smoothing (interpolation) problem. If t₁ = t, this is called
filtering. If t₁ > t, we have a prediction problem. Since our treat-
ment will be general enough to include these and similar
problems, we shall use hereafter the collective term estimation.
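The taxonomy above can be captured in a trivial helper (illustrative only; the function name is invented):

```python
def estimation_problem(t1, t):
    """Classify the problem of estimating the signal at time t1 from
    observations y(t0), ..., y(t), following the taxonomy in the text."""
    if t1 < t:
        return "data-smoothing (interpolation)"
    if t1 == t:
        return "filtering"
    return "prediction"
```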
As was pointed out by Wiener [1], the natural setting of the
estimation problem belongs to the realm of probability theory and
statistics. Thus signal, noise, and their sum will be random
variables, and consequently they may be regarded as random
processes. From the probabilistic description of the random
processes we can determine the probability with which a par-
ticular sample of the signal and noise will occur. For any given
set of measured values η(t₀), …, η(t) of the random variable y(t)
one can then also determine, in principle, the probability of
simultaneous occurrence of various values ξ₁(t₁) of the random
variable x₁(t₁). This is the conditional probability distribution
function
Pr[x₁(t₁) ≤ ξ₁ | y(t₀) = η(t₀), …, y(t) = η(t)] = F(ξ₁)     (1)
Evidently, F(ξ₁) represents all the information which the meas-
urement of the random variables y(t₀), …, y(t) has conveyed about
the random variable x₁(t₁). Any statistical estimate of the random
variable x₁(t₁) will be some function of this distribution and
therefore a (nonrandom) function of the random variables y(t₀), …,
y(t). This statistical estimate is denoted by X₁(t₁|t), or by just X₁(t₁)
or X₁ when the set of observed random variables or the time at
which the estimate is required are clear from context.
Suppose now that X₁ is given as a fixed function of the random
variables y(t₀), …, y(t). Then X₁ is itself a random variable and its
actual value is known whenever the actual values of y(t₀), …, y(t)
are known. In general, the actual value of X₁(t₁) will be different
from the (unknown) actual value of x₁(t₁). To arrive at a rational
way of determining X₁, it is natural to assign a penalty or loss for
incorrect estimates. Clearly, the loss should be a (i) positive, (ii)
nondecreasing function of the estimation error ε = x₁(t₁) – X₁(t₁).
Thus we define a loss function by
L(0) = 0
L(ε₂) ≥ L(ε₁) ≥ 0   when   ε₂ ≥ ε₁ ≥ 0     (2)
L(ε) = L(–ε)
Some common examples of loss functions are: L(ε) = aε², aε⁴,
a|ε|, a[1 – exp(–ε²)], etc., where a is a positive constant.
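Each of these examples does satisfy conditions (2); a quick numerical spot-check, taking a = 1 and testing the three conditions over a finite grid of error values, is straightforward:

```python
import math

# The example losses of the text, with a = 1.
losses = {
    "a*eps**2": lambda e: e ** 2,
    "a*eps**4": lambda e: e ** 4,
    "a*|eps|": abs,
    "a*[1 - exp(-eps**2)]": lambda e: 1.0 - math.exp(-e ** 2),
}

def satisfies_conditions_2(L, grid):
    """Check L(0) = 0, symmetry L(eps) = L(-eps), and that L is
    nondecreasing and nonnegative over a grid of nonnegative errors."""
    vals = [L(e) for e in grid]
    nondecreasing = all(0 <= a <= b for a, b in zip(vals, vals[1:]))
    symmetric = all(abs(L(e) - L(-e)) < 1e-12 for e in grid)
    return L(0) == 0 and nondecreasing and symmetric

grid = [0.1 * k for k in range(50)]
```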
One (but by no means the only) natural way of choosing the
random variable X₁ is to require that this choice should minimize
the average loss or risk

E{L[x₁(t₁) – X₁(t₁)]} = E[E{L[x₁(t₁) – X₁(t₁)] | y(t₀), …, y(t)}]     (3)
Since the first expectation on the right-hand side of (3) does not
depend on the choice of X₁ but only on y(t₀), …, y(t), it is clear that
minimizing (3) is equivalent to minimizing

E{L[x₁(t₁) – X₁(t₁)] | y(t₀), …, y(t)}     (4)
Under just slight additional assumptions, optimal estimates can be
characterized in a simple way.
Theorem 1. Assume that L is of type (2) and that the conditional
distribution function F(ξ) defined by (1) is:
(A) symmetric about the mean ξ̄:
F(ξ – ξ̄) = 1 – F(ξ̄ – ξ)
(B) convex for ξ ≤ ξ̄:
F(λξ₁ + (1 – λ)ξ₂) ≤ λF(ξ₁) + (1 – λ)F(ξ₂)
for all ξ₁, ξ₂ ≤ ξ̄ and 0 ≤ λ ≤ 1
Then the random variable x₁*(t₁|t) which minimizes the average
loss (3) is the conditional expectation

x₁*(t₁|t) = E[x₁(t₁) | y(t₀), …, y(t)]     (5)
Proof: As pointed out recently by Sherman [25], this theorem
follows immediately from a well-known lemma in probability
theory.
Corollary. If the random processes {x₁(t)}, {x₂(t)}, and {y(t)}
are gaussian, Theorem 1 holds.
Proof: By Theorem 5, (A) (see Appendix), conditional distribu-
tions on a gaussian random process are gaussian. Hence the re-
quirements of Theorem 1 are always satisfied.
In the control system literature, this theorem appears some-
times in a form which is more restrictive in one way and more
general in another way:
Theorem 1-a. If L(ε) = ε², then Theorem 1 is true without as-
sumptions (A) and (B).
Proof: Expand the conditional expectation (4):

E[x₁²(t₁)|y(t₀), …, y(t)] – 2X₁(t₁)E[x₁(t₁)|y(t₀), …, y(t)] + X₁²(t₁)

and differentiate with respect to X₁(t₁). This is not a completely
rigorous argument; for a simple rigorous proof see Doob [15], pp.
77–78.
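The quadratic expansion in the proof can also be seen numerically: over any fixed sample (standing in for the conditional distribution given the observations, with made-up toy data), the average of [x₁(t₁) – c]² exceeds its minimum by exactly (c – mean)², so the minimizing estimate is the (conditional) mean. A toy check:

```python
import random

random.seed(1)
# An invented sample standing in for the conditional distribution of
# x1(t1) given y(t0), ..., y(t).
sample = [random.gauss(0.5, 1.0) for _ in range(100_000)]
mean = sum(sample) / len(sample)

def average_quadratic_loss(c):
    """Sample version of E{[x1(t1) - c]**2 | y(t0), ..., y(t)}."""
    return sum((x - c) ** 2 for x in sample) / len(sample)

# average_quadratic_loss(c) = average_quadratic_loss(mean) + (c - mean)**2,
# so any estimate other than the mean incurs strictly larger average loss.
```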
Remarks. (a) As far as the author is aware, it is not known what
is the most general class of random processes {x₁(t)}, {x₂(t)} for
which the conditional distribution function satisfies the re-
quirements of Theorem 1.
(b) Aside from the note of Sherman, Theorem 1 apparently has
never been stated explicitly in the control systems literature. In
fact, one finds many statements to the effect that loss functions of
the general type (2) cannot be conveniently handled mathe-
matically.
(c) In the sequel, we shall be dealing mainly with vector-
valued random variables. In that case, the estimation problem is
stated as: Given a vector-valued random process {x(t)} and ob-
served random variables y(t₀), …, y(t), where y(t) = Mx(t) (M
being a singular matrix; in other words, not all co-ordinates of
x(t) can be observed), find an estimate X(t₁) which minimizes the
expected loss E[L(‖x(t₁) – X(t₁)‖)], ‖ ‖ being the norm of a
vector.
Theorem 1 remains true in the vector case also, provided we
require that the conditional distribution function of the n co-
ordinates of the vector x(t₁),

Pr[x₁(t₁) ≤ ξ₁, …, xₙ(t₁) ≤ ξₙ | y(t₀), …, y(t)] = F(ξ₁, …, ξₙ)

be symmetric with respect to the n variables ξ₁ – ξ̄₁, …, ξₙ – ξ̄ₙ
and convex in the region where all of these variables are
negative.
Orthogonal Projections
The explicit calculation of the optimal estimate as a function of
the observed variables is, in general, impossible. There is an
important exception: The processes {x₁(t)}, {x₂(t)} are gaussian.
On the other hand, if we attempt to get an optimal estimate
under the restriction L(ε) = ε² and the additional requirement that
the estimate be a linear function of the observed random
variables, we get an estimate which is identical with the optimal
estimate in the gaussian case, without the assumption of linearity
or quadratic loss function. This shows that results obtainable by
linear estimation can be bettered by nonlinear estimation only
when (i) the random processes are nongaussian and even then (in
view of Theorem 5, (C)) only (ii) by considering at least third-
order probability distribution functions.
In the special cases just mentioned, the explicit solution of the
estimation problem is most easily understood with the help of a
geometric picture. This is the subject of the present section.
Consider the (real-valued) random variables y(t₀), …, y(t). The
set of all linear combinations of these random variables with real
coefficients

Σᵢ₌ₜ₀ᵗ aᵢy(i)     (6)

forms a vector space (linear manifold) which we denote by Y(t).
We regard, abstractly, any expression of the form (6) as “point”
or “vector” in Y(t); this use of the word “vector” should not be
confused, of course, with “vector-valued” random variables, etc.
Since we do not want to fix the value of t (i.e., the total number
of possible observations), Y(t) should be regarded as a finite-
dimensional subspace of the space of all possible observations.
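Concretely, with sample averages standing in for expectations, projecting a random variable on Y(t) amounts to a least-squares fit over expressions of the form (6), and the residual is then uncorrelated with every y(i); this is the geometric fact the orthogonal-projection machinery exploits. A sketch (NumPy; the data are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
N = 100_000                                  # samples standing in for E[.]
Y = rng.normal(size=(N, 3))                  # columns: y(t0), y(t0 + 1), y(t)
true_a = np.array([0.5, -1.0, 2.0])
x = Y @ true_a + rng.normal(size=N)          # a random variable to project

G = Y.T @ Y / N                              # Gram matrix of E[y(i) y(j)]
b = Y.T @ x / N                              # cross-moments E[y(i) x]
a = np.linalg.solve(G, b)                    # coefficients a_i of (6)
x_proj = Y @ a                               # orthogonal projection of x on Y(t)
residual = x - x_proj                        # component of x orthogonal to Y(t)
correlations = Y.T @ residual / N            # all approximately zero
```

Solving the normal equations Ga = b is exactly what makes the residual orthogonal to each generator of the manifold.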