Introduction
AN IMPORTANT class of theoretical and practical
problems in communication and control is of a statistical nature.
Such problems are: (i) Prediction of random signals; (ii) separa-
tion of random signals from random noise; (iii) detection of
signals of known form (pulses, sinusoids) in the presence of
random noise.
In his pioneering work, Wiener [1]³ showed that problems (i)
and (ii) lead to the so-called Wiener-Hopf integral equation; he
also gave a method (spectral factorization) for the solution of this
integral equation in the practically important special case of
stationary statistics and rational spectra.
Many extensions and generalizations followed Wiener’s basic
work. Zadeh and Ragazzini solved the finite-memory case [2].
Concurrently and independently of Bode and Shannon [3], they
also gave a simplified method [2] of solution. Booton discussed
the nonstationary Wiener-Hopf equation [4]. These results are
now in standard texts [5-6]. A somewhat different approach along
these main lines has been given recently by Darlington [7]. For
extensions to sampled signals, see, e.g., Franklin [8], Lees [9].
Another approach based on the eigenfunctions of the Wiener-
Hopf equation (which applies also to nonstationary problems
whereas the preceding methods in general don’t), has been
pioneered by Davis [10] and applied by many others, e.g.,
Shinbrot [11], Blum [12], Pugachev [13], Solodovnikov [14].
In all these works, the objective is to obtain the specification of
a linear dynamic system (Wiener filter) which accomplishes the
prediction, separation, or detection of a random signal.⁴
———
¹ This research was supported in part by the U. S. Air Force Office of
Scientific Research under Contract AF 49(638)-382.
² 7212 Bellona Ave.
³ Numbers in brackets designate References at end of paper.
⁴ Of course, in general these tasks may be done better by nonlinear
filters. At present, however, little or nothing is known about how to obtain
(both theoretically and practically) these nonlinear filters.
Contributed by the Instruments and Regulators Division and presented
at the Instruments and Regulators Conference, March 29–April 2, 1959,
of THE AMERICAN SOCIETY OF MECHANICAL ENGINEERS.
NOTE: Statements and opinions advanced in papers are to be understood
as individual expressions of their authors and not those of the Society.
Manuscript received at ASME Headquarters, February 24, 1959. Paper
No. 59—IRD-11.
Present methods for solving the Wiener problem are subject to
a number of limitations which seriously curtail their practical
usefulness:
(1) The optimal filter is specified by its impulse response. It is
not a simple task to synthesize the filter from such data.
(2) Numerical determination of the optimal impulse response is
often quite involved and poorly suited to machine computation.
The situation gets rapidly worse with increasing complexity of
the problem.
(3) Important generalizations (e.g., growing-memory filters,
nonstationary prediction) require new derivations, frequently of
considerable difficulty to the nonspecialist.
(4) The mathematics of the derivations are not transparent.
Fundamental assumptions and their consequences tend to be
obscured.
This paper introduces a new look at this whole assemblage of
problems, sidestepping the difficulties just mentioned. The
following are the highlights of the paper:
(5) Optimal Estimates and Orthogonal Projections. The
Wiener problem is approached from the point of view of condi-
tional distributions and expectations. In this way, basic facts of
the Wiener theory are quickly obtained; the scope of the results
and the fundamental assumptions appear clearly. It is seen that all
statistical calculations and results are based on first and second
order averages; no other statistical data are needed. Thus
difficulty (4) is eliminated. This method is well known in
probability theory (see pp. 75–78 and 148–155 of Doob [15] and
pp. 455–464 of Loève [16]) but has not yet been used extensively
in engineering.
(6) Models for Random Processes. Following, in particular,
Bode and Shannon [3], arbitrary random signals are represented
(up to second order average statistical properties) as the output of
a linear dynamic system excited by independent or uncorrelated
random signals (“white noise”). This is a standard trick in the
engineering applications of the Wiener theory [2–7]. The
approach taken here differs from the conventional one only in the
way in which linear dynamic systems are described. We shall
emphasize the concepts of state and state transition; in other
words, linear systems will be specified by systems of first-order
difference (or differential) equations. This point of view is
natural and also necessary in order to take advantage of the
simplifications mentioned under (5).

A New Approach to Linear Filtering and Prediction Problems¹

R. E. KALMAN
Research Institute for Advanced Study,² Baltimore, Md.

The classical filtering and prediction problem is re-examined using the Bode-
Shannon representation of random processes and the “state transition” method of
analysis of dynamic systems. New results are:
(1) The formulation and methods of solution of the problem apply without modifica-
tion to stationary and nonstationary statistics and to growing-memory and infinite-
memory filters.
(2) A nonlinear difference (or differential) equation is derived for the covariance
matrix of the optimal estimation error. From the solution of this equation the co-
efficients of the difference (or differential) equation of the optimal linear filter are ob-
tained without further calculations.
(3) The filtering problem is shown to be the dual of the noise-free regulator problem.
The new method developed here is applied to two well-known problems, confirming
and extending earlier results.
The discussion is largely self-contained and proceeds from first principles; basic
concepts of the theory of random processes are reviewed in the Appendix.

Transactions of the ASME–Journal of Basic Engineering, 82 (Series D): 35-45. Copyright © 1960 by ASME
(7) Solution of the Wiener Problem. With the state-transition
method, a single derivation covers a large variety of problems:
growing and infinite memory filters, stationary and nonstationary
statistics, etc.; difficulty (3) disappears. Having guessed the
“state” of the estimation (i.e., filtering or prediction) problem
correctly, one is led to a nonlinear difference (or differential)
equation for the covariance matrix of the optimal estimation error.
This is vaguely analogous to the Wiener-Hopf equation. Solution
of the equation for the covariance matrix starts at the time t₀ when
the first observation is taken; at each later time t the solution of
the equation represents the covariance of the optimal prediction
error given observations in the interval (t₀, t). From the covariance
matrix at time t we obtain at once, without further calculations,
the coefficients (in general, time-varying) characterizing the
optimal linear filter.
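As a rough numerical illustration only (the actual derivation comes later in the paper), the following sketch iterates the scalar, time-invariant case of such a covariance equation in its now-standard textbook form, for a hypothetical model x(t + 1) = φx(t) + w(t), y(t) = x(t) + v(t) with Var w = q and Var v = r. The additive measurement noise v is an assumption of this sketch, not the model used in the paper:

```python
# Sketch under the assumed scalar model x(t+1) = phi*x(t) + w(t),
# y(t) = x(t) + v(t), with Var w(t) = q and Var v(t) = r. P is the
# prediction-error covariance; the nonlinear difference equation below
# propagates it from one sampling instant to the next, and the gain
# K = P/(P + r) of the optimal linear filter falls out of its solution
# with no further calculation.

def covariance_step(P, phi, q, r):
    """One step of the scalar covariance equation."""
    return phi * phi * (P * r / (P + r)) + q

def solve_covariance(P0, phi, q, r, steps):
    """Iterate from the covariance P0 at the time of the first observation."""
    P = P0
    for _ in range(steps):
        P = covariance_step(P, phi, q, r)
    return P

# With phi = q = r = 1 the recursion settles at the positive root of
# P**2 - P - 1 = 0, i.e. the golden ratio.
P_limit = solve_covariance(10.0, 1.0, 1.0, 1.0, 200)
```

For time-varying φ(t), q(t), r(t) the same one-step recursion applies with the coefficients changing at each sampling instant, which is why stationary and nonstationary statistics are handled by a single derivation.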
(8) The Dual Problem. The new formulation of the Wiener
problem brings it into contact with the growing new theory of
control systems based on the “state” point of view [17–24]. It
turns out, surprisingly, that the Wiener problem is the dual of the
noise-free optimal regulator problem, which has been solved
previously by the author, using the state-transition method to great
advantage [18, 23, 24]. The mathematical background of the two
problems is identical—this has been suspected all along, but until
now the analogies have never been made explicit.
(9) Applications. The power of the new method is most ap-
parent in theoretical investigations and in numerical answers to
complex practical problems. In the latter case, it is best to resort to
machine computation. Examples of this type will be discussed
later. To provide some feel for applications, two standard
examples from nonstationary prediction are included; in these
cases the solution of the nonlinear difference equation mentioned
under (7) above can be obtained even in closed form.
For easy reference, the main results are displayed in the form of
theorems. Only Theorems 3 and 4 are original. The next section
and the Appendix serve mainly to review well-known material in
a form suitable for the present purposes.
Notation Conventions
Throughout the paper, we shall deal mainly with discrete (or
sampled) dynamic systems; in other words, signals will be ob-
served at equally spaced points in time (sampling instants). By
suitable choice of the time scale, the constant intervals between
successive sampling instants (sampling periods) may be chosen as
unity. Thus variables referring to time, such as t, t₀, τ, T, will
always be integers. The restriction to discrete dynamic systems is
not at all essential (at least from the engineering point of view);
by using the discreteness, however, we can keep the mathematics
rigorous and yet elementary. Vectors will be denoted by small
bold-face letters: a, b, …, u, x, y, … A vector or more precisely an
n-vector is a set of n numbers x₁, …, xₙ; the xᵢ are the co-ordinates
or components of the vector x.
Matrices will be denoted by capital bold-face letters: A, B, Q,
Φ, Ψ, …; they are m × n arrays of elements aᵢⱼ, bᵢⱼ, qᵢⱼ, … The
transpose (interchanging rows and columns) of a matrix will be
denoted by the prime. In manipulating formulas, it will be
convenient to regard a vector as a matrix with a single column.
Using the conventional definition of matrix multiplication, we
write the scalar product of two n-vectors x, y as
x'y = Σᵢ₌₁ⁿ xᵢyᵢ = y'x
The scalar product is clearly a scalar, i.e., not a vector, quantity.
Similarly, the quadratic form associated with the n × n matrix Q
is,
x'Qx = Σᵢ,ⱼ₌₁ⁿ xᵢqᵢⱼxⱼ

We define the expression xy', where x is an m-vector and y is an
n-vector, to be the m × n matrix with elements xᵢyⱼ.
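These conventions can be checked mechanically. The following sketch (using NumPy purely for illustration) treats an n-vector as a single-column matrix, so that x'y, x'Qx, and xy' become literal matrix products:

```python
import numpy as np

x = np.array([[1.0], [2.0], [3.0]])    # a 3-vector as a single-column matrix
y = np.array([[4.0], [5.0], [6.0]])
Q = np.diag([1.0, 2.0, 3.0])           # a 3 x 3 matrix

scalar_product = (x.T @ y).item()      # x'y = sum over i of x_i y_i; equals y'x
quadratic_form = (x.T @ Q @ x).item()  # x'Qx = sum over i, j of x_i q_ij x_j
outer = x @ y.T                        # xy': 3 x 3 matrix with elements x_i y_j
```

Regarding a vector as a matrix with a single column, as the text suggests, is exactly what makes the prime (transpose) notation compose this cleanly.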
We write E(x) = Ex for the expected value of the random vec-
tor x (see Appendix). It is usually convenient to omit the brackets
after E. This does not result in confusion in simple cases since
constants and the operator E commute. Thus Exy' = matrix with
elements E(xᵢyⱼ); ExEy' = matrix with elements E(xᵢ)E(yⱼ).
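As a numerical illustration (sample averages standing in for the operator E, over an invented toy distribution), the matrices Exy' and ExEy' can be estimated and compared; their difference is the covariance matrix of x and y:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
x = rng.normal(size=(n, 2))                       # draws of a 2-vector x
B = np.array([[1.0, 0.0], [1.0, 1.0]])
y = x @ B + rng.normal(size=(n, 2))               # y correlated with x

Exy = x.T @ y / n                                 # elements approximate E(x_i y_j)
ExEy = np.outer(x.mean(axis=0), y.mean(axis=0))   # elements approximate E(x_i)E(y_j)
cov_xy = Exy - ExEy                               # cov(x_i, y_j); here approximately B
```

Since x here has zero mean and unit covariance, cov(xᵢ, yⱼ) works out to the mixing matrix B, which the sample estimate recovers to within Monte Carlo error.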
For ease of reference, a list of the principal symbols used is
given below.
Optimal Estimates
t    time in general, present time.
t₀    time at which observations start.
x₁(t), x₂(t)    basic random variables.
y(t)    observed random variable.
x₁*(t₁|t)    optimal estimate of x₁(t₁) given y(t₀), …, y(t).
L    loss function (nonrandom function of its argument).
ε    estimation error (random variable).

Orthogonal Projections

Y(t)    linear manifold generated by the random variables y(t₀), …, y(t).
x̄(t₁|t)    orthogonal projection of x(t₁) on Y(t).
x̃(t₁|t)    component of x(t₁) orthogonal to Y(t).

Models for Random Processes

Φ(t + 1; t)    transition matrix.
Q(t)    covariance of random excitation.

Solution of the Wiener Problem

x(t)    basic random variable.
y(t)    observed random variable.
Y(t)    linear manifold generated by y(t₀), …, y(t).
Z(t)    linear manifold generated by ỹ(t|t – 1).
x*(t₁|t)    optimal estimate of x(t₁) given Y(t).
x̃(t₁|t)    error in optimal estimate of x(t₁) given Y(t).
Optimal Estimates
To have a concrete description of the type of problems to be
studied, consider the following situation. We are given signal
x₁(t) and noise x₂(t). Only the sum y(t) = x₁(t) + x₂(t) can be ob-
served. Suppose we have observed and know exactly the values
of y(t₀), …, y(t). What can we infer from this knowledge in regard
to the (unobservable) value of the signal at t = t₁, where t₁ may be
less than, equal to, or greater than t? If t₁ < t, this is a data-
smoothing (interpolation) problem. If t₁ = t, this is called
filtering. If t₁ > t, we have a prediction problem. Since our treat-
ment will be general enough to include these and similar
problems, we shall use hereafter the collective term estimation.
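The taxonomy above can be captured in a trivial helper (illustrative only; the function name is invented):

```python
def estimation_problem(t1, t):
    """Classify the problem of estimating the signal at time t1 from
    observations y(t0), ..., y(t), following the taxonomy in the text."""
    if t1 < t:
        return "data-smoothing (interpolation)"
    if t1 == t:
        return "filtering"
    return "prediction"
```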
As was pointed out by Wiener [1], the natural setting of the
estimation problem belongs to the realm of probability theory and
statistics. Thus signal, noise, and their sum will be random
variables, and consequently they may be regarded as random
processes. From the probabilistic description of the random
processes we can determine the probability with which a par-
ticular sample of the signal and noise will occur. For any given
set of measured values η(t₀), …, η(t) of the random variable y(t)
one can then also determine, in principle, the probability of
simultaneous occurrence of various values ξ₁(t₁) of the random
variable x₁(t₁). This is the conditional probability distribution
function
Pr[x₁(t₁) ≤ ξ₁ | y(t₀) = η(t₀), …, y(t) = η(t)] = F(ξ₁)     (1)
Evidently, F(ξ₁) represents all the information which the meas-
urement of the random variables y(t₀), …, y(t) has conveyed about
the random variable x₁(t₁). Any statistical estimate of the random
variable x₁(t₁) will be some function of this distribution and
therefore a (nonrandom) function of the random variables y(t₀), …,
y(t). This statistical estimate is denoted by X₁(t₁|t), or by just X₁(t₁)
or X₁ when the set of observed random variables or the time at
which the estimate is required are clear from context.
Suppose now that X₁ is given as a fixed function of the random
variables y(t₀), …, y(t). Then X₁ is itself a random variable and its
actual value is known whenever the actual values of y(t₀), …, y(t)
are known. In general, the actual value of X₁(t₁) will be different
from the (unknown) actual value of x₁(t₁). To arrive at a rational
way of determining X₁, it is natural to assign a penalty or loss for
incorrect estimates. Clearly, the loss should be a (i) positive, (ii)
nondecreasing function of the estimation error ε = x₁(t₁) – X₁(t₁).
Thus we define a loss function by
L(0) = 0
L(ε₂) ≥ L(ε₁) ≥ 0   when   ε₂ ≥ ε₁ ≥ 0     (2)
L(ε) = L(–ε)
Some common examples of loss functions are: L(ε) = aε², aε⁴,
a|ε|, a[1 – exp(–ε²)], etc., where a is a positive constant.
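Each of these examples does satisfy conditions (2); a quick numerical spot-check, taking a = 1 and testing the three conditions over a finite grid of error values, is straightforward:

```python
import math

# The example losses of the text, with a = 1.
losses = {
    "a*eps**2": lambda e: e ** 2,
    "a*eps**4": lambda e: e ** 4,
    "a*|eps|": abs,
    "a*[1 - exp(-eps**2)]": lambda e: 1.0 - math.exp(-e ** 2),
}

def satisfies_conditions_2(L, grid):
    """Check L(0) = 0, symmetry L(eps) = L(-eps), and that L is
    nondecreasing and nonnegative over a grid of nonnegative errors."""
    vals = [L(e) for e in grid]
    nondecreasing = all(0 <= a <= b for a, b in zip(vals, vals[1:]))
    symmetric = all(abs(L(e) - L(-e)) < 1e-12 for e in grid)
    return L(0) == 0 and nondecreasing and symmetric

grid = [0.1 * k for k in range(50)]
```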
One (but by no means the only) natural way of choosing the
random variable X₁ is to require that this choice should minimize
the average loss or risk

E{L[x₁(t₁) – X₁(t₁)]} = E[E{L[x₁(t₁) – X₁(t₁)] | y(t₀), …, y(t)}]     (3)
Since the first expectation on the right-hand side of (3) does not
depend on the choice of X₁ but only on y(t₀), …, y(t), it is clear that
minimizing (3) is equivalent to minimizing

E{L[x₁(t₁) – X₁(t₁)] | y(t₀), …, y(t)}     (4)
Under just slight additional assumptions, optimal estimates can be
characterized in a simple way.
Theorem 1. Assume that L is of type (2) and that the conditional
distribution function F(ξ) defined by (1) is:
(A) symmetric about the mean ξ̄:
F(ξ – ξ̄) = 1 – F(ξ̄ – ξ)
(B) convex for ξ ≤ ξ̄:
F(λξ₁ + (1 – λ)ξ₂) ≤ λF(ξ₁) + (1 – λ)F(ξ₂)
for all ξ₁, ξ₂ ≤ ξ̄ and 0 ≤ λ ≤ 1
Then the random variable x₁*(t₁|t) which minimizes the average
loss (3) is the conditional expectation

x₁*(t₁|t) = E[x₁(t₁) | y(t₀), …, y(t)]     (5)
Proof: As pointed out recently by Sherman [25], this theorem
follows immediately from a well-known lemma in probability
theory.
Corollary. If the random processes {x₁(t)}, {x₂(t)}, and {y(t)}
are gaussian, Theorem 1 holds.
Proof: By Theorem 5, (A) (see Appendix), conditional distribu-
tions on a gaussian random process are gaussian. Hence the re-
quirements of Theorem 1 are always satisfied.
In the control system literature, this theorem appears some-
times in a form which is more restrictive in one way and more
general in another way:
Theorem 1-a. If L(ε) = ε², then Theorem 1 is true without as-
sumptions (A) and (B).
Proof: Expand the conditional expectation (4):

E[x₁²(t₁)|y(t₀), …, y(t)] – 2X₁(t₁)E[x₁(t₁)|y(t₀), …, y(t)] + X₁²(t₁)

and differentiate with respect to X₁(t₁). This is not a completely
rigorous argument; for a simple rigorous proof see Doob [15], pp.
77–78.
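The quadratic expansion in the proof can also be seen numerically: over any fixed sample (standing in for the conditional distribution given the observations, with made-up toy data), the average of [x₁(t₁) – c]² exceeds its minimum by exactly (c – mean)², so the minimizing estimate is the (conditional) mean. A toy check:

```python
import random

random.seed(1)
# An invented sample standing in for the conditional distribution of
# x1(t1) given y(t0), ..., y(t).
sample = [random.gauss(0.5, 1.0) for _ in range(100_000)]
mean = sum(sample) / len(sample)

def average_quadratic_loss(c):
    """Sample version of E{[x1(t1) - c]**2 | y(t0), ..., y(t)}."""
    return sum((x - c) ** 2 for x in sample) / len(sample)

# average_quadratic_loss(c) = average_quadratic_loss(mean) + (c - mean)**2,
# so any estimate other than the mean incurs strictly larger average loss.
```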
Remarks. (a) As far as the author is aware, it is not known what
is the most general class of random processes {x₁(t)}, {x₂(t)} for
which the conditional distribution function satisfies the re-
quirements of Theorem 1.
(b) Aside from the note of Sherman, Theorem 1 apparently has
never been stated explicitly in the control systems literature. In
fact, one finds many statements to the effect that loss functions of
the general type (2) cannot be conveniently handled mathe-
matically.
(c) In the sequel, we shall be dealing mainly with vector-
valued random variables. In that case, the estimation problem is
stated as: Given a vector-valued random process {x(t)} and ob-
served random variables y(t₀), …, y(t), where y(t) = Mx(t) (M
being a singular matrix; in other words, not all co-ordinates of
x(t) can be observed), find an estimate X(t₁) which minimizes the
expected loss E[L(‖x(t₁) – X(t₁)‖)], ‖ ‖ being the norm of a
vector.
Theorem 1 remains true in the vector case also, provided we
require that the conditional distribution function of the n co-
ordinates of the vector x(t₁),

Pr[x₁(t₁) ≤ ξ₁, …, xₙ(t₁) ≤ ξₙ | y(t₀), …, y(t)] = F(ξ₁, …, ξₙ)

be symmetric with respect to the n variables ξ₁ – ξ̄₁, …, ξₙ – ξ̄ₙ
and convex in the region where all of these variables are
negative.
Orthogonal Projections
The explicit calculation of the optimal estimate as a function of
the observed variables is, in general, impossible. There is an
important exception: The processes {x₁(t)}, {x₂(t)} are gaussian.
On the other hand, if we attempt to get an optimal estimate
under the restriction L(ε) = ε² and the additional requirement that
the estimate be a linear function of the observed random
variables, we get an estimate which is identical with the optimal
estimate in the gaussian case, without the assumption of linearity
or quadratic loss function. This shows that results obtainable by
linear estimation can be bettered by nonlinear estimation only
when (i) the random processes are nongaussian and even then (in
view of Theorem 5, (C)) only (ii) by considering at least third-
order probability distribution functions.
In the special cases just mentioned, the explicit solution of the
estimation problem is most easily understood with the help of a
geometric picture. This is the subject of the present section.
Consider the (real-valued) random variables y(t₀), …, y(t). The
set of all linear combinations of these random variables with real
coefficients

Σᵢ₌ₜ₀ᵗ aᵢy(i)     (6)

forms a vector space (linear manifold) which we denote by Y(t).
We regard, abstractly, any expression of the form (6) as “point”
or “vector” in Y(t); this use of the word “vector” should not be
confused, of course, with “vector-valued” random variables, etc.
Since we do not want to fix the value of t (i.e., the total number
of possible observations), Y(t) should be regarded as a finite-
dimensional subspace of the space of all possible observations.
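Concretely, with sample averages standing in for expectations, projecting a random variable on Y(t) amounts to a least-squares fit over expressions of the form (6), and the residual is then uncorrelated with every y(i); this is the geometric fact the orthogonal-projection machinery exploits. A sketch (NumPy; the data are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
N = 100_000                                  # samples standing in for E[.]
Y = rng.normal(size=(N, 3))                  # columns: y(t0), y(t0 + 1), y(t)
true_a = np.array([0.5, -1.0, 2.0])
x = Y @ true_a + rng.normal(size=N)          # a random variable to project

G = Y.T @ Y / N                              # Gram matrix of E[y(i) y(j)]
b = Y.T @ x / N                              # cross-moments E[y(i) x]
a = np.linalg.solve(G, b)                    # coefficients a_i of (6)
x_proj = Y @ a                               # orthogonal projection of x on Y(t)
residual = x - x_proj                        # component of x orthogonal to Y(t)
correlations = Y.T @ residual / N            # all approximately zero
```

Solving the normal equations Ga = b is exactly what makes the residual orthogonal to each generator of the manifold.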