simultaneous occurrence of various values ξ1(t1) of the random variable x1(t1). This is the conditional probability distribution function

Pr[x1(t1) ≤ ξ1 | y(t0) = η(t0), …, y(t) = η(t)] = F(ξ1)    (1)
Evidently, F(ξ1) represents all the information which the measurement of the random variables y(t0), …, y(t) has conveyed about the random variable x1(t1). Any statistical estimate of the random variable x1(t1) will be some function of this distribution and therefore a (nonrandom) function of the random variables y(t0), …, y(t). This statistical estimate is denoted by X1(t1|t), or by just X1(t1) or X1 when the set of observed random variables or the time at which the estimate is required are clear from context.
Suppose now that X1 is given as a fixed function of the random variables y(t0), …, y(t). Then X1 is itself a random variable and its actual value is known whenever the actual values of y(t0), …, y(t) are known. In general, the actual value of X1(t1) will be different from the (unknown) actual value of x1(t1). To arrive at a rational way of determining X1, it is natural to assign a penalty or loss for incorrect estimates. Clearly, the loss should be a (i) positive, (ii) nondecreasing function of the estimation error ε = x1(t1) – X1(t1). Thus we define a loss function by

L(0) = 0
L(ε2) ≥ L(ε1) ≥ 0 when ε2 ≥ ε1 ≥ 0    (2)
L(ε) = L(–ε)

Some common examples of loss functions are: L(ε) = aε^2, aε^4, a|ε|, a[1 – exp(–ε^2)], etc., where a is a positive constant.
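As a quick illustration (the code and constants are ours, not the paper's), each of these examples satisfies the defining properties (2): zero at the origin, even, and nondecreasing in |ε|.

```python
import numpy as np

a = 1.0  # any positive constant

# The example loss functions of the admissible type (2):
losses = {
    "quadratic":  lambda e: a * e**2,
    "quartic":    lambda e: a * e**4,
    "absolute":   lambda e: a * np.abs(e),
    "saturating": lambda e: a * (1.0 - np.exp(-e**2)),
}

eps = np.linspace(0.0, 3.0, 7)
for name, L in losses.items():
    vals = L(eps)
    # Check L(0) = 0, symmetry L(e) = L(-e), and monotonicity for e >= 0.
    assert vals[0] == 0 and np.allclose(L(-eps), vals)
    assert np.all(np.diff(vals) >= 0)
    print(name, np.round(vals, 3))
```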
One (but by no means the only) natural way of choosing the random variable X1 is to require that this choice should minimize the average loss or risk

E{L[x1(t1) – X1(t1)]} = E[E{L[x1(t1) – X1(t1)] | y(t0), …, y(t)}]    (3)

Since the first expectation on the right-hand side of (3) does not depend on the choice of X1 but only on y(t0), …, y(t), it is clear that minimizing (3) is equivalent to minimizing

E{L[x1(t1) – X1(t1)] | y(t0), …, y(t)}    (4)
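This equivalence can be seen numerically: an estimator that minimizes the conditional risk (4) for every observed value also minimizes the average (3). A sketch under an assumed toy model (binary observation, gaussian x; none of this setup is from the paper):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
y = rng.integers(0, 2, n)         # observed binary variable
x = rng.normal(2.0 * y, 1.0)      # x | y ~ N(2y, 1)

def risk(c0, c1):
    """Total quadratic risk (3) of the estimator X(y) = c0 if y == 0 else c1."""
    X = np.where(y == 1, c1, c0)
    return np.mean((x - X) ** 2)

# Minimizing the conditional risk (4) separately for each value of y gives
# the conditional means, here approximately 0 and 2.
m0, m1 = x[y == 0].mean(), x[y == 1].mean()
print(risk(m0, m1))      # ~1.0, the conditional variance
print(risk(0.5, 1.5))    # ~1.25: any other choice does worse
```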
Under just slight additional assumptions, optimal estimates can be characterized in a simple way.

Theorem 1. Assume that L is of type (2) and that the conditional distribution function F(ξ) defined by (1) is (A) symmetric about the mean ξ̄ and (B) convex for ξ ≤ ξ̄. Then the random variable X1(t1|t) which minimizes the average loss (3) is the conditional expectation

X1(t1|t) = E[x1(t1) | y(t0), …, y(t)]
Theorem 1-a. If L(ε) = ε^2, then Theorem 1 is true without assumptions (A) and (B).

Proof: Expand the conditional expectation (4):

E[x1^2(t1) | y(t0), …, y(t)] – 2X1(t1)E[x1(t1) | y(t0), …, y(t)] + X1^2(t1)

and differentiate with respect to X1(t1). This is not a completely rigorous argument; for a simple rigorous proof see Doob [15], pp. 77–78.
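Writing out the differentiation step the proof sketches (a routine completion, not in the original; y abbreviates y(t0), …, y(t)):

```latex
\[
\frac{\partial}{\partial X_1}\Bigl(E[x_1^2(t_1)\mid y]
  - 2X_1\,E[x_1(t_1)\mid y] + X_1^2\Bigr)
  = -2\,E[x_1(t_1)\mid y] + 2X_1 = 0,
\]
\[
\text{so}\quad X_1(t_1) = E[x_1(t_1)\mid y],
\]
% a minimum, since the second derivative with respect to X_1 is 2 > 0.
```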
Remarks. (a) As far as the author is aware, it is not known what is the most general class of random processes {x1(t)}, {x2(t)} for which the conditional distribution function satisfies the requirements of Theorem 1.

(b) Aside from the note of Sherman, Theorem 1 apparently has never been stated explicitly in the control systems literature. In fact, one finds many statements to the effect that loss functions of the general type (2) cannot be conveniently handled mathematically.

(c) In the sequel, we shall be dealing mainly with vector-valued random variables. In that case, the estimation problem is stated as: Given a vector-valued random process {x(t)} and observed random variables y(t0), …, y(t), where y(t) = Mx(t) (M being a singular matrix; in other words, not all co-ordinates of x(t) can be observed), find an estimate X(t1) which minimizes the expected loss E[L(||x(t1) – X(t1)||)], || || being the norm of a vector.

Theorem 1 remains true in the vector case also, provided we require that the conditional distribution function of the n co-ordinates of the vector x(t1),

Pr[x1(t1) ≤ ξ1, …, xn(t1) ≤ ξn | y(t0), …, y(t)] = F(ξ1, …, ξn)

be symmetric with respect to the n variables ξ1 – ξ̄1, …, ξn – ξ̄n and convex in the region where all of these variables are negative.
Orthogonal Projections

The explicit calculation of the optimal estimate as a function of the observed variables is, in general, impossible. There is an important exception: The processes {x1(t)}, {x2(t)} are gaussian.
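In the gaussian case the conditional expectation is a linear function of the observations and can be written down explicitly. A minimal numerical sketch of this exception, assuming a zero-mean two-dimensional gaussian x and one particular singular M as in remark (c); the matrices are illustrative only:

```python
import numpy as np

rng = np.random.default_rng(2)

P = np.array([[2.0, 0.8],
              [0.8, 1.0]])      # assumed covariance of the gaussian vector x
M = np.array([[1.0, 0.0],
              [0.0, 0.0]])      # singular M: only x1 is observed in y = Mx

x = rng.multivariate_normal([0.0, 0.0], P, 100_000)
y = x @ M.T

# Gaussian case: the optimal estimate is linear in the observations,
# X = P M' (M P M')^+ y, using the pseudo-inverse since M P M' is singular.
K = P @ M.T @ np.linalg.pinv(M @ P @ M.T)
X = y @ K.T

err = x - X
print(np.mean(err**2, axis=0))  # x1 recovered exactly; x2 error ~ 1 - 0.8**2/2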
On the other hand, if we attempt to get an optimal estimate under

Footnote: The scalar product is clearly a scalar, i.e., not a vector, quantity.