The standard approach is to keep adding hidden nodes one at a time, performing structure learning at each
step, until the score drops. There has been some recent work on more intelligent heuristics. For example,
dense clique-like graphs (such as that in Figure 9(b)) suggest that a hidden node should be added [ELFK00].
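As a rough illustration, this greedy procedure might be sketched as follows; the helpers learn_structure and score are hypothetical placeholders for a structure-learning routine and a penalized-likelihood (or similar) scoring function:

    # Greedy addition of hidden nodes, one at a time, until the score drops.
    # learn_structure(data, n_hidden) and score(model, data) are assumed helpers.
    def add_hidden_nodes(data, learn_structure, score):
        n_hidden = 0
        best_model = learn_structure(data, n_hidden)
        best_score = score(best_model, data)
        while True:
            candidate = learn_structure(data, n_hidden + 1)
            candidate_score = score(candidate, data)
            if candidate_score <= best_score:  # score dropped: keep the best so far
                return best_model
            best_model, best_score = candidate, candidate_score
            n_hidden += 1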
5 Decision making under uncertainty
It is sometimes said that “Decision Theory = Probability Theory + Utility Theory” [DW91, RN95]. We
have outlined above how we can model joint probability distributions in a compact way by using sparse
graphs to reflect conditional independence relationships. It is also possible to decompose multi-attribute
utility functions in a similar way. Let the global utility be a sum of local utilities, $U = \sum_{i=1}^n U_i$. We create
a node for each $U_i$ term, which has as parents all the attributes (random variables) on which it depends;
typically, the utility nodes will also have action (control) nodes as parents, since the utility depends both on
the state of the world and the action we perform. The resulting graph is called an influence diagram. We
can then use algorithms, similar to the inference algorithms discussed in Section 3, to compute the optimal
(sequence of) action(s) to perform so as to maximize expected utility [CDLS99]. (See [KLS01] for a recent
application of this to multi-person game theory.)
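To make the computation concrete, here is a minimal sketch (with made-up numbers, not taken from the text) of choosing the action that maximizes expected utility in a tiny influence diagram with one action node A, one chance node W, and a decomposed utility U = U1(W) + U2(A):

    # Toy influence diagram: action A, chance variable W with P(W | A),
    # and additive utility U = U1(W) + U2(A). All numbers are illustrative.
    p_w_given_a = {
        ("act1", "good"): 0.7, ("act1", "bad"): 0.3,
        ("act2", "good"): 0.4, ("act2", "bad"): 0.6,
    }
    u1 = {"good": 10.0, "bad": -5.0}   # local utility of the world state
    u2 = {"act1": -2.0, "act2": 0.0}   # local utility (cost) of the action

    def expected_utility(a):
        return sum(p_w_given_a[(a, w)] * (u1[w] + u2[a]) for w in ("good", "bad"))

    best_action = max(("act1", "act2"), key=expected_utility)
    print(best_action, expected_utility(best_action))   # -> act1 3.5

In realistic problems one would exploit the graph structure rather than enumerate all outcomes, but the brute-force version shows what quantity is being optimized.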
In sequential decision theory, the agent (decision maker) is assumed to be interacting with the environment
which is modelled as a dynamical system (see Figure 5(a)). If this dynamical system is linear with Gaussian
noise, and the utility function is negative quadratic loss^4, then techniques from control theory can be used to
compute the optimal policy, i.e., mapping from percepts to actions. If the system is non-linear, the standard
approach is to locally linearize the system.
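As a concrete (scalar, purely illustrative) instance of this machinery, the optimal linear feedback gain can be obtained by iterating the discrete Riccati recursion; the numbers below are arbitrary:

    # Scalar discrete-time LQR sketch: dynamics x_{t+1} = a*x_t + b*u_t + noise,
    # per-step cost q*x^2 + r*u^2 (i.e., negative quadratic utility).
    a, b, q, r = 1.0, 0.5, 1.0, 0.1

    p = q
    for _ in range(1000):                    # iterate the Riccati recursion to convergence
        k = (b * p * a) / (r + b * p * b)    # feedback gain
        p = q + a * p * a - a * p * b * k    # updated cost-to-go coefficient
    print("steady-state gain:", k)           # optimal policy: u_t = -k * x_t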
Linear dynamical systems (LDSs) enjoy the separation property, which states that the optimal behavior can
be obtained by first doing state estimation (i.e., infer the hidden states), and then using the expected value
of the hidden states as input to a regular LQG (linear quadratic Gaussian) controller. In general, however,
the separation property does not hold. For example, consider controlling a mobile robot. The optimal action
should take into account the robot’s uncertainty about the state of the world (e.g., its location), and not
just use the mode of the posterior as if it were the true state of the world. The latter strategy (which is
called a certainty-equivalent controller) would never perform information-gathering moves. In other words,
the robot might not choose to look before it leapt, no matter how uncertain it was.
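In sketch form, a certainty-equivalent controller for a scalar system like the one above would run a Kalman filter and then act on the posterior mean alone, discarding the variance, which is exactly why it never gathers information:

    # Certainty-equivalent control sketch (illustrative model: x_{t+1} = a*x + b*u + noise,
    # y_t = x_t + noise). State estimation first, then control on the mean only.
    def kalman_step(mean, var, u, y, a=1.0, b=0.5, q_noise=0.01, r_noise=0.1):
        mean_pred = a * mean + b * u             # predict
        var_pred = a * var * a + q_noise
        gain = var_pred / (var_pred + r_noise)   # update with observation y
        return mean_pred + gain * (y - mean_pred), (1 - gain) * var_pred

    def certainty_equivalent_action(mean, var, k=0.9):
        return -k * mean                         # the uncertainty (var) is ignored entirely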
In general, finding good controllers for non-linear, partially observed systems, usually with unknown parameters, is extremely challenging. One approach that shows some promise is reinforcement learning. Most of the
work has been on systems which are fully observed (e.g., [SB98]), but there has been some work on partially
observed systems (e.g., [KLC98]). Recent policy search methods (e.g., [NJ00]) show particular promise.
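As a rough sketch of the fully observed case, tabular Q-learning (one of the standard methods covered in [SB98]) can be written as follows; the environment interface (reset/step) is an assumption made for illustration:

    import random
    from collections import defaultdict

    # Tabular Q-learning for a fully observed, discrete environment.
    # env.reset() -> state and env.step(action) -> (next_state, reward, done) are assumed.
    def q_learning(env, actions, episodes=500, alpha=0.1, gamma=0.95, epsilon=0.1):
        Q = defaultdict(float)
        for _ in range(episodes):
            s, done = env.reset(), False
            while not done:
                if random.random() < epsilon:            # epsilon-greedy exploration
                    a = random.choice(actions)
                else:
                    a = max(actions, key=lambda a_: Q[(s, a_)])
                s_next, r, done = env.step(a)
                best_next = max(Q[(s_next, a_)] for a_ in actions)
                Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])  # TD update
                s = s_next
        return Q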
6 Applications
Special cases of Bayes nets were independently invented by many different communities many years ago,
e.g., genetics (linkage analysis), speech recognition (HMMs), tracking (Kalman filtering), data compression
(density estimation), channel coding (turbocodes), etc.
The general framework was developed by Pearl [Pea88] and various European researchers [JLO90, CDLS99],
who used it to make probabilistic expert systems. Many of these systems were used for medical diagnosis.
^4 For example, consider a missile tracking an airplane, where the goal is to minimize the squared distance between itself and the target.