$R_k$, and $H_k$ are assumed constant over time. Rewriting the non-Gaussian measurement model Eq. (2) yields:
$$\mathrm{Std}\left(z_k \,\middle|\, H_k s_k(x_k), R_k, \upsilon\right) = \int_0^{\infty} \mathrm{N}\left(z_k \,\middle|\, H_k s_k(x_k), u_k^{-1} R_k\right) \mathrm{G}\left(u_k \,\middle|\, \upsilon/2, \upsilon/2\right) \mathrm{d}u_k \qquad (3)$$
The dependence on the mode $s_k$ distinguishes this model from the other Student-t measurement models in [13–15]. Since the heavy-tailed Student-t distribution does not neglect the existence of outliers, the non-Gaussian filtering can proceed without introducing a separate model for the outliers; a numerical check of the scale-mixture identity is sketched below.
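As a quick illustration, the following Python sketch verifies the Gamma-Gaussian hierarchy of Eq. (3) by Monte Carlo; the scalar location standing in for $H_k s_k(x_k)$, the nominal variance standing in for $R_k$, and the degrees of freedom $\upsilon$ are arbitrary placeholder values, not quantities from this paper:

```python
import numpy as np
from scipy import stats

# Illustrative scalar stand-ins (not values from the paper): `loc` plays
# the role of H_k s_k(x_k), `R` the nominal variance R_k, `nu` the dof.
loc, R, nu = 0.0, 2.0, 3.0
rng = np.random.default_rng(0)

# Sample u_k ~ G(nu/2, nu/2) (shape/rate), then z_k ~ N(loc, R / u_k).
u = rng.gamma(shape=nu / 2, scale=2.0 / nu, size=200_000)  # scale = 1/rate
z = rng.normal(loc, np.sqrt(R / u))

# Marginally, z should follow Std(z | loc, R, nu): a Student-t with nu
# degrees of freedom and scale sqrt(R).  A KS test quantifies the match.
ks = stats.kstest(z, stats.t(df=nu, loc=loc, scale=np.sqrt(R)).cdf)
print(f"KS statistic: {ks.statistic:.4f}")  # close to zero for a good fit
```

The KS statistic is near zero, confirming that marginalizing $u_k$ out of the Gamma-Gaussian hierarchy recovers the heavy-tailed Student-t on the left-hand side of Eq. (3).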
Unfortunately, there is no closed-form solution to Eq. (3) using conventional methods, because $u_k$ has to be marginalized out and the state and noise statistics are, moreover, coupled. Therefore, direct use of the Student-t distribution in the Kalman filter is intractable, and we have to resort to approximation techniques. To make the computation analytically tractable, variational Bayesian methods are used to compute the posterior distribution of the noise and the state. In [6,7,19,20], VB was proven able to handle such coupled situations because of its defining property: it factorizes the coupled joint distribution into several conditionally independent distributions. VB methods are reviewed briefly in the next section.
III. VARIATIONAL BAYESIAN METHODS
By Bayes’ rule, the joint posterior distribution is
given as:
$$p(\Theta \mid Y) = \frac{p(Y \mid \Theta)\, p(\Theta)}{\int p(Y \mid \Theta)\, p(\Theta)\, \mathrm{d}\Theta}, \qquad (4)$$
where Y is the observed data set and Θ is a set of parameters. From the above, we find that direct computation of Eq. (4) can be achieved only for very limited models because of the intractability of the integral, that is, the marginal likelihood p(Y). The key idea of VB is to use a new free-form distribution q(Θ) to approximate the true posterior distribution p(Θ|Y). The form of q(Θ) can be selected freely, owing to the conjugacy property (see Chapter 2 in [9]). Furthermore, if the set Θ can be partitioned into N parts as $\Theta = \{\Theta_1, \Theta_2, \cdots, \Theta_i, \cdots, \Theta_N\}$ and the $\Theta_i$ are assumed independent of each other, then q(Θ) naturally factorizes into N independent factors $q(\Theta_i)$, and the joint approximate distribution becomes
$$q(\Theta_1, \Theta_2, \cdots, \Theta_i, \cdots, \Theta_N) = \prod_{i=1}^{N} q(\Theta_i) \qquad (5)$$
The VB method using this technique is also called
(mean field) variational Bayes [6–8].
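For instance, in the state-noise coupling problem of Section IV, where the state is augmented as $\{x_k, u_k\}$, the natural mean-field choice is $q(x_k, u_k) = q(x_k)\,q(u_k)$, which is precisely the factorization that decouples the state from the latent noise variable.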
In the VB framework, the goal is to find a q(Θ) that is as close as possible to p(Θ) (dependence on Y is omitted for conciseness). Fortunately, the Kullback–Leibler (KL) divergence is a non-negative dissimilarity function measuring the discrepancy between two distributions q(Θ) and p(Θ). Hence, we can obtain q(Θ) by minimizing the KL divergence between p(Θ) and q(Θ). It has also been proven in [9] that minimizing KL(q(Θ)‖p(Θ)) is equivalent to maximizing the lower bound of ln p(Y). The lower bound function can be written as:
$$F = \int q(\Theta) \ln \frac{p(Y \mid \Theta)\, p(\Theta)}{q(\Theta)}\, \mathrm{d}\Theta \qquad (6)$$
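To make the equivalence explicit, recall the standard decomposition of the log evidence (see [9]):

$$\ln p(Y) = F + \mathrm{KL}\left(q(\Theta) \,\|\, p(\Theta \mid Y)\right)$$

Since $\ln p(Y)$ is fixed with respect to $q(\Theta)$ and the KL term is non-negative, maximizing $F$ is the same as minimizing the divergence, and $F$ is indeed a lower bound on $\ln p(Y)$.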
Maximizing F tightens the lower bound and gives rise to the optimal distribution [9]. The approximate distribution is optimal in the sense of KL divergence because KL(q(Θ)‖p(Θ)) equals zero if and only if q(Θ) = p(Θ). Please note that this sense of optimality differs from its counterpart in optimal estimation theory, where optimality is measured by minimum-variance rules. Each factor $q(\Theta_i)$ can then be computed by differentiating F with respect to $q(\Theta_i)$ while the remaining factors $q(\Theta_{j \neq i})$ are held fixed. The general solutions for the $q(\Theta_i)$ are given as [8,9]:
$$q(\Theta_i) = \frac{\exp\left(E_{\Theta_{j \neq i}}\left[\ln p(Y, \Theta)\right]\right)}{\int \exp\left(E_{\Theta_{j \neq i}}\left[\ln p(Y, \Theta)\right]\right) \mathrm{d}\Theta_i}, \qquad (7)$$
where $E_{\Theta_{j \neq i}}[\ln p(Y, \Theta)]$ is the expectation of the logarithm of the joint distribution taken over all parameters except the i-th. Hereby, each factorized distribution can be obtained once mean-field theory [9] is applied. We will see how this property of VB resolves the state-noise coupling problem in the following section.
It is worth noting that Eq. (7) is an implicit solution because of the circular dependencies among the factors. This gives rise to the iterative scheme of VB: given an appropriate initialization of the hyper-parameters, the distribution of each parameter is estimated in turn, with the expectations taken over the other distributions. Each parameter's distribution is then updated using Eq. (7) in the next iteration, and the cycle repeats until the algorithm converges. The convergence of VB is guaranteed [21].
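To make the cycle concrete, here is a minimal self-contained sketch of the iterative scheme for the textbook problem of inferring the mean $\mu$ and precision $\lambda$ of Gaussian data under the factorization $q(\mu)\,q(\lambda)$ (the standard example in [9]); the synthetic data and the weak prior values are illustrative assumptions, not quantities from this paper:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(loc=2.0, scale=0.5, size=100)  # synthetic data, precision 4
N, xbar = len(x), x.mean()

# Hypothetical weak priors: p(mu|lam) = N(mu0, (k0 lam)^-1), p(lam) = G(a0, b0).
mu0, k0, a0, b0 = 0.0, 1e-3, 1e-3, 1e-3

E_lam = a0 / b0  # initialization of the hyper-parameter expectation
for _ in range(100):
    # Update q(mu) = N(muN, var_mu): depends on the current E[lam].
    muN = (k0 * mu0 + N * xbar) / (k0 + N)
    var_mu = 1.0 / ((k0 + N) * E_lam)
    # Update q(lam) = G(aN, bN): depends on the moments of q(mu).
    aN = a0 + (N + 1) / 2
    bN = b0 + 0.5 * (np.sum((x - muN) ** 2) + N * var_mu
                     + k0 * ((muN - mu0) ** 2 + var_mu))
    E_lam_new = aN / bN
    if abs(E_lam_new - E_lam) < 1e-10:  # converged
        break
    E_lam = E_lam_new

print(f"E[mu] = {muN:.3f}, E[lam] = {E_lam:.3f}")  # near 2.0 and 4.0
```

Each pass updates $q(\mu)$ using the current expectation $E[\lambda]$ and then $q(\lambda)$ using the moments of $q(\mu)$, exactly the circular dependence described above; the next section applies the same machinery to the coupled pair $\{x_k, u_k\}$.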
IV. ROBUST IMM USING VB
In this section, an improved IMM approach based on the Student-t distribution is derived. VB is used to recursively estimate the noise statistics and the state of each mode of the hybrid system. The state $x_k$ is augmented as $\{x_k, u_k\}$. From Eqs (1) and (2), the prior forms of the state and the latent variable are the Gaussian and Gamma distributions, respectively. Hence, the distribution of the augmented state of the j-th mode conditioned on the measurements up