220 S.-Z. Yu / Artificial Intelligence 174 (2010) 215–243
$$P[o_{1:T}|\lambda] = \sum_{j\in S} P[S_t = j, o_{1:T}|\lambda] = \sum_{j\in S} \gamma_t(j),$$

for $d, d' \in D$, $j \in S$, $i \in S\setminus\{j\}$ and $t = 1,\ldots,T$, where $\eta_t(j,d)/P[o_{1:T}|\lambda]$ represents the probability of being in state $j$ having duration $d$ by time $t$, given the model and the observation sequence; $\xi_t(i,d';j,d)/P[o_{1:T}|\lambda]$ the probability of a transition at time $t$ from state $i$, which occurred with duration $d'$, to state $j$ having duration $d$, given the model and the observation sequence; $\xi_t(i,j)/P[o_{1:T}|\lambda]$ the probability of a transition at time $t$ from state $i$ to state $j$, given the model and the observation sequence; $\gamma_t(j)/P[o_{1:T}|\lambda]$ the probability of state $j$ at time $t$, given the model and the observation sequence; and $P[o_{1:T}|\lambda]$ the probability that the observed sequence $o_{1:T}$ is generated by the model $\lambda$. Obviously, the conditional factor $P[o_{1:T}|\lambda]$ is common to all the posterior probabilities and is eliminated when the posterior probabilities are used in parameter estimation. It is therefore often omitted for simplicity in the literature. Similarly, in the rest of this paper, we will sometimes not explicitly mention this conditional factor when calculating the posterior probabilities by $\eta_t(j,d)$, $\xi_t(i,d';j,d)$, $\xi_t(i,j)$, and $\gamma_t(j)$.
Considering the following identities,

$$P[S_{t:t+1} = j, o_{1:T}|\lambda] = P[S_t = j, o_{1:T}|\lambda] - P[S_{t]} = j, o_{1:T}|\lambda],$$
$$P[S_{t:t+1} = j, o_{1:T}|\lambda] = P[S_{t+1} = j, o_{1:T}|\lambda] - P[S_{[t+1} = j, o_{1:T}|\lambda],$$

we have a recursive formula for calculating $\gamma_t(j)$:

$$\gamma_t(j) = \gamma_{t+1}(j) + P[S_{t]} = j, o_{1:T}|\lambda] - P[S_{[t+1} = j, o_{1:T}|\lambda] = \gamma_{t+1}(j) + \sum_{i\in S\setminus\{j\}} \bigl(\xi_t(j,i) - \xi_t(i,j)\bigr). \tag{9}$$
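As a sanity check on (9), the backward recursion can be run directly once the transition posteriors $\xi_t(i,j)$ and the final $\gamma_T(j)$ are in hand. The sketch below uses random placeholder arrays rather than quantities from an actual forward–backward pass; note that the recursion preserves the total mass $\sum_j \gamma_t(j)$, consistent with $\sum_j \gamma_t(j) = P[o_{1:T}|\lambda]$ holding for every $t$:

```python
import numpy as np

# Hypothetical placeholders: N states, T time steps. In practice, xi and
# gamma_T come from the forward-backward recursions; here they are random.
N, T = 3, 5
rng = np.random.default_rng(0)
xi = rng.random((T, N, N))       # xi[t, i, j] ~ xi_t(i, j)
for i in range(N):
    xi[:, i, i] = 0.0            # transitions require i != j
gamma_T = rng.random(N)          # gamma_T[j] ~ gamma_T(j)

# Recursion (9), run backward from the last time step:
# gamma_t(j) = gamma_{t+1}(j) + sum_{i != j} (xi_t(j, i) - xi_t(i, j))
gamma = np.zeros((T, N))
gamma[T - 1] = gamma_T
for t in range(T - 2, -1, -1):
    out_flow = xi[t].sum(axis=1)  # sum_i xi_t(j, i): state j ends at t
    in_flow = xi[t].sum(axis=0)   # sum_i xi_t(i, j): state j starts at t+1
    gamma[t] = gamma[t + 1] + out_flow - in_flow
```

Because the in-flow and out-flow sums cancel when summed over all $j$, `gamma.sum(axis=1)` is the same at every $t$, mirroring the constancy of $P[o_{1:T}|\lambda]$.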
Denote $P[o_{1:T}|\lambda]$ by $L$ in the following expressions. Then, using the forward and backward variables, one can compute various expectations [60]:
(a) The expected number of times state $i$ ends before $t$: $\frac{1}{L}\sum_{t'\le t}\sum_{j\in S\setminus\{i\}} \xi_{t'}(i,j)$; the expected number of times state $i$ starts at $t$ or before: $\frac{1}{L}\sum_{t'\le t-1}\sum_{j\in S\setminus\{i\}} \xi_{t'}(j,i)$.
(b) Expected total duration spent in state $i$: $\frac{1}{L}\sum_{t} \gamma_t(i)$.
(c) Expected number of times that state $i$ occurred with observation $o_t = v_k$: $\frac{1}{L}\sum_{t} \gamma_t(i)\, I(o_t = v_k)$, where the indicator function $I(x) = 1$ if $x$ is true and zero otherwise.
(d) Estimated average observable values of state $i$: $\sum_{t} \gamma_t(i)\, o_t \,\big/ \sum_{t} \gamma_t(i)$.
(e) Probability that state $i$ was the first state: $\frac{1}{L}\gamma_1(i)$.
(f) Expected total number of times state $i$ commenced: $\frac{1}{L}\sum_{t}\sum_{j\in S\setminus\{i\}} \xi_t(j,i)$, or terminated: $\frac{1}{L}\sum_{t}\sum_{j\in S\setminus\{i\}} \xi_t(i,j)$. Under the simplifying assumption for the boundary conditions described in the last subsection, we have
$$\sum_{t=0}^{T-1}\sum_{j\in S\setminus\{i\}} \xi_t(j,i) = \sum_{t=1}^{T}\sum_{j\in S\setminus\{i\}} \xi_t(i,j).$$
(g) Estimated average duration of state $i$: $\sum_{t}\sum_{d} \eta_t(i,d)\, d \,\big/ \sum_{t}\sum_{d} \eta_t(i,d)$.
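Given precomputed posteriors, the expectations above reduce to simple array sums. The sketch below uses random placeholder arrays for $\gamma_t(i)$, $\eta_t(i,d)$, and $\xi_t(i,j)$ (a real implementation would take them from the forward–backward recursions) and evaluates items (b), (c), (d), (f), and (g):

```python
import numpy as np

# Hypothetical placeholder posteriors: N states, T steps, durations 1..D.
N, T, D = 3, 6, 4
rng = np.random.default_rng(1)
gamma = rng.random((T, N))            # gamma[t, i] ~ gamma_t(i)
eta = rng.random((T, N, D))           # eta[t, i, d-1] ~ eta_t(i, d)
xi = rng.random((T, N, N))            # xi[t, i, j] ~ xi_t(i, j)
for i in range(N):
    xi[:, i, i] = 0.0                 # the sums below exclude j = i
obs = rng.integers(0, 2, size=T)      # observations o_t in {0, 1}
L = 1.0                               # stands for P[o_{1:T} | lambda]

# (b) expected total duration spent in state i
total_dur = gamma.sum(axis=0) / L
# (c) expected number of times state i emitted the symbol v_k = 1
count_v1 = (gamma * (obs == 1)[:, None]).sum(axis=0) / L
# (d) estimated average observable value of state i
avg_obs = (gamma * obs[:, None]).sum(axis=0) / gamma.sum(axis=0)
# (f) expected number of times state i commenced, and terminated
commenced = xi.sum(axis=(0, 1)) / L   # sum over t and previous state j
terminated = xi.sum(axis=(0, 2)) / L  # sum over t and next state j
# (g) estimated average duration of state i
d_vals = np.arange(1, D + 1)
avg_dur = (eta * d_vals).sum(axis=(0, 2)) / eta.sum(axis=(0, 2))
```

Zeroing the diagonal of `xi` beforehand is what implements the $j \in S\setminus\{i\}$ restriction in the sums.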
2.2.3. MAP and MLE estimate of states
The maximum a posteriori (MAP) estimate of state $S_t$ given a specific observation sequence $o_{1:T}$ can be obtained [60] by maximizing $\gamma_t(j)$ given by (8), i.e.,
$$\hat{s}_t = \arg\max_{i\in S} \gamma_t(i).$$
If we choose $\eta_t(i,d)$ of (6), instead of $\gamma_t(i)$, as the MAP criterion, we obtain the joint MAP estimate of the state that ends at time $t$ and the duration of this state, when a specific sequence $o_{1:T}$ is observed:
$$(\hat{s}_t, \hat{d}_t) = \arg\max_{(i,d)} \eta_t(i,d). \tag{10}$$
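Both MAP criteria are plain argmax operations over the posterior arrays. A minimal sketch with placeholder posteriors (time and state indices are 0-based in the arrays; durations are mapped back to $1,\ldots,D$):

```python
import numpy as np

# Hypothetical placeholder posteriors (random, not from a fitted model).
N, T, D = 3, 5, 4
rng = np.random.default_rng(2)
gamma = rng.random((T, N))           # gamma[t, i] ~ gamma_t(i)
eta = rng.random((T, N, D))          # eta[t, i, d-1] ~ eta_t(i, d)

# Pointwise MAP state estimate: s_hat_t = argmax_i gamma_t(i)
s_hat = gamma.argmax(axis=1)

# Joint MAP estimate (10) of the state ending at time t and its duration:
# (s_hat_t, d_hat_t) = argmax_{(i,d)} eta_t(i, d)
flat = eta.reshape(T, N * D).argmax(axis=1)
s_hat_joint, d_idx = np.unravel_index(flat, (N, D))
d_hat = d_idx + 1                    # durations are 1-based
```

Note that `s_hat` and `s_hat_joint` need not agree at every $t$: the pointwise criterion maximizes the marginal state posterior, while (10) maximizes over state–duration pairs ending at $t$.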
Viterbi algorithms are the most popular dynamic programming algorithms for the maximum likelihood estimate (MLE) of the state sequence of HMMs. Similar algorithms exist for the HSMM [115,151,35,26]. Define the forward variable for the extended Viterbi algorithm by
$$\delta_t(j,d) \triangleq \max_{s_{1:t-d}} P[s_{1:t-d}, S_{[t-d+1:t]} = j, o_{1:t}|\lambda] = \max_{i\in S\setminus\{j\},\, d'\in D} \delta_{t-d}(i,d')\, a_{(i,d')(j,d)}\, b_{j,d}(o_{t-d+1:t}), \tag{11}$$
for $1 \le t \le T$, $j \in S$, $d \in D$. $\delta_t(j,d)$ represents the maximum likelihood that the partial state sequence ends at $t$ in state $j$ of duration $d$. Record the previous state that $\delta_t(j,d)$ selects by $\Psi(t,j,d) \triangleq (t-d, i^*, d^*)$, where $i^*$ is the previous state that survived, $d^*$ its duration, and $(t-d)$ its ending time. $\Psi(t,j,d)$ is determined by letting