THE HIERARCHICAL HIDDEN MARKOV MODEL 43
set of states and vertical transitions induces a tree structure where the root state is the node
at the top of the hierarchy and the leaves are the production states. To simplify notation we
restrict our analysis to HHMMs with a full underlying tree structure, i.e., all the leaves are
at the same distance from the root state. The analysis of HHMMs with a general structure is
a straightforward generalization of the analysis presented here. The experiments described
in this paper were performed with a general topology.
We would like to note in passing that every HHMM can be represented as a standard
single-level HMM. The states of the HMM are the production states of the corresponding
HHMM with a fully connected structure, i.e., there is a non-zero probability of moving from
any state to any other state. The equivalent HMM, however, lacks the multi-level
structure which we exploit in the applications described in Section 4.
We now give a formal description of an HHMM. Let $\Sigma$ be a finite alphabet. We denote
by $\Sigma^*$ the set of all possible strings over $\Sigma$. An observation sequence is a finite string from
$\Sigma^*$ denoted by $\bar{O} = o_1 o_2 \cdots o_T$. A state of an HHMM is denoted by $q^d_i$ ($d \in \{1, \ldots, D\}$),
where $i$ is the state index and $d$ is the hierarchy index. The hierarchy index of the root is
$1$ and of the production states is $D$. The internal states need not have the same number
of substates. We therefore denote the number of substates of an internal state $q^d_i$ by $|q^d_i|$.
Whenever it is clear from the context, we omit the state index and denote a state at level $d$
by $q^d$. In addition to its model structure (topology), an HHMM is characterized by the state
transition probabilities between the internal states and the output distribution vectors of the
production states. That is, for each internal state $q^d_i$ ($d \in \{1, \ldots, D-1\}$), there is a state
transition probability matrix denoted by $A^{q^d} = (a^{q^d}_{ij})$, where $a^{q^d}_{ij} = P(q^{d+1}_j \mid q^{d+1}_i)$ is the
probability of making a horizontal transition from the $i$th state to the $j$th, both of which are
substates of $q^d$. Similarly, $\Pi^{q^d} = \{\pi^{q^d}(q^{d+1}_i)\} = \{P(q^{d+1}_i \mid q^d)\}$ is the initial distribution
vector over the substates of $q^d$, which is the probability that state $q^d$ will initially activate
the state $q^{d+1}_i$. If $q^{d+1}_i$ is in turn an internal state, then $\pi^{q^d}(q^{d+1}_i)$ may also be interpreted
as the probability of making a vertical transition: entering substate $q^{d+1}_i$ from its parent
state $q^d$. Each production state $q^D$ is solely parameterized by its output probability vector
$B^{q^D} = \{b^{q^D}(k)\}$, where $b^{q^D}(k) = P(\sigma_k \mid q^D)$ is the probability that the production state
$q^D$ will output the symbol $\sigma_k \in \Sigma$. The entire set of parameters is denoted by

$$
\lambda = \{\lambda^{q^d}\}_{d \in \{1,\ldots,D\}} = \left\{ \{A^{q^d}\}_{d \in \{1,\ldots,D-1\}},\ \{\Pi^{q^d}\}_{d \in \{1,\ldots,D-1\}},\ \{B^{q^D}\} \right\}.
$$
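To make this parameterization concrete, the following sketch stores the per-state transition matrices $A^{q^d}$, initial vectors $\Pi^{q^d}$, and output vectors $B^{q^D}$ for a toy three-level model, and samples a string by recursive activation. All names and the toy topology are illustrative, not from the paper; in particular, this sketch runs a fixed number of horizontal transitions per level rather than using terminating end states, which are not covered in this excerpt.

```python
import random

random.seed(0)

SIGMA = ["a", "b"]  # the alphabet Sigma

# Hypothetical 3-level HHMM: root q1 -> internal states q2_* -> production q3_*.
# Internal states carry Pi (vertical activation) and A (horizontal transitions
# among substates); production states carry an output distribution B over SIGMA.
hhmm = {
    "q1":   {"children": ["q2_0", "q2_1"],
             "Pi": [0.5, 0.5],
             "A":  [[0.9, 0.1], [0.2, 0.8]]},
    "q2_0": {"children": ["q3_0", "q3_1"],
             "Pi": [1.0, 0.0],
             "A":  [[0.5, 0.5], [0.5, 0.5]]},
    "q2_1": {"children": ["q3_2"],
             "Pi": [1.0],
             "A":  [[1.0]]},
    "q3_0": {"B": [0.9, 0.1]},   # mostly emits "a"
    "q3_1": {"B": [0.1, 0.9]},   # mostly emits "b"
    "q3_2": {"B": [0.5, 0.5]},
}

def draw(probs):
    """Sample an index according to a discrete distribution."""
    return random.choices(range(len(probs)), weights=probs, k=1)[0]

def generate(state, steps_per_level=3):
    """Recursively generate a string starting from `state`.

    A production state emits one symbol via B; an internal state vertically
    activates a substate via Pi, then makes horizontal transitions via A.
    """
    node = hhmm[state]
    if "B" in node:                          # production state: emit a symbol
        return SIGMA[draw(node["B"])]
    children = node["children"]
    i = draw(node["Pi"])                     # vertical transition into a substate
    out = generate(children[i], steps_per_level)
    for _ in range(steps_per_level - 1):     # horizontal transitions
        i = draw(node["A"][i])
        out += generate(children[i], steps_per_level)
    return out

print(generate("q1"))  # a string of 9 symbols over {"a", "b"}
```

With three horizontal steps at each of the two internal levels, each call produces $3 \times 3 = 9$ symbols; in the actual model the length of each recursive sub-string is governed by the end-state mechanism instead of a fixed step count.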
An illustration of an HHMM with an arbitrary topology and parameters is given in Figure 1.
To summarize, a string is generated by starting from the root state and choosing one of
the root's substates at random according to $\Pi^{q^1}$. Similarly, for each internal state $q$ that is
entered, one of $q$'s substates is randomly chosen according to $q$'s initial probability vector
$\Pi^{q}$. The operation proceeds with the chosen substate, which recursively activates one of
its substates. These recursive operations are carried out until a production state, $q^D$, is
reached, at which point a single symbol is generated according to the state's output probability
vector, $B^{q^D}$. Control then returns to the state that activated $q^D$. Upon the completion of a
recursive string generation, the internal state that started the recursion chooses the next
state at the same level according to the level's state transition matrix, and the newly chosen
state starts a new recursive string generation process. Each level (excluding the top) has