Hidden Markov Models with Discrete Infinite
Logistic Normal Distribution Priors
Hao Zhu, Jinsong Hu
Department of Automation
Chongqing University of Posts and Telecommunications
Chongqing, P. R. China 400065
Email: haozhu1982@gmail.com
Henry Leung
Department of Electrical and Computer Engineering
University of Calgary
Calgary, Alberta, Canada T2N 1N4
Abstract—In this article, we propose using the discrete infinite logistic normal distribution (DILN) as a prior to estimate the number of states in a hidden Markov model (HMM). The HMM with DILN priors (DILN-HMM) allows for infinite state support and models correlations between state transition probabilities. A variational Bayesian (VB) framework is proposed to infer the posterior distributions of the DILN-HMM parameters. Experiments on synthetic and real data show that the DILN-HMM is effective in handling situations where the state transition matrix is correlated.
Index Terms—Hidden Markov Model (HMM), hierarchical
Bayesian modeling, correlation structure, variational Bayesian
(VB).
I. INTRODUCTION
The hidden Markov model (HMM) is a popular model for sequential data and is widely used in many fields, including speech recognition, machine vision, bioinformatics, and finance [1]–[5]. An HMM is usually trained with the maximum-likelihood Baum-Welch algorithm [6], in which the number of states is preset. If the number of states is not selected properly, the parameters will be over- or underestimated, which degrades the generalization ability of the model.
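To make this model-selection issue concrete, the following minimal sketch fits discrete HMMs with several preset state counts and compares their held-out log-likelihoods. It is not part of the method in this paper: it relies on the third-party hmmlearn package, whose CategoricalHMM class and fit/score API are assumed from recent versions, and the ground-truth parameters are illustrative.

```python
# Sketch: effect of the preset state count on a Baum-Welch-trained HMM.
# Uses the third-party hmmlearn package (CategoricalHMM assumed available);
# the ground-truth parameters below are illustrative.
import numpy as np
from hmmlearn import hmm

rng = np.random.default_rng(0)

# Ground truth: 3 hidden states emitting 3 discrete symbols.
pi = np.array([1.0, 0.0, 0.0])
A = np.array([[0.8, 0.2, 0.0],
              [0.1, 0.7, 0.2],
              [0.0, 0.3, 0.7]])
B = np.array([[0.9, 0.1, 0.0],
              [0.1, 0.8, 0.1],
              [0.0, 0.2, 0.8]])

def sample(T):
    """Draw one length-T observation sequence from the true HMM."""
    s = rng.choice(3, p=pi)
    x = np.empty(T, dtype=int)
    for t in range(T):
        x[t] = rng.choice(3, p=B[s])   # emit a symbol from state s
        s = rng.choice(3, p=A[s])      # transition to the next state
    return x.reshape(-1, 1)            # hmmlearn expects a column of symbols

train, test = sample(500), sample(200)
for n_states in (2, 3, 6, 12):
    model = hmm.CategoricalHMM(n_components=n_states, n_iter=50,
                               random_state=0).fit(train)    # Baum-Welch (EM)
    print(n_states, model.score(test))                       # held-out log-lik
```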
Recently, the hierarchical Dirichlet process (HDP) has been applied to the HMM, yielding a nonparametric HMM that places a prior distribution on transition matrices over countably infinite state spaces [7]. The HDP-HMM leads to data-driven learning algorithms that infer posterior distributions over the number of states. However, the lack of conjugacy between the two levels of the Dirichlet process means there is no fast inference algorithm. To tackle this issue, a stick-breaking HMM was proposed that provides a fully conjugate prior for an infinite-state HMM and admits a variational solution [8]. The HDP-based approach has been used in various applications [9]–[12]. One drawback is that it assumes the state transitions are independent; hence, it cannot model a correlated state transition matrix. Consider, for example, applying an HMM to speech recognition, where each state corresponds to a typical sound. After the English sound t, only a few sounds typically follow, as in train, taste, and top, whereas the sound s almost never comes directly after t. In other words, the state corresponding to s should be negatively correlated with the state for t, so the probability of transitioning from t to s should be small, while the probability of transitioning from t to r should be relatively high because these sounds are positively correlated.
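To illustrate, such correlations can be mimicked with a logistic-normal construction: each row of the transition matrix is obtained by exponentiating and normalizing a correlated Gaussian vector, so that transition probabilities to related states co-vary. A minimal numpy sketch follows; the covariance values are illustrative assumptions, not parameters from this paper.

```python
# Sketch: correlated transition probabilities via a logistic-normal draw.
# The covariance below is an illustrative assumption: states 0 and 1 are
# positively correlated, and state 2 is negatively correlated with both.
import numpy as np

rng = np.random.default_rng(0)
cov = np.array([[ 1.0,  0.8, -0.6],
                [ 0.8,  1.0, -0.6],
                [-0.6, -0.6,  1.0]])
rows = []
for _ in range(3):  # one correlated Gaussian draw per transition-matrix row
    g = rng.multivariate_normal(np.zeros(3), cov)
    rows.append(np.exp(g) / np.exp(g).sum())  # softmax -> probability row
A = np.vstack(rows)
print(A)  # within each row, mass on states 0 and 1 tends to rise and fall together
```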
To address this correlation issue, we propose applying the discrete infinite logistic normal distribution (DILN) to the HMM, which leads to an infinite-state HMM that models the correlations between state transition probabilities. The DILN is a new Bayesian nonparametric prior for mixed-membership models. The main idea behind the DILN is that each component is located in a latent space, and the correlation structure between components is determined by the distances between their locations. The DILN can be defined as a scaled HDP, where the scaling is determined by an exponentiated Gaussian process (GP) whose kernel is a function of the latent distance matrix between component locations [13].
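As a rough illustration of this construction, the sketch below draws a truncated DILN-style probability vector: top-level stick-breaking weights are combined with gamma masses scaled by an exponentiated GP evaluated at latent component locations, then normalized. The truncation level, squared-exponential kernel, and hyperparameter values are assumptions made for illustration and follow the spirit rather than the exact parameterization of [13].

```python
# Sketch: truncated DILN-style draw (a scaled HDP with an exponentiated GP).
# Truncation level, kernel, and hyperparameters are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
K, alpha, dim = 20, 2.0, 2          # truncation, DP concentration, latent dim

# Top level: stick-breaking weights p of the base measure G0.
v = rng.beta(1.0, alpha, size=K)
p = v * np.concatenate(([1.0], np.cumprod(1.0 - v)[:-1]))

# Latent component locations; squared-exponential kernel on their distances.
loc = rng.normal(size=(K, dim))
d2 = ((loc[:, None, :] - loc[None, :, :]) ** 2).sum(-1)
Kmat = np.exp(-0.5 * d2) + 1e-6 * np.eye(K)   # jitter for numerical stability

# Second level: gamma masses with mean alpha * p_k * exp(w_k), then normalize.
w = rng.multivariate_normal(np.zeros(K), Kmat)  # one GP draw over locations
z = rng.gamma(shape=alpha * p + 1e-12) * np.exp(w)
pi = z / z.sum()                    # correlated probability vector over states
print(pi.round(3))
```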
The paper is organized as follows. Section II gives a brief review of the HMM. Section III formulates the proposed HMM with DILN priors (DILN-HMM) and presents the variational Bayesian (VB) updates of the posterior distributions of the DILN-HMM parameters. Experimental results on synthetic and real data are given in Section IV to validate the performance of the proposed DILN-HMM. Conclusions are given in Section V.
II. HIDDEN MARKOV MODEL
For a sequence of observations $\mathbf{x} = (x_1, x_2, \ldots, x_T)$, an HMM assumes that the observation $x_t$ at time $t$ is generated by an underlying, discrete state $s_t$, and that the state sequence $\mathbf{s} = (s_1, s_2, \ldots, s_T)$ follows a first-order Markov process, $p(s_t \mid s_{t-1}, \ldots, s_1) = p(s_t \mid s_{t-1})$. The discrete case is considered here, with $x_t \in \{1, 2, \ldots, M\}$ and $s_t \in \{1, 2, \ldots, I\}$, where $M$ is the alphabet size and $I$ is the number of states. Therefore, an HMM can be described as $\theta = \{A, B, \pi\}$, where $A$, $B$, and $\pi$ are defined as follows:

$A = \{a_{ij}\}$, $a_{ij} = p(s_{t+1} = j \mid s_t = i)$: state transition probabilities

$B = \{b_{im}\}$, $b_{im} = p(x_t = m \mid s_t = i)$: emission probabilities

$\pi = \{\pi_i\}$, $\pi_i = p(s_1 = i)$: initial state probabilities
For a given model parameter $\theta$, the joint probability of the observation sequence and the underlying state sequence is

$p(\mathbf{x}, \mathbf{s} \mid \theta) = \pi_{s_1} b_{s_1 x_1} \prod_{t=2}^{T} a_{s_{t-1} s_t} b_{s_t x_t}.$
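As a check on the notation, here is a minimal self-contained sketch evaluating this joint probability in log space; the function name and the toy parameter values are illustrative.

```python
# Sketch: log p(x, s | theta) for a discrete HMM; names are illustrative.
import numpy as np

def log_joint(x, s, pi, A, B):
    """x, s: length-T integer arrays; pi: (I,), A: (I, I), B: (I, M)."""
    lp = np.log(pi[s[0]]) + np.log(B[s[0], x[0]])
    for t in range(1, len(x)):
        lp += np.log(A[s[t - 1], s[t]]) + np.log(B[s[t], x[t]])
    return lp

# Tiny example with I = 2 states and M = 2 symbols.
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3],
              [0.4, 0.6]])
B = np.array([[0.9, 0.1],
              [0.2, 0.8]])
print(log_joint(np.array([0, 1, 1]), np.array([0, 0, 1]), pi, A, B))
```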