A Hierarchical Dirichlet Process Mixture of GID Distributions with Feature
Selection for Spatio-Temporal Video Modeling and Segmentation
Wentao Fan∗, Nizar Bouguila† and Xin Liu‡
∗ Department of Computer Science and Technology, Huaqiao University, Xiamen, China
Email: fwt@hqu.edu.cn
† Electrical and Computer Engineering, Concordia University, Montreal, QC, Canada
Email: nizar.bouguila@concordia.ca
‡ Department of Computer Science and Technology, Huaqiao University, Xiamen, China
Email: xliu@hqu.edu.cn
Abstract—In this paper, a hierarchical Dirichlet process
(HDP) mixture model of generalized inverted Dirichlet (GID)
distributions with an unsupervised feature selection scheme
is developed. The proposed model is learned via a principled
variational framework and then deployed for video modeling
and segmentation. Experimental results show the merits of our
developed statistical framework.
Keywords-Mixture models, Dirichlet process, feature selection, variational learning, video segmentation.
I. INTRODUCTION
Semantic video segmentation is an important step in many
applications and necessitates the development of strong
machine learning techniques [1], [2], [3]. The main goal is to
automatically partition video sequences into spatiotemporal
segments. Several approaches have been proposed in the past
[4]. In this paper, we approach this problem by developing a framework based on the HDP mixture model, a hierarchical nonparametric Bayesian framework that has shown promising performance in clustering grouped data with shared clusters [5], [6]. This model is particularly useful in many real-world problems where one cluster may highly overlap with, or even be embedded within, another cluster.
An HDP mixture model is described as follows. Suppose that we have collected N observations organized into M groups. With each observation X_ji, drawn independently from a mixture model, we associate a factor θ_ji, where the index ji denotes observation i within group j. To form a Bayesian approach, each factor θ_ji is distributed according to a prior G_j. Then, we have

θ_ji | G_j ∼ G_j ,   X_ji | θ_ji ∼ F(θ_ji)    (1)
where F(θ_ji) denotes the probability distribution of X_ji given θ_ji. The prior G_j is distributed according to the HDP model, which is built on the Dirichlet process (DP) [7] and involves a Bayesian hierarchy in which the base measure of a DP is itself drawn from a DP:
G_0 ∼ DP(γ, H)
G_j ∼ DP(λ, G_0),  for each j ∈ {1, . . . , M}    (2)

where each group is associated with a group-level DP G_j, and this indexed set of DPs {G_j} shares a common base (i.e., global-level) distribution G_0.
A crucial problem when using such models is the choice of the parent distribution and the selection of relevant modeling features. In this paper, we propose a principled approach to the video segmentation problem by adopting the GID, which has been shown to support simultaneous clustering and feature selection. The resulting model is learned within a variational framework that we have developed. The rest of this paper is organized as follows. Section II presents our model. Section III is devoted to the experimental results. The conclusion is given in Section IV.
II. HDP MIXTURE OF GID DISTRIBUTIONS
At the global level, the global measure G_0 follows the Dirichlet process DP(γ, H) and can be described using the stick-breaking representation [8], [9] as

ξ′_k ∼ Beta(1, γ),  Λ_k ∼ H
ξ_k = ξ′_k ∏_{s=1}^{k−1} (1 − ξ′_s),  G_0 = ∑_{k=1}^{∞} ξ_k δ_{Λ_k}    (3)
where {Λ_k} is a set of independent random variables drawn from H, and δ_{Λ_k} is an atom centered at Λ_k. The stick-breaking weights ξ_k satisfy the constraint ∑_{k=1}^{∞} ξ_k = 1. Since G_0 is the base measure of each G_j, the atoms Λ_k are shared among all G_j and differ only in their weights.
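For illustration, the global-level construction in Eq. (3) can be approximated by a truncated stick-breaking draw, sketched below in Python; the truncation level K, the standard-normal base measure H, and the concentration value γ used here are assumptions made only for this example.

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_G0(gamma, K=20):
    """Truncated stick-breaking draw of G_0 ~ DP(gamma, H), following Eq. (3)."""
    xi_prime = rng.beta(1.0, gamma, size=K)            # xi'_k ~ Beta(1, gamma)
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - xi_prime[:-1])))
    xi = xi_prime * remaining                          # xi_k = xi'_k * prod_{s<k} (1 - xi'_s)
    atoms = rng.standard_normal(K)                     # Lambda_k ~ H (placeholder base measure)
    return atoms, xi

atoms, xi = sample_G0(gamma=2.0)
print(xi.sum())   # approaches 1 as the truncation level K grows
```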
Next, we construct each group-level DP G_j:
π′_jt ∼ Beta(1, λ),  Ω_jt ∼ G_0
π_jt = π′_jt ∏_{s=1}^{t−1} (1 − π′_js),  G_j = ∑_{t=1}^{∞} π_jt δ_{Ω_jt}    (4)
where the δ_{Ω_jt} are group-level atoms centered at Ω_jt, and {π_jt} is a set of stick-breaking weights satisfying ∑_{t=1}^{∞} π_jt = 1. Since Ω_jt is distributed according to the base distribution G_0, it takes on the value Λ_k with probability ξ_k.
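A similar truncated sketch for one group-level measure G_j in Eq. (4) is given below; it is meant to be read together with the previous sketch (reusing its atoms and ξ), and the truncation level T and concentration λ are again illustrative assumptions. Because G_0 is discrete, drawing Ω_jt amounts to selecting a global atom Λ_k with probability ξ_k, which is exactly the choice that the indicator variable introduced next encodes.

```python
import numpy as np

rng = np.random.default_rng(2)

def sample_Gj(lam, atoms, xi, T=15):
    """Truncated stick-breaking draw of G_j ~ DP(lambda, G_0), following Eq. (4)."""
    pi_prime = rng.beta(1.0, lam, size=T)              # pi'_jt ~ Beta(1, lambda)
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - pi_prime[:-1])))
    pi = pi_prime * remaining                          # pi_jt = pi'_jt * prod_{s<t} (1 - pi'_js)
    # Omega_jt ~ G_0: since G_0 is discrete, each group-level atom is a copy of a
    # global atom Lambda_k chosen with probability xi_k (the role played by W_jtk).
    idx = rng.choice(len(atoms), size=T, p=xi / xi.sum())   # renormalize truncated weights
    return atoms[idx], pi, idx

# Continuing the previous sketch: atoms, xi = sample_G0(gamma=2.0)
# omega_j, pi_j, idx_j = sample_Gj(lam=1.0, atoms=atoms, xi=xi)
```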
Thus, it is straightforward to introduce a binary latent variable W_jtk as