Iterative sIB algorithm ☆

Huaqiang Yuan a,⁎, Yangdong Ye b

a Engineering and Technology Institute, Dongguan University of Technology, Dongguan 523808, China
b Information Engineering School, Zhengzhou University, Zhengzhou 450001, China
Article info
Article history:
Received 3 August 2007
Available online 26 November 2010
Communicated by W. Pedrycz
Keywords:
Information Bottleneck
sIB algorithm
Mutation
Mutual information
Abstract
Recent years have witnessed a growing interest in the information bottleneck theory. Among the relevant algorithms in the extant literature, the sequential Information Bottleneck (sIB) algorithm is recognized for its balance between accuracy and complexity. However, like many other optimization techniques, it still suffers from the problem of getting easily trapped in local optima. To address this problem, our study proposes an iterative sIB algorithm (isIB) based on mutation for the clustering problem. Starting from initial solution vectors of cluster labels generated by a seeding run of the sIB algorithm, our algorithm randomly selects a subset of elements and mutates their cluster labels according to the optimal mutation rate. The results are then further optimized iteratively using genetic algorithms. Finally, experimental results on benchmark data sets validate the advantage of our iterative sIB algorithm over the sIB algorithm in terms of both accuracy and efficiency.
© 2010 Elsevier B.V. All rights reserved.
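The mutation step outlined in the abstract can be sketched as follows. This is an illustrative sketch, not the authors' implementation: the function name, its signature, and the choice of mutation-rate value are assumptions for exposition only.

```python
import random

def mutate_labels(labels, n_clusters, mutation_rate, rng=None):
    """Randomly reassign the cluster labels of a subset of elements.

    `labels` is a solution vector of cluster labels, such as one produced
    by a seeding sIB run; `mutation_rate` is the fraction of elements to
    perturb. (Illustrative sketch -- not the authors' implementation.)
    """
    rng = rng or random.Random()
    mutated = list(labels)
    n_mutate = max(1, int(mutation_rate * len(labels)))
    # Pick a random subset of element indices to mutate.
    for i in rng.sample(range(len(labels)), n_mutate):
        # Move element i to a different, randomly chosen cluster.
        choices = [c for c in range(n_clusters) if c != mutated[i]]
        mutated[i] = rng.choice(choices)
    return mutated
```

Each mutated solution would then serve as a new starting point for further sequential optimization, with the best-scoring solution retained across iterations.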
1. Introduction
Tishby et al. proposed the Information Bottleneck (IB) method (Tishby et al., 1999) in 1999 for data analysis based on information theory. The method rests on the view that the hidden patterns in data are what we expect to find during data analysis: when we analyze a data object, the result should unveil its relationship with other objects. Along this line, while the IB method compresses a data object into a defined bottleneck variable, it tries to maintain the data object's relationship with other relevant data objects. In this way, IB is able to effectively mine the correlation patterns among data objects. So far, IB has been widely applied in many fields, including text clustering (Slonim et al., 2002; Slonim and Tishby, 2000, 2001), image clustering (Goldberger et al., 2006; Winston et al., 2005), galaxy spectra analysis (Slonim et al., 2001), gene expression analysis (Tishby and Slonim, 2001), neural system coding analysis (Schneidman et al., 2002), natural language processing (Gorodetsky, 2002), and voice recognition (Hecht and Tishby, 2005). In 2002, Slonim summarized the IB principles, algorithms, and applications in his doctoral thesis and proposed the framework of Information Bottleneck (IB) theory (Slonim, 2002). Recently, IB theory has been further expanded with many attractive variants, such as IBSI (Chechik and Tishby, 2002), CIB (Gondek and Hofmann, 2003), CCIB (Gondek and Hofmann, 2004), MIB (Friedman et al., 2001; Elidan and Friedman, 2005), GIB (Chechik et al., 2005), and SDR (Globerson and Tishby, 2003).
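Concretely, the compression trade-off described above is captured by the standard IB functional of Tishby et al. (1999): with source variable $X$, relevant variable $Y$, and bottleneck variable $T$, one minimizes

```latex
\mathcal{L}\left[p(t \mid x)\right] = I(X;T) - \beta\, I(T;Y)
```

where $I(\cdot;\cdot)$ denotes mutual information and the Lagrange multiplier $\beta$ controls the trade-off between compressing $X$ into $T$ and preserving the information $T$ carries about $Y$.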
IB theory originated from the well-known rate-distortion theory. To tackle the lack of objectivity in the selection of distortion functions in rate-distortion theory, Tishby derived a reasonable distortion function by defining a relevant variable with respect to the source data. With this mathematical framework and its formal solution, Tishby et al. thereby established IB theory (Tishby et al., 1999). Later, many variants of the IB algorithm were proposed, such as iterative IB (iIB) (Tishby et al., 1999), agglomerative IB (aIB) (Slonim and Tishby, 1999), deterministic IB (dIB) (Slonim, 2002), and sequential IB (sIB) (Slonim et al., 2002). Iterative IB is, as its name implies, an iterative algorithm similar to the EM algorithm; aIB is a bottom-up agglomerative algorithm; and dIB is a top-down divisive algorithm. The last two are costly in time and space (Slonim, 2002; Slonim and Tishby, 1999).
Compared with other IB algorithms, the sIB algorithm yields an optimal solution for the IB problem with lower time and space complexity (Slonim et al., 2002; Slonim, 2002). It has found wide application, especially in practical clustering problems. However, the algorithm may offer only a locally optimal solution, because it starts from a random partition and will produce different solutions in different runs. To address this problem, Slonim ran sIB several times and chose the best solution (Slonim et al., 2002). Unfortunately, several drawbacks still exist with Slonim's method, including randomness in the results and insufficient optimization. In 2004, Peltonen et al. leveraged the Bayes factor to propose the fsIB algorithm, which yielded better clustering performance on small and sparse data sets (Peltonen et al., 2004). Still, fsIB could not completely solve the problems of the sIB algorithm.
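The restart strategy attributed to Slonim above (run the randomly initialized algorithm several times and keep the best result) can be sketched generically. Here `run_sib` and `score` are hypothetical callables standing in for one sIB run and for the quality criterion (e.g. the mutual information I(T;Y) a solution preserves); neither name comes from the cited work.

```python
def best_of_restarts(run_sib, score, n_runs=10):
    """Rerun a randomly initialized clustering and keep the best solution.

    `run_sib` performs one run from a random partition and returns a
    label vector; `score` evaluates a solution (higher is better).
    (Generic sketch -- the callables are hypothetical placeholders.)
    """
    best, best_score = None, float("-inf")
    for _ in range(n_runs):
        labels = run_sib()
        s = score(labels)
        if s > best_score:
            best, best_score = labels, s
    return best, best_score
```

This mitigates but does not remove the randomness the paragraph above criticizes: each run still converges to some local optimum, which motivates the mutation-based refinement proposed in this paper.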
doi:10.1016/j.patrec.2010.11.020
☆ Supported by the National Natural Science Foundation of China (Grant Nos. 60573029, 60773050, and 60773048).
⁎ Corresponding author. Fax: +86 76922862911. E-mail address: yuanhq@dgut.edu.cn (H. Yuan).
Pattern Recognition Letters 32 (2011) 606–614