rates (the logarithm of |T|) under a given constraint on the average
distortion D, is given by the following function:
$$R(D) = \min_{\{p(t|x):\; E(d(x,t)) \le D\}} I(X;T), \qquad (1)$$
where I(X;T) is the mutual information between X and T, and E(d(x,t)) is the expected distortion induced by p(t|x).
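To make the quantities in (1) concrete, the following is a minimal numerical sketch (an assumed toy source and distortion, not from the original paper), which computes the rate I(X;T) and the average distortion E(d(x,t)) for one candidate encoder:

```python
import numpy as np

# Hypothetical toy problem: |X| = 4 source symbols, |T| = 2 codewords.
p_x = np.array([0.25, 0.25, 0.25, 0.25])      # assumed source marginal p(x)
p_t_given_x = np.array([[0.9, 0.1],           # a candidate stochastic encoder p(t|x)
                        [0.8, 0.2],
                        [0.2, 0.8],
                        [0.1, 0.9]])
d = np.array([[0.0, 1.0],                     # an arbitrary distortion matrix d(x, t)
              [0.0, 1.0],
              [1.0, 0.0],
              [1.0, 0.0]])

p_xt = p_x[:, None] * p_t_given_x             # joint p(x, t) = p(x) p(t|x)
p_t = p_xt.sum(axis=0)                        # marginal p(t)

# Rate: I(X;T) = sum_{x,t} p(x,t) log[ p(x,t) / (p(x) p(t)) ], in nats.
ratio = p_xt / (p_x[:, None] * p_t[None, :])
I_XT = float(np.sum(p_xt * np.log(ratio)))

# Average distortion: E(d(x,t)) = sum_{x,t} p(x,t) d(x,t).
E_d = float(np.sum(p_xt * d))

# R(D) minimizes I(X;T) over all p(t|x) whose E(d) does not exceed D.
print(f"I(X;T) = {I_XT:.4f} nats, E(d) = {E_d:.4f}")
```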
Unlike the rate-distortion approach, the IB method avoids the arbitrary choice of the distortion function d(x,t) (Tishby et al., 1999). The motivation comes from the fact that in many cases, defining the "target" variable Y with respect to X is a much easier task than defining a distortion function. Given the joint probability distribution p(x,y) on variables X and Y, the IB method considers
the following distortion function:

$$d(x,t) = D_{KL}\big(p(y|x)\,\|\,p(y|t)\big), \qquad (2)$$
where $D_{KL}(f \| g)$ is the Kullback–Leibler divergence between distributions f(·) and g(·). Notably, $p(y|t) = \sum_x p(y|x)\, p(x|t)$ is itself a function of p(t|x). Hence, the distortion function d(x,t) is not predetermined here; instead, as the method searches for the best representation p(t|x), it simultaneously searches for the most suitable distortion function d(x,t).
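To illustrate this dependence, one can compute p(y|t) explicitly for a candidate encoder and observe that it is fully determined by p(t|x). The following is a minimal sketch with an assumed toy p(x,y), not from the paper:

```python
import numpy as np

# Hypothetical toy joint distribution p(x, y) with |X| = 4, |Y| = 3.
p_xy = np.array([[0.10, 0.10, 0.05],
                 [0.15, 0.05, 0.05],
                 [0.05, 0.05, 0.15],
                 [0.05, 0.10, 0.10]])
p_x = p_xy.sum(axis=1)                    # marginal p(x)
p_y_given_x = p_xy / p_x[:, None]         # conditional p(y|x)

# A candidate representation p(t|x) with |T| = 2 clusters.
p_t_given_x = np.array([[0.9, 0.1],
                        [0.8, 0.2],
                        [0.2, 0.8],
                        [0.1, 0.9]])
p_t = p_t_given_x.T @ p_x                 # p(t) = sum_x p(t|x) p(x)

# Bayes' rule: p(x|t) = p(t|x) p(x) / p(t); rows index t, columns index x.
p_x_given_t = (p_t_given_x * p_x[:, None]).T / p_t[:, None]

# p(y|t) = sum_x p(y|x) p(x|t): any change to p(t|x) changes p(y|t),
# and hence the KL distortion of Eq. (2) itself.
p_y_given_t = p_x_given_t @ p_y_given_x
print(p_y_given_t)                        # shape (|T|, |Y|); each row sums to 1
```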
In the IB method, the expected distortion E(d(x,t)) can be written as

$$
\begin{aligned}
E(d(x,t)) &= E\left[ D_{KL}\big( p(y|x) \,\|\, p(y|t) \big) \right] \\
&= \sum_{x,t} p(x,t) \sum_{y} p(y|x) \log \frac{p(y|x)}{p(y|t)} \\
&= \sum_{x,t,y} p(x,t,y) \log \frac{p(y|x)}{p(y|t)} \\
&= I(X;Y) - I(T;Y). \qquad (3)
\end{aligned}
$$
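The identity in (3) is easy to verify numerically. The following sketch (with randomly generated toy distributions, not from the paper) computes the expected KL distortion directly and compares it against I(X;Y) − I(T;Y):

```python
import numpy as np

def mutual_information(p_ab):
    """I(A;B) of a joint distribution p(a, b), in nats."""
    p_a = p_ab.sum(axis=1, keepdims=True)
    p_b = p_ab.sum(axis=0, keepdims=True)
    mask = p_ab > 0
    return float(np.sum(p_ab[mask] * np.log((p_ab / (p_a * p_b))[mask])))

rng = np.random.default_rng(0)
p_xy = rng.random((4, 3))                            # random toy joint p(x, y)
p_xy /= p_xy.sum()
p_t_given_x = rng.random((4, 2))                     # random candidate encoder p(t|x)
p_t_given_x /= p_t_given_x.sum(axis=1, keepdims=True)

p_x = p_xy.sum(axis=1)
p_y_given_x = p_xy / p_x[:, None]
p_xt = p_x[:, None] * p_t_given_x                    # p(x, t), using the chain T <- X -> Y
p_ty = p_t_given_x.T @ p_xy                          # p(t, y) = sum_x p(t|x) p(x, y)
p_y_given_t = p_ty / p_ty.sum(axis=1, keepdims=True)

# Left-hand side: E[D_KL(p(y|x) || p(y|t))], averaged over p(x, t).
lhs = sum(p_xt[x, t] * np.sum(p_y_given_x[x] * np.log(p_y_given_x[x] / p_y_given_t[t]))
          for x in range(4) for t in range(2))

# Right-hand side: I(X;Y) - I(T;Y).
rhs = mutual_information(p_xy) - mutual_information(p_ty)
print(np.isclose(lhs, rhs))                          # True
```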
Substituting formula (3) into the rate-distortion function (1), we obtain:

$$R(D) = \min_{\{p(t|x):\; I(X;Y) - I(T;Y) \le D\}} I(X;T). \qquad (4)$$
As I(X;Y) is a constant, the rate-distortion function is usually written as

$$R(D) = \min_{\{p(t|x):\; I(T;Y) \ge D'\}} I(X;T). \qquad (5)$$
The equation shows that the IB method tries to minimize I(X;T) while ensuring that I(T;Y) is no less than a lower bound $D'$. In a sense, this objective function implements a "bottleneck" for the dependency between X and Y through T, i.e., one tries to squeeze the information which X provides about Y through a compact "bottleneck" formed by the compressed representation T. The objective of an IB algorithm is then to minimize $I(X;T) - \alpha I(T;Y)$, a compromise between the two mutual informations. Multiplying this by $-1/\alpha$ turns it into the dual objective of maximizing $I(T;Y) - \alpha^{-1} I(T;X) = I(T;Y) - \beta I(T;X)$, where $\beta = \alpha^{-1}$, and both $\alpha$ and $\beta$ are predefined positive coefficients.
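For concreteness, a candidate encoder can be scored under this dual objective as in the following sketch (mi and ib_objective are our own helper names, not from the paper):

```python
import numpy as np

def mi(p_ab):
    """Mutual information of a joint distribution, in nats."""
    p_a = p_ab.sum(axis=1, keepdims=True)
    p_b = p_ab.sum(axis=0, keepdims=True)
    mask = p_ab > 0
    return float(np.sum(p_ab[mask] * np.log((p_ab / (p_a * p_b))[mask])))

def ib_objective(p_xy, p_t_given_x, beta):
    """Dual IB functional I(T;Y) - beta * I(T;X) for a candidate encoder p(t|x)."""
    p_x = p_xy.sum(axis=1)
    p_tx = (p_x[:, None] * p_t_given_x).T    # joint p(t, x)
    p_ty = p_t_given_x.T @ p_xy              # joint p(t, y)
    return mi(p_ty) - beta * mi(p_tx)

# As beta -> 0 the score depends on I(T;Y) alone; larger beta penalizes
# encoders that retain more information about X than needed.
```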
2.2. The IB algorithms
When $\beta \to 0$, the IB objective reduces to maximizing I(T;Y), the mutual information between T and Y. In this situation, p(t|x)
will approach zero or one almost everywhere. Without any
assumption about the origin of the joint distribution p(x, y), Tishby
et al. showed that the IB problem has an exact optimal solution
(Tishby et al., 1999). However, how to construct optimal or approximate solutions remains an open problem. Several algorithms have been developed for the IB problem (see Slonim (2002) for a detailed review and comparison). Among them, the sequential IB
(sIB) algorithm and the agglomerative (aIB) algorithm have been
used widely (Slonim, 2002).
2.2.1. The aIB algorithm
The aIB algorithm implements a hierarchical clustering process: it starts with the trivial partition in which each element $x \in X$ represents a singleton cluster or component $t \in T$. To minimize the possible loss of mutual information I(T;Y), the aIB algorithm merges, at each step, the pair whose merging locally minimizes the loss of I(T;Y). Let $t_i$ and $t_j$ be two elements of T; the information loss, also called the merger cost, due to the merging of $t_i$ and $t_j$ is then defined as (Slonim, 2002):
$$d(t_i, t_j) = I(T_{\mathrm{before}}; Y) - I(T_{\mathrm{after}}; Y) \ge 0, \qquad (6)$$
where $I(T_{\mathrm{before}}; Y)$ and $I(T_{\mathrm{after}}; Y)$ are the mutual information between T and Y before and after $t_i$ and $t_j$ are merged. This has been further formulated as (Slonim, 2002):
$$d(t_i, t_j) = \big(p(t_i) + p(t_j)\big)\, \bar{d}(t_i, t_j), \qquad (7)$$
where $\bar{d}(t_i, t_j) = JS_{\Pi}\big[p(y|t_i), p(y|t_j)\big] - \beta\, JS_{\Pi}\big[p(x|t_i), p(x|t_j)\big]$, $JS_{\Pi}[p, q]$ is the Jensen–Shannon divergence between distributions p(·) and q(·), and $\Pi = \left\{\frac{p(t_i)}{p(t_i)+p(t_j)},\ \frac{p(t_j)}{p(t_i)+p(t_j)}\right\}$.
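Translating Eqs. (6) and (7) into code, a minimal sketch might look as follows (the names kl, js, and merger_cost are our own, and the cluster priors and conditionals are assumed to be precomputed):

```python
import numpy as np

def kl(p, q):
    """Kullback-Leibler divergence D_KL(p || q), in nats."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def js(pi, p, q):
    """Jensen-Shannon divergence of p and q with weights pi = (pi_i, pi_j)."""
    mix = pi[0] * p + pi[1] * q
    return pi[0] * kl(p, mix) + pi[1] * kl(q, mix)

def merger_cost(p_ti, p_tj, py_ti, py_tj, px_ti, px_tj, beta):
    """aIB merger cost of Eq. (7): (p(t_i) + p(t_j)) * d_bar(t_i, t_j)."""
    w = p_ti + p_tj
    pi = (p_ti / w, p_tj / w)                           # the weights Pi
    d_bar = js(pi, py_ti, py_tj) - beta * js(pi, px_ti, px_tj)
    return w * d_bar

# At each aIB step, the pair (t_i, t_j) with the smallest merger_cost is
# merged, which locally minimizes the loss of I(T;Y).
```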
The aIB algorithm is the only available IB algorithm that can generate hierarchical results. In terms of accuracy, however, it is not as good as the sequential IB (sIB) algorithm (Slonim, 2002).
2.2.2. The sIB algorithm
The sIB algorithm implements a partitioning-based clustering process. It starts from a random partition of X into the representation T. At each step, one element $x \in X$ is "drawn" from its current cluster t and merged into $t_{\mathrm{new}} = \arg\min_{t \in T} d(\{x\}, t)$, yielding a new partition. When $t \ne t_{\mathrm{new}}$, the mutual information I(T;Y) increases. This procedure continues until no more assignment updates can further improve I(T;Y).
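One sweep of this procedure might be sketched as follows (our own simplified form in the limit β → 0, where the singleton merger cost d({x}, t) of Eq. (7) reduces to a weighted Jensen–Shannon divergence between p(y|{x}) and p(y|t); sib_sweep and _kl are hypothetical names):

```python
import numpy as np

def _kl(p, q):
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def sib_sweep(p_xy, assign, n_clusters, rng):
    """One sIB sweep (a sketch): draw each x from its cluster and re-insert it
    into the cluster t that minimizes the singleton merger cost d({x}, t)."""
    p_x = p_xy.sum(axis=1)
    changed = False
    for x in rng.permutation(p_xy.shape[0]):
        old_t, assign[x] = assign[x], -1              # "draw" x from its cluster
        py_x = p_xy[x] / p_x[x]                       # p(y | {x})
        costs = np.zeros(n_clusters)                  # merging into an empty cluster is free
        for t in range(n_clusters):
            members = np.flatnonzero(assign == t)
            if members.size == 0:
                continue
            p_t = p_x[members].sum()
            py_t = p_xy[members].sum(axis=0) / p_t    # p(y | t)
            w = p_x[x] + p_t
            pi_x, pi_t = p_x[x] / w, p_t / w
            mix = pi_x * py_x + pi_t * py_t           # Pi-weighted JS midpoint
            costs[t] = w * (pi_x * _kl(py_x, mix) + pi_t * _kl(py_t, mix))
        assign[x] = int(np.argmin(costs))
        changed |= (assign[x] != old_t)
    return changed                                    # sweep until no reassignment occurs
```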
The major weakness of the sIB algorithm is that it is a local optimization procedure. Even though it is proven to converge to a locally stable solution, this solution may not be the globally optimal one.
2.2.3. The Multi-scale Ncuts algorithm
The Normalized-cuts algorithm was proposed by Shi and Malik (2000) and was later extended into a multi-scale framework (Multi-scale Ncuts) (Cour and Benezit, 2005). Multi-scale Ncuts also utilizes neighborhood information. Though the term "neighborhood" is the same as the one used in our work, there are some differences:
Density and Neighborhood: In our work, we focus on density. We assume that data points in the same cluster should have similar density values, so data points that are dense enough should be assigned to the same cluster. To find those data points, we use the neighborhood concept to identify the regions where the data are dense enough. On the other hand, Multi-scale Ncuts emphasizes the neighborhood and shows that the data points (in the spectral domain) within a neighborhood can be compressed into a representative.
Details: In Multi-scale Ncuts, the authors first take the R-neighborhoods $N_i$, $N_j$ of a pair of pixels i, j, and then measure the variance of the affinities between every two pixels $i' \in N_i$, $j' \in N_j$. The affinity variance across a small neighborhood decreases quickly, which implies that the pixels in a neighborhood can be condensed into a representative pixel. In contrast, in our work, we first found