Table 1
Main notation.
Symbol    Explanation
f         Random variable of a feature
D         A set of ordered pairs of features
G         A grid constructed from all the feature pairs
w(f_i)    The hidden variable in a factor graph
h         The serial number of the selected feature
I         A function that computes mutual information
MIC       Maximal information coefficient [30]
S         A function that computes similarity
M         The number of features
E         The energy function
be a set of ordered pairs of features. Furthermore, let the $i$- and $j$-values of $F_P$ be partitioned into $i$ and $j$ bins, respectively, and let a pair of partitions define an $i$-by-$j$ grid $G$. Given such a grid $G$, let $F_P|_G$ be the distribution induced by the points in $F_P$ on the cells of $G$; that is, the distribution on the cells of $G$ obtained by letting the probability mass in each cell be the fraction of points in $D$ falling in that cell. For a fixed $F_P$, different grids $G$ result in different distributions $F_P|_G$.
For a finite set $F_P \subset \mathbb{R}^2$ and positive integers $i, j$,

$$I^*(F_P, i, j) = \max I(F_P|_G),$$

where the maximum is taken over all grids $G$ with $i$ columns and $j$ rows, and $I(F_P|_G)$ denotes the mutual information of $F_P|_G$.
Then, the characteristic matrix and MIC of $F_P$ can be defined in terms of $I^*$. The characteristic matrix $M(F_P)$ of a set $F_P$ of two-variable data is an infinite matrix with entries

$$M(F_P)_{(i,j)} = \frac{I^*(F_P, i, j)}{\log \min\{i, j\}}.$$
The MIC of a set $F_P$ of two-variable data with sample size $M$ and a grid size less than $B(M)$ is given by

$$\mathrm{MIC}(F_P) = \max_{ij < B(M)} M(F_P)_{(i,j)},$$

where $\omega(1) < B(M) \leq O(M^{1-\varepsilon})$ for some $0 < \varepsilon < 1$. The MIC falls between 0 and 1 and is symmetric, and higher values imply greater relevance between features.
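To make the definition concrete, the following minimal sketch computes an approximate MIC from scratch: it searches equal-width $i$-by-$j$ grids (rather than the optimized partitions of [30], so it only lower-bounds $I^*$), normalizes each grid's mutual information by $\log \min\{i, j\}$, and takes the maximum subject to $ij < B(M)$. The choice $B(M) = M^{0.6}$ and the helper names are illustrative assumptions, not the authors' implementation.

```python
# Minimal, approximate MIC sketch (equal-width grids only; the exact
# MIC of [30] maximizes over all partitions, so this is a lower bound).
import numpy as np


def grid_mutual_information(x, y, i, j):
    """Mutual information I(F_P|_G) of the distribution induced on an
    i-by-j equal-width grid; each cell's mass is the fraction of points
    falling in that cell."""
    counts, _, _ = np.histogram2d(x, y, bins=(i, j))
    pxy = counts / counts.sum()             # joint distribution on cells
    px = pxy.sum(axis=1, keepdims=True)     # marginal of x, shape (i, 1)
    py = pxy.sum(axis=0, keepdims=True)     # marginal of y, shape (1, j)
    nz = pxy > 0                            # skip empty cells (log 0)
    return float((pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])).sum())


def mic(x, y, B=None):
    """Max over grids with i*j < B(M) of I*(F_P, i, j) / log(min(i, j))."""
    m = len(x)
    B = B if B is not None else int(m ** 0.6)   # assumed B(M) = M^0.6
    best = 0.0
    for i in range(2, B + 1):
        for j in range(2, B + 1):
            if i * j >= B:                       # enforce ij < B(M)
                continue
            mi = grid_mutual_information(x, y, i, j)
            best = max(best, mi / np.log(min(i, j)))
    return best


rng = np.random.default_rng(0)
x = rng.normal(size=500)
print(mic(x, x ** 2))                 # deterministic relation: score is high
print(mic(x, rng.normal(size=500)))   # independent noise: score near 0
```

Because only one equal-width grid is tried per $(i, j)$ pair, the scores are rougher than the reference algorithm's, but the normalization and the $[0, 1]$ range behave as in the definition above.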
Brown et al. [2] presented a unifying framework for feature selection based on mutual information, which formulates the feature selection task as a conditional likelihood problem. However, all the methods discussed in [2] use mutual information to measure the relevance between features. In the proposed algorithm, the MIC is used instead to measure the similarity between features: the MIC finds the $f_i$-by-$f_j$ grid with the highest induced mutual information, the mutual information scores are normalized, the normalized scores form a matrix, and the MIC is the highest score in that matrix. Although the MIC is computed from mutual information, it is the highest normalized mutual information and therefore strengthens the relationship between the two features. It better reflects the dependence between two attributes and, when used to evaluate the similarity of features, can better reduce the redundancy among features. However, owing to its higher complexity, the MIC requires more time than mutual information to evaluate the similarities among features.
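In practice, MIC need not be implemented by hand: the open-source minepy package provides the approximation algorithm of [30], and scikit-learn provides conventional mutual-information estimators for comparison. A brief sketch, assuming both packages are installed and using made-up data:

```python
import numpy as np
from minepy import MINE                              # MIC implementation
from sklearn.feature_selection import mutual_info_regression

rng = np.random.default_rng(0)
f_i = rng.normal(size=1000)
f_j = np.cos(3 * f_i) + 0.1 * rng.normal(size=1000)  # nonlinear relation

mine = MINE(alpha=0.6, c=15)          # default parameters from [30]
mine.compute_score(f_i, f_j)
print("MIC:", mine.mic())             # normalized to [0, 1], symmetric

mi = mutual_info_regression(f_i.reshape(-1, 1), f_j)[0]
print("MI :", mi)                     # unnormalized, in nats
```

As noted above, the extra grid search makes a MIC evaluation noticeably slower than a single mutual-information estimate, which matters when many feature pairs must be scored.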
For unsupervised feature selection, the best selected subset should contain the lowest number of features that retain as much of the original information as possible. Given a high-dimensional dataset $X = (x_1, \ldots, x_n)$ in the instance space and $F = (f_1, \ldots, f_M)$ in the feature space, where $x_i \in \mathbb{R}^M$ and $f_j \in \mathbb{R}^n$, let $K$ be the number of selected features, and let $\bar{F} = \{\bar{f}_1, \ldots, \bar{f}_K\}$ denote the selected feature subset.
In terms of maintaining the original information (e.g., maximum mutual information), the feature selection objective
function can be expressed as follows:
$$\bar{F} = \arg\max_{\bar{F}} \left( \sum_{j=1}^{M} \sum_{i=1}^{K} \mathrm{MIC}(\bar{f}_i, f_j) \right), \quad \text{s.t. } \bar{f}_i \neq f_j, \qquad (1)$$
where $K$ and $M$ are the numbers of selected and total features, respectively, and $\bar{F}$ is the subset of selected features. Eq. (1) maximizes the MIC between the selected feature subset and the whole feature set, which means that the selected subset preserves as much of the information in the full feature set as possible. In other words, maximizing the MIC can, to a certain extent, remove redundant features.
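As a concrete reading of Eq. (1), the sketch below scores a candidate subset by summing the MIC between every selected feature and every other feature in the full set, skipping the self-pairs excluded by the constraint. The column-per-feature matrix layout and the pairwise `mic` callable (e.g., either sketch above) are assumptions for illustration.

```python
def objective(F, selected_idx, mic):
    """Objective of Eq. (1): sum of MIC(f_bar_i, f_j) over the K selected
    features f_bar_i and all M features f_j, subject to f_bar_i != f_j.
    F is an (n, M) array whose columns are features; mic(a, b) is any
    pairwise MIC estimator."""
    total = 0.0
    for i in selected_idx:              # the K selected features
        for j in range(F.shape[1]):     # all M features
            if i == j:                  # constraint: f_bar_i != f_j
                continue
            total += mic(F[:, i], F[:, j])
    return total
```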
If an exhaustive search over the objective function in Eq. (1) is employed for $M$ features with $N$ dimensions and $K$ features are selected, then the computational complexity is $O(N \cdot M!/(K!(M-K)!))$. Because such a search is computationally prohibitive, an effective selection approach based on the factor graph model is presented in the next section.
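To see why exhaustive search is ruled out, note that it must score every one of the $M!/(K!(M-K)!)$ possible subsets. A hypothetical sketch (reusing the `objective` function above) and a quick count:

```python
import math
from itertools import combinations


def exhaustive_search(F, K, mic):
    """Brute-force optimizer for Eq. (1): evaluates all C(M, K) subsets,
    hence the O(N * M! / (K! (M-K)!)) complexity quoted above."""
    M = F.shape[1]
    best_subset, best_score = None, -math.inf
    for subset in combinations(range(M), K):
        score = objective(F, subset, mic)   # sketch for Eq. (1) above
        if score > best_score:
            best_subset, best_score = subset, score
    return best_subset


# The subset count alone is prohibitive for high-dimensional data:
print(math.comb(100, 10))   # 17,310,309,456,440 candidate subsets
```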