An Efficient Mutual-Information-Guided Feature Selection Strategy
In his paper on feature selection via mutual information, François Fleuret examines how mutual information can serve as an effective tool for feature selection in machine learning. Feature engineering is a key step in the machine-learning pipeline, aimed at improving model performance and efficiency, and feature selection is an indispensable part of it. The core of the paper is a fast feature selection method based on conditional mutual information.

Conditional mutual information (CMI) is a statistic that measures the dependency between two random variables while accounting for the influence of a third. In feature selection, the proposed method maximizes the conditional mutual information between each candidate feature and the target variable, taking into account the features already selected. Its advantage is that the selected features are not only individually predictive but also weakly correlated with one another, which avoids multicollinearity.

The paper compares this new selection method with traditional algorithms, such as rule-based and filter methods, as well as more elaborate model-optimization techniques such as boosting and support vector machines (SVMs). The results show that the conditional-mutual-information method is more efficient than the traditional algorithms, and that when the selected features are used to build a naive Bayes classifier, its performance approaches that of state-of-the-art machine-learning methods.

In concrete terms, with 500 training examples the method selects 50 key features out of 40,000 in about a tenth of a second on a standard 1 GHz PC, demonstrating its efficiency and practicality in real applications.

The paper thus offers a novel and effective feature-engineering strategy: using conditional mutual information to pick out features that carry independent information while remaining minimally redundant, which matters for improving model performance, reducing computational complexity, and speeding up training. By combining information theory with machine-learning algorithms, the author reveals a potentially very efficient route for feature selection and a fresh perspective for data mining and model building.
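The conditional mutual information described above can be estimated directly from data with plug-in (histogram) estimates. Below is a minimal sketch for binary variables; the function names are ours, not the paper's, and this is the naive empirical estimator rather than any particular implementation from the paper.

```python
import numpy as np

def mutual_info(x, y):
    """Empirical mutual information (in bits) between two binary arrays."""
    mi = 0.0
    for a in (0, 1):
        for b in (0, 1):
            p_ab = np.mean((x == a) & (y == b))  # joint probability estimate
            if p_ab > 0:
                mi += p_ab * np.log2(p_ab / (np.mean(x == a) * np.mean(y == b)))
    return mi

def cond_mutual_info(x, y, z):
    """Empirical I(X; Y | Z): average the MI within each group defined by z."""
    cmi = 0.0
    for c in (0, 1):
        mask = (z == c)
        if mask.any():
            cmi += np.mean(mask) * mutual_info(x[mask], y[mask])
    return cmi
```

For example, conditioning on a variable that already determines X drives the CMI to zero, which is exactly the redundancy-detection behavior the method exploits.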
FAST BINARY FEATURE SELECTION
min_k Î(Y ; X | X_ν(k)) = 0.
Conversely, the higher this value, the more relevant X is. A natural criterion consists of ranking the remaining features according to that quantity and picking the one with the highest value.
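This ranking criterion lends itself to a greedy loop: keep, for every candidate feature, the minimum of its conditional mutual information with the class given each already-selected feature, and repeatedly pick the candidate with the largest such score. The sketch below illustrates the idea for binary features with naive plug-in estimates; it is our illustration of the criterion, not the paper's optimized implementation.

```python
import numpy as np

def mi(x, y):
    """Empirical mutual information (bits) between two binary arrays."""
    m = 0.0
    for a in (0, 1):
        for b in (0, 1):
            pab = np.mean((x == a) & (y == b))
            if pab > 0:
                m += pab * np.log2(pab / (np.mean(x == a) * np.mean(y == b)))
    return m

def cmi(x, y, z):
    """Empirical I(X; Y | Z) for a binary conditioning variable z."""
    return sum(np.mean(z == c) * mi(x[z == c], y[z == c])
               for c in (0, 1) if (z == c).any())

def cmim_select(X, y, k):
    """Greedy selection: at each step pick the feature maximizing
    min over already-selected s of I(Y; X_i | X_s)."""
    # Before anything is selected, the score is the plain mutual information.
    score = np.array([mi(X[:, i], y) for i in range(X.shape[1])])
    selected = []
    for _ in range(k):
        best = int(np.argmax(score))
        selected.append(best)
        score[best] = -np.inf  # never pick the same feature twice
        # Tighten every remaining score with the newly selected feature.
        for i in np.flatnonzero(np.isfinite(score)):
            score[i] = min(score[i], cmi(y, X[:, i], X[:, best]))
    return selected
```

A feature that duplicates an already-selected one gets a score of zero and is never picked again, which is the behavior the criterion is designed to produce.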
2.5 Other Feature Selection Methods
This section lists the various feature selection methods we have used for comparison in our experiments.
2.5.1 RANDOM SAMPLING
The most trivial form of feature selection consists of uniform random subsampling without repetition. Such an approach leads to features as independent as the original ones, but does not pick the informative ones. This leads to poor results when only a small fraction of the features actually provide information about the class to predict.
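This baseline amounts to drawing k distinct indices uniformly at random; a one-line sketch using the standard library (the function name is ours):

```python
import random

def random_feature_subset(n_features, k, seed=0):
    """Uniformly sample k distinct feature indices, without repetition."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    return rng.sample(range(n_features), k)
```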
2.5.2 MUTUAL INFORMATION MAXIMIZATION
To avoid the main weakness of the random sampling described above, we have also implemented a method which picks the K features ν(1), . . . , ν(K) that individually maximize the mutual information Î(Y ; X_ν(l)) with the class to predict. Selection based on such a ranking does not ensure weak dependency among the features, and can lead to redundant and poorly informative families of features.
In the following sections, we call this method MIM for Mutual Information Maximization.
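MIM is a pure ranking: score every feature by its individual empirical mutual information with the class and keep the top K, with no attempt to account for redundancy. A minimal sketch for binary features (naive plug-in estimator, names ours):

```python
import numpy as np

def mim_select(X, y, k):
    """Rank features by individual empirical MI with the class; keep top k.
    Deliberately ignores redundancy between features (MIM's weakness)."""
    def mi(x, y):
        m = 0.0
        for a in (0, 1):
            for b in (0, 1):
                pab = np.mean((x == a) & (y == b))
                if pab > 0:
                    m += pab * np.log2(pab / (np.mean(x == a) * np.mean(y == b)))
        return m
    scores = np.array([mi(X[:, i], y) for i in range(X.shape[1])])
    return list(np.argsort(-scores)[:k])  # indices of the k highest scores
```

Note that two identical copies of a highly informative feature would both be selected, which is exactly the redundancy problem described above.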
2.5.3 C4.5 BINARY TREES
As proposed by Ratanamahatana and Gunopulos (2003), binary decision trees can be used for feature selection. The idea is to grow several binary trees and to rank features according to the number of times they appear in the top nodes. This technique is proposed in the literature as a good filter for naive Bayesian classifiers, and is a good example of a scheme able to spot statistical dependencies between more than two features, since the choice of a feature in a binary tree depends on its statistical behavior conditioned on the values of the features picked above it.
Efficiency was increased on our specific task by using randomization (Amit et al., 1997), which consists of using random subsets of the features instead of random subsets of training examples as in bagging (Breiman, 1996, 1999).
We have built 50 trees, each with one half of the features selected at random, and collected the
features in the first five layers. Several configurations of number of trees, proportions of features
and proportions of training examples were compared and the best one kept. This method is called
“C4.5 feature selection” in the result sections.
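The scheme above can be sketched as follows: grow shallow trees on random halves of the features and count how often each feature is chosen in the top layers. This is a simplified illustration using greedy information-gain splits rather than full C4.5, and all names are ours.

```python
import numpy as np

def info_gain(x, y):
    """Reduction in entropy of binary y from splitting on binary feature x."""
    def ent(v):
        p = np.mean(v)
        return 0.0 if p in (0.0, 1.0) else -(p * np.log2(p) + (1 - p) * np.log2(1 - p))
    g = ent(y)
    for a in (0, 1):
        m = (x == a)
        if m.any():
            g -= np.mean(m) * ent(y[m])
    return g

def top_split_counts(X, y, n_trees=50, depth=2, seed=0):
    """Grow n_trees shallow trees, each on a random half of the features,
    and count how often each feature is chosen in the top `depth` layers."""
    rng = np.random.default_rng(seed)
    counts = np.zeros(X.shape[1], dtype=int)

    def grow(rows, feats, d):
        if d == 0 or len(rows) < 2 or len(set(y[rows])) < 2:
            return  # leaf: too deep, too few examples, or already pure
        gains = [info_gain(X[rows, f], y[rows]) for f in feats]
        f = feats[int(np.argmax(gains))]
        counts[f] += 1
        for a in (0, 1):
            grow(rows[X[rows, f] == a], feats, d - 1)

    for _ in range(n_trees):
        feats = list(rng.choice(X.shape[1], size=max(1, X.shape[1] // 2),
                                replace=False))
        grow(np.arange(len(y)), feats, depth)
    return counts
```

Ranking the features by `counts` then gives the filter; the actual experiments used 50 trees and the first five layers.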
2.5.4 FAST CORRELATION-BASED FILTER
The FCBF method explicitly addresses the correlation between features. It first ranks the features according to their mutual information with the class to predict, and removes those whose mutual information is less than a threshold δ.
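This first stage can be sketched as a rank-and-threshold filter. Note that the original FCBF is usually described in terms of symmetrical uncertainty; the sketch below follows the mutual-information description given here, with a naive plug-in estimator and names of our own choosing.

```python
import numpy as np

def fcbf_prefilter(X, y, delta):
    """FCBF first stage (sketch): rank features by empirical MI with the
    class and drop those whose MI falls below the threshold delta."""
    def mi(x, y):
        m = 0.0
        for a in (0, 1):
            for b in (0, 1):
                pab = np.mean((x == a) & (y == b))
                if pab > 0:
                    m += pab * np.log2(pab / (np.mean(x == a) * np.mean(y == b)))
        return m
    scores = np.array([mi(X[:, i], y) for i in range(X.shape[1])])
    keep = np.flatnonzero(scores >= delta)          # threshold at delta
    return sorted(keep, key=lambda i: -scores[i])   # most informative first
```

The surviving ranked list is then passed to FCBF's second, redundancy-elimination stage.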