and a set of meta-features, i.e., characteristics of the dataset that can be computed
efficiently and that help to determine which algorithm to use on a new dataset.
This meta-learning approach is complementary to Bayesian optimization for
optimizing an ML framework. Meta-learning can quickly suggest some instantiations
of the ML framework that are likely to perform quite well, but it is
unable to provide fine-grained information on performance. In contrast, Bayesian
optimization is slow to start for hyperparameter spaces as large as those of
entire ML frameworks, but can fine-tune performance over time. We exploit this
complementarity by selecting k configurations based on meta-learning and using their
results to seed Bayesian optimization. This approach of warmstarting optimization
by meta-learning has been applied successfully before [21, 22, 38], but
never to an optimization problem as complex as that of searching the space of
instantiations of a full-fledged ML framework. Likewise, learning across datasets
has also been applied in collaborative Bayesian optimization methods [4, 45]; while
these approaches are promising, they are so far limited to very few meta-features and
cannot yet cope with the high-dimensional partially discrete configuration spaces
faced in AutoML.
More precisely, our meta-learning approach works as follows. In an offline phase,
for each machine learning dataset in a dataset repository (in our case 140 datasets
from the OpenML [43] repository), we evaluated a set of meta-features (described
below) and used Bayesian optimization to determine and store an instantiation of
the given ML framework with strong empirical performance for that dataset. (In
detail, we ran SMAC [27] for 24 h with 10-fold cross-validation on two-thirds of
the data and stored the resulting ML framework instantiation that exhibited the best
performance on the remaining third.) Then, given a new dataset D, we compute its
meta-features, rank all datasets by their L1 distance to D in meta-feature space, and
select the stored ML framework instantiations for the k = 25 nearest datasets for
evaluation before starting Bayesian optimization with their results.
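At its core, this online selection step is a k-nearest-neighbor lookup in meta-feature space. The sketch below is illustrative only: it assumes the offline phase has already stored a meta-feature vector and a SMAC-optimized configuration per repository dataset, and the function and variable names are ours, not auto-sklearn's actual API.

```python
import numpy as np

def warmstart_configurations(query, meta_features, stored_configs, k=25):
    """Pick the configurations stored for the k datasets nearest to the
    new dataset D (described by `query`) in meta-feature space.

    query         : 1-d array, meta-features of the new dataset D
    meta_features : dict {dataset_name: 1-d array of meta-features}
    stored_configs: dict {dataset_name: best configuration found offline}
    """
    names = list(meta_features)
    # L1 (Manhattan) distance between D and every repository dataset;
    # a real implementation would likely first scale the meta-features
    # to comparable ranges (an assumption, not detailed in the text)
    dists = np.array([np.abs(meta_features[n] - query).sum() for n in names])
    nearest = (names[i] for i in np.argsort(dists)[:k])
    return [stored_configs[n] for n in nearest]
```

The selected configurations are evaluated first, and the resulting (configuration, performance) pairs then seed the Bayesian optimizer as its initial design.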
To characterize datasets, we implemented a total of 38 meta-features from the
literature, including simple, information-theoretic and statistical meta-features [29,
33], such as statistics about the number of data points, features, and classes, as
well as data skewness, and the entropy of the targets. All meta-features are listed in
Table 1 of the original publication’s supplementary material [20]. Notably, we had
to exclude the prominent and effective category of landmarking meta-features [37]
(which measure the performance of simple base learners), because they were
computationally too expensive to be helpful in the online evaluation phase. We note
that this meta-learning approach draws its power from the availability of a repository
of datasets; due to recent initiatives, such as OpenML [43], we expect the number
of available datasets to grow ever larger over time, increasing the importance of
meta-learning.
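To make the flavor of these meta-features concrete, the following sketch computes a handful of the simple, statistical, and information-theoretic measures named above (numbers of data points, features, and classes; data skewness; entropy of the targets). The function name is ours, and the full set of 38 meta-features in [20] is considerably richer.

```python
import numpy as np
from scipy.stats import skew

def basic_meta_features(X, y):
    """Compute a small illustrative subset of dataset meta-features."""
    n_instances, n_features = X.shape
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()  # empirical class distribution
    return {
        "num_instances": n_instances,                      # simple
        "num_features": n_features,                        # simple
        "num_classes": len(counts),                        # simple
        "mean_skewness": float(np.mean(skew(X, axis=0))),  # statistical
        "class_entropy": float(-(p * np.log2(p)).sum()),   # information-theoretic
    }
```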