AUTOML: METHODS, SYSTEMS, CHALLENGES
Editors: Frank Hutter, Lars Kotthoff, Joaquin Vanschoren
We’re in the process of finishing this edited book, and it will be ready for sale by NIPS 2018. In addition to the print edition, we will keep the book open access. Below, we share preliminary versions of the chapters; at this point, these are all drafts that have not yet been copy-edited.
Contents
Part 1: AutoML Methods
This part comprises highly up-to-date overview chapters on the common foundations behind all
AutoML systems.
Chapter 1: Hyperparameter Optimization. By Matthias Feurer and Frank Hutter
Chapter 2: Meta Learning. By Joaquin Vanschoren
Chapter 3: Neural Architecture Search. By Thomas Elsken, Jan-Hendrik Metzen and Frank Hutter
Part 2: AutoML Systems
This part comprises in-depth descriptions of a broad range of available AutoML systems that can
be used for effective machine learning out of the box.
Chapter 4: Auto-WEKA. By Lars Kotthoff and Chris Thornton and Holger H. Hoos and Frank Hutter
and Kevin Leyton-Brown
Chapter 5: Hyperopt-Sklearn. By Brent Komer and James Bergstra and Chris Eliasmith
Chapter 6: Auto-sklearn: Efficient and Robust Automated Machine Learning. By Matthias Feurer
and Aaron Klein and Katharina Eggensperger and Jost Tobias Springenberg and Manuel Blum and
Frank Hutter
Chapter 7: Auto-Net: Towards Automatically-Tuned Neural Networks. By Hector Mendoza and
Aaron Klein and Matthias Feurer and Jost Tobias Springenberg and Matthias Urban and Michael
Burkart and Max Dippel and Marius Lindauer and Frank Hutter
Chapter 8: TPOT: A Tool for Automating Machine Learning. By Randal S. Olson and Jason H. Moore
Chapter 9: The Automatic Statistician. By Christian Steinruecken and Emma Smith and David Janz
and James Lloyd and Zoubin Ghahramani
Part 3: AutoML Challenges
This part provides an in-depth analysis of all AutoML challenges held to date.
Chapter 10: Analysis of the AutoML Challenge series 2015-2018. By Isabelle Guyon and Lisheng
Sun-Hosoya and Marc Boullé and Hugo Jair Escalante and Sergio Escalera and Zhengying Liu and
Damir Jajetic and Bisakha Ray and Mehreen Saeed and Michele Sebag and Alexander Statnikov and
Wei-Wei Tu and Evelyne Viegas
Chapter 1
Hyperparameter Optimization
Matthias Feurer and Frank Hutter
Abstract
Recent interest in complex and computationally expensive machine learning models with many hyperparameters, such as automated machine learning (AutoML) frameworks and deep neural networks, has resulted in a resurgence of research on hyperparameter optimization (HPO). In this chapter, we give an overview of the most prominent approaches for HPO. We first discuss blackbox function optimization methods based on model-free methods and Bayesian optimization. Since the high computational demand of many modern machine learning applications renders pure blackbox optimization extremely costly, we next focus on modern multi-fidelity methods that use (much) cheaper variants of the blackbox function to approximately assess the quality of hyperparameter settings. Lastly, we point to open problems and future research directions.
1.1 Introduction
Every machine learning system has hyperparameters, and the most basic task in automated machine learning (AutoML) is to automatically set these hyperparameters to optimize performance. Especially recent deep neural networks crucially depend on a wide range of hyperparameter choices about the neural network’s architecture, regularization, and optimization. Automated hyperparameter optimization (HPO) has several important use cases; it can
• reduce the human effort necessary for applying machine learning. This is
particularly important in the context of AutoML.
• improve the performance of machine learning algorithms (by tailoring them to the problem at hand); this has led to new state-of-the-art performances for important machine learning benchmarks in several studies (e.g. [137, 102]).
• improve the reproducibility and fairness of scientific studies. Automated
HPO is clearly more reproducible than manual search. It facilitates fair
comparisons since different methods can only be compared fairly if they
all receive the same level of tuning for the problem at hand [12, 130].
The problem of HPO has a long history, dating back to the 1990s (e.g., [123,
104, 74, 79]), and it was also established early that different hyperparameter
configurations tend to work best for different datasets [79]. In contrast, it is a
rather new insight that HPO can be used to adapt general-purpose pipelines to
specific application domains [28]. Nowadays, it is also widely acknowledged that
tuned hyperparameters improve over the default setting provided by common
machine learning libraries [146, 97, 127, 113].
Because of the increased usage of machine learning in companies, HPO is
also of substantial commercial interest and plays an ever larger role there, be it
in company-internal tools [42], as part of machine learning cloud services [86, 5],
or as a service by itself [134].
HPO faces several challenges which make it a hard problem in practice:
• Function evaluations can be extremely expensive for large models (e.g., in deep learning), complex machine learning pipelines, or large datasets.
• The configuration space is often complex (comprising a mix of continuous, categorical and conditional hyperparameters) and high-dimensional. Furthermore, it is not always clear which of an algorithm’s hyperparameters need to be optimized, and in which ranges.
• We usually don’t have access to a gradient of the loss function with respect to the hyperparameters. Furthermore, other properties of the target function often used in classical optimization do not typically apply, such as convexity and smoothness.
• One cannot directly optimize for generalization performance as training
datasets are of limited size.
We refer the interested reader to other reviews of HPO for further discussions
on this topic [61, 91].
This chapter is structured as follows. First, we define the HPO problem formally and discuss its variants (Section 1.2). Then, we discuss blackbox optimization algorithms for solving HPO (Section 1.3). Next, we focus on modern multi-fidelity methods that enable the use of HPO even for very expensive models, by exploiting approximate performance measures that are cheaper than full model evaluations (Section 1.4). We then provide an overview of the most important hyperparameter optimization systems and applications to AutoML (Section 1.5) and end the chapter with a discussion of open problems (Section 1.6).
1.2 Problem Statement
Let A denote a machine learning algorithm with N hyperparameters. We denote the domain of the n-th hyperparameter by Λ_n and the overall hyperparameter configuration space as Λ = Λ_1 × Λ_2 × ⋯ × Λ_N. A vector of hyperparameters is denoted by λ ∈ Λ, and A with its hyperparameters instantiated to λ is denoted by A_λ.
The domain of a hyperparameter can be real-valued (e.g., learning rate), integer-valued (e.g., number of layers), binary (e.g., whether to use early stopping or not), or categorical (e.g., choice of optimizer). For integer and real-valued hyperparameters, the domains are mostly bounded for practical reasons, with only a few exceptions [10, 133, 110].
Furthermore, the configuration space can contain conditionality, i.e., a hyperparameter may only be relevant if another hyperparameter (or some combination of hyperparameters) takes on a certain value. Conditional spaces take the form of directed acyclic graphs. Such conditional spaces occur, e.g., in the automated tuning of machine learning pipelines, where the choice between different preprocessing and machine learning algorithms is modeled as a categorical hyperparameter, a problem known as Full Model Selection (FMS) or Combined Algorithm Selection and Hyperparameter optimization (CASH) [28, 146, 80, 32]. They also occur when optimizing the architecture of a neural network: e.g., the number of layers can be an integer hyperparameter and the per-layer hyperparameters of layer i are only active if the network depth is at least i [10, 12, 31].
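Such a conditional space can be written down directly. The sketch below (plain Python with made-up hyperparameter names and ranges; it is not the API of any particular library) encodes child hyperparameters under their (parent, activating value) pair and samples only the active ones:

```python
import math
import random

# Top-level categorical choice; its value conditions which children are active.
SPACE = {
    "classifier": ["svm", "random_forest"],
    # Children keyed by (parent name, activating parent value).
    "children": {
        ("classifier", "svm"): {
            "C": ("log_uniform", 1e-3, 1e3),
            "kernel": ("choice", ["rbf", "linear"]),
        },
        ("classifier", "random_forest"): {
            "n_estimators": ("int_uniform", 10, 500),
        },
    },
}

def sample_hp(spec):
    """Draw one value from a hyperparameter domain specification."""
    kind = spec[0]
    if kind == "choice":
        return random.choice(spec[1])
    if kind == "log_uniform":  # uniform on a log scale, common for e.g. C
        return 10 ** random.uniform(math.log10(spec[1]), math.log10(spec[2]))
    if kind == "int_uniform":
        return random.randint(spec[1], spec[2])
    raise ValueError(kind)

def sample_configuration(space):
    """Sample a random configuration, activating children conditionally."""
    cfg = {"classifier": random.choice(space["classifier"])}
    active = space["children"].get(("classifier", cfg["classifier"]), {})
    for name, spec in active.items():
        cfg[name] = sample_hp(spec)
    return cfg
```

A sampled configuration thus never contains inactive hyperparameters, which is exactly the DAG structure described above.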
Given a data set D, our goal is to find

λ* = argmin_{λ∈Λ} 𝔼_{(D_train, D_valid)∼𝒟} V(L, A_λ, D_train, D_valid),    (1.1)

where V(L, A_λ, D_train, D_valid) measures the loss of a model generated by algorithm A with hyperparameters λ on training data D_train and evaluated on validation data D_valid. In practice, we only have access to finite data D ∼ 𝒟 and thus need to approximate the expectation in Equation 1.1.
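As a concrete (and deliberately toy) instance of Equation 1.1, the following sketch uses a one-hyperparameter "learning algorithm" (a shrinkage estimator that predicts the training mean shrunk toward zero by a hyperparameter alpha) and minimizes its holdout validation loss V over a grid of candidate values; all names and numbers are illustrative, not from the chapter:

```python
def train(alpha, train_ys):
    """Toy algorithm A_lambda: predict the training mean, shrunk toward 0."""
    mean = sum(train_ys) / len(train_ys)
    return (1.0 - alpha) * mean  # the "model" is a single constant prediction

def validation_loss(alpha, train_ys, valid_ys):
    """V(L, A_lambda, D_train, D_valid) with L = mean squared error."""
    pred = train(alpha, train_ys)
    return sum((y - pred) ** 2 for y in valid_ys) / len(valid_ys)

# Approximate the argmin in Equation 1.1 by grid search over alpha.
train_ys = [2.1, 1.9, 2.0, 2.2]
valid_ys = [1.8, 2.0, 2.1]
grid = [i / 10 for i in range(11)]  # alpha in {0.0, 0.1, ..., 1.0}
best_alpha = min(grid, key=lambda a: validation_loss(a, train_ys, valid_ys))
```

Real HPO replaces the toy model by an expensive training run and the naive grid by the search strategies discussed in Section 1.3.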
Popular choices for the validation protocol V(·, ·, ·, ·) are the holdout and
cross-validation error for a user-given loss function (such as misclassification
rate); see Bischl et al. [14] for an overview of validation protocols. Several
strategies for reducing the evaluation time have been proposed: It is possible
to only test machine learning algorithms on a subset of folds [146], only on
a subset of data [99, 144, 75], or for a small amount of iterations; we will
discuss some of these strategies in more detail in Section 1.4. Recent work on
multi-task [144] and multi-source [118] optimization introduced further cheap,
auxiliary tasks, which can be queried instead of Equation 1.1. These can provide
cheap information to help HPO, but do not necessarily train a machine learning
model on the dataset of interest and therefore do not yield a usable model as a
side product.
1.2.1 Alternatives to Optimization: Ensembling and Marginalization
Solving Equation 1.1 with one of the techniques described in the rest of this
chapter usually requires fitting the machine learning algorithm A with multiple
hyperparameter vectors λ_t. Instead of using the argmin operator over these,
it is possible to either construct an ensemble (which aims to minimize the loss
for a given validation protocol) or to integrate out all the hyperparameters (if
the model under consideration is a probabilistic model). We refer to Guyon et
al. [47] and the references therein for a comparison of frequentist and Bayesian
model selection.
Only choosing a single hyperparameter configuration can be wasteful when many good configurations have been identified by HPO, and combining them in an ensemble can improve performance [106]. This is particularly useful in AutoML systems with a large configuration space (e.g., in FMS or CASH), where good configurations can be very diverse, which increases the potential gains from ensembling [29, 17, 32, 4]. To further improve performance, Automatic
Frankensteining [152] uses HPO to train a stacking model [153] on the outputs
of the models found with HPO; the second-level models are then combined using a traditional ensembling strategy.
The methods discussed so far applied ensembling after the HPO procedure.
While they improve performance in practice, the base models are not optimized
for ensembling. It is, however, also possible to directly optimize for models
which would maximally improve an existing ensemble [94].
Finally, when dealing with Bayesian models it is often possible to integrate
out the hyperparameters of the machine learning algorithm, for example using
evidence maximization [95], Bayesian model averaging [53], slice sampling [108]
or empirical Bayes [100].
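For a probabilistic model with parameters θ and hyperparameters λ, marginalization replaces the single argmin of Equation 1.1 by an average over the posterior; schematically (a standard Bayesian model-averaging identity, written here for a predictive distribution, not a formula from this chapter):

```latex
p(y_* \mid x_*, D) = \int p(y_* \mid x_*, \theta, \lambda)\, p(\theta, \lambda \mid D)\, \mathrm{d}\theta\, \mathrm{d}\lambda
```

Evidence maximization (empirical Bayes) instead picks the λ maximizing the marginal likelihood p(D | λ) and only integrates over θ, while slice sampling approximates the full integral over λ by Monte Carlo samples.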
1.2.2 Optimizing for Multiple Objectives
In practical applications it is often necessary to trade off two or more objectives,
such as the performance of a model and resource consumption [62] (see also
Chapter 3) or multiple loss functions [54]. Potential solutions can be obtained
in two ways.
First, if a limit on a secondary performance measure is known (such as the maximal memory consumption), the problem can be formulated as a constrained optimization problem. We will discuss constraint handling in Bayesian optimization in Section 1.3.2.
Second, and more generally, one can apply multi-objective optimization to
search for the Pareto front, a set of configurations which are optimal tradeoffs
between the objectives in the sense that, for each configuration on the Pareto
front, there is no other configuration which performs better for at least one and
at least as well for all other objectives. The user can then choose a configuration
from the Pareto front. We refer the interested reader to further literature on
this topic [62, 131, 50, 54].
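The Pareto-front definition above translates directly into code. A minimal sketch, assuming two objectives that are both minimized (say, validation error and memory consumption in GB; the numbers are made up):

```python
def pareto_front(points):
    """Return the points not dominated by any other point.

    A point p dominates q if p is no worse in every objective and strictly
    better in at least one (all objectives are minimized here)."""
    def dominates(p, q):
        return (all(a <= b for a, b in zip(p, q))
                and any(a < b for a, b in zip(p, q)))
    return [p for p in points if not any(dominates(q, p) for q in points)]

# (validation error, memory in GB) of four hypothetical configurations.
configs = [(0.10, 4.0), (0.12, 1.0), (0.10, 2.0), (0.20, 3.0)]
front = pareto_front(configs)  # (0.10, 4.0) and (0.20, 3.0) are dominated
```

The user would then pick one configuration from `front` according to their preferred tradeoff between the two objectives.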