2. Random vs. Grid for Optimizing Neural Networks
In this section we take a second look at several of the experiments of Larochelle et al. (2007) using random search, to compare with the grid searches done in that work. We begin with a look at hyper-parameter optimization in neural networks, and then move on to hyper-parameter optimization in Deep Belief Networks (DBNs). To characterize the efficiency of random search, we present two techniques in preliminary sections: Section 2.1 explains how we estimate the generalization performance of the best model from a set of candidates, taking into account our uncertainty in which model is actually best; Section 2.2 explains the random experiment efficiency curve that we use to characterize the performance of random search experiments. With these preliminaries out of the way, Section 2.3 describes the data sets from Larochelle et al. (2007) that we use in our work. Section 2.4 presents our results optimizing neural networks, and Section 5 presents our results optimizing DBNs.
2.1 Estimating Generalization
Because of finite data sets, test error is not monotone in validation error, and depending on the set of particular hyper-parameter values $\lambda$ evaluated, the test error of the best-validation-error configuration may vary. When reporting the performance of learning algorithms, it can be useful to take into account the uncertainty due to the choice of hyper-parameter values. This section describes our procedure for estimating test set accuracy, which takes into account any uncertainty in the choice of which trial is actually the best-performing one. To explain this procedure, we must distinguish between estimates of performance $\Psi^{(\mathrm{valid})} = \Psi$ and $\Psi^{(\mathrm{test})}$ based on the validation and test sets respectively:
$$
\Psi^{(\mathrm{valid})}(\lambda) = \operatorname*{mean}_{x \in X^{(\mathrm{valid})}} \mathcal{L}\bigl(x;\, \mathcal{A}_{\lambda}(X^{(\mathrm{train})})\bigr),
$$
$$
\Psi^{(\mathrm{test})}(\lambda) = \operatorname*{mean}_{x \in X^{(\mathrm{test})}} \mathcal{L}\bigl(x;\, \mathcal{A}_{\lambda}(X^{(\mathrm{train})})\bigr).
$$
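As a concrete reading of these definitions, the following minimal Python sketch computes such a mean for the zero-one loss on a held-out set; the array names and the use of NumPy are our assumptions for illustration, not part of the original experiments:

```python
import numpy as np

def psi(y_true, y_pred):
    """Mean zero-one loss of one trained model A_lambda(X_train)
    on a held-out set: the fraction of misclassified examples."""
    return float(np.mean(np.asarray(y_true) != np.asarray(y_pred)))

# Hypothetical usage for a single hyper-parameter configuration lambda:
# psi_valid = psi(y_valid, model.predict(X_valid))   # Psi^(valid)(lambda)
# psi_test  = psi(y_test,  model.predict(X_test))    # Psi^(test)(lambda)
```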
Likewise, we must define the estimated variance V about these means on the validation and test sets,
for example, for the zero-one loss (Bernoulli variance):
$$
V^{(\mathrm{valid})}(\lambda) = \frac{\Psi^{(\mathrm{valid})}(\lambda)\bigl(1 - \Psi^{(\mathrm{valid})}(\lambda)\bigr)}{\bigl|X^{(\mathrm{valid})}\bigr| - 1},
\quad \text{and} \quad
V^{(\mathrm{test})}(\lambda) = \frac{\Psi^{(\mathrm{test})}(\lambda)\bigl(1 - \Psi^{(\mathrm{test})}(\lambda)\bigr)}{\bigl|X^{(\mathrm{test})}\bigr| - 1}.
$$
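Continuing the sketch above, the Bernoulli variance estimator for these means is a one-liner; note that it applies only to the zero-one loss, and the function name is ours:

```python
def variance_of_psi(psi_hat, n_examples):
    """Estimated variance of the mean zero-one loss Psi over a set of
    n_examples points: V(lambda) = Psi * (1 - Psi) / (|X| - 1)."""
    return psi_hat * (1.0 - psi_hat) / (n_examples - 1)
```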
With other loss functions the estimator of variance will generally be different.
The standard practice for evaluating a model found by cross-validation is to report $\Psi^{(\mathrm{test})}(\lambda^{(s)})$ for the $\lambda^{(s)}$ that minimizes $\Psi^{(\mathrm{valid})}(\lambda^{(s)})$. However, when different trials have nearly optimal validation means, then it is not clear which test score to report, and a slightly different choice of $\lambda$ could have yielded a different test error. To resolve the difficulty of choosing a winner, we report a weighted average of all the test set scores, in which each one is weighted by the probability that its particular $\lambda^{(s)}$ is in fact the best. In this view, the uncertainty arising from $X^{(\mathrm{valid})}$ being a finite sample of $G_x$ makes the test-set score of the best model among $\lambda^{(1)}, \ldots, \lambda^{(S)}$ a random variable, $z$. This score $z$ is modeled by a Gaussian mixture model whose $S$ components have means $\mu_s = \Psi^{(\mathrm{test})}(\lambda^{(s)})$,
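To make the weighted-average procedure concrete, here is a short sketch of how such a score could be computed. The Monte Carlo estimation of the weights below (sampling a hypothetical validation mean per trial from a Gaussian and counting how often each trial would have looked best) is our own assumption about one reasonable estimator, not necessarily the exact computation used in the paper:

```python
import numpy as np

def weighted_test_score(psi_valid, v_valid, psi_test, n_sim=10000, rng=None):
    """Sketch of the weighted-average test score described above.

    psi_valid, v_valid, psi_test are length-S arrays holding
    Psi^(valid)(lambda^(s)), V^(valid)(lambda^(s)), and
    Psi^(test)(lambda^(s)) for the S trials.  The weight estimate is
    an assumption: draw hypothetical validation means per trial from
    Normal(Psi^(valid), V^(valid)) and count how often each trial wins.
    """
    rng = np.random.default_rng() if rng is None else rng
    psi_valid = np.asarray(psi_valid, dtype=float)
    v_valid = np.asarray(v_valid, dtype=float)
    psi_test = np.asarray(psi_test, dtype=float)

    S = len(psi_valid)
    # One row per simulation: a hypothetical validation mean for each trial.
    draws = rng.normal(loc=psi_valid, scale=np.sqrt(v_valid), size=(n_sim, S))
    winners = draws.argmin(axis=1)                  # lowest validation error wins
    w = np.bincount(winners, minlength=S) / n_sim   # w_s ~ P(trial s is best)
    # Mean of the Gaussian mixture over z: sum_s w_s * Psi^(test)(lambda^(s)).
    return float(np.dot(w, psi_test))
```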