2 Unit Test Construction
Our testing framework is an open-source library containing a collection of unit tests and visualization
tools. Each unit test is defined by a prototype function to be optimized, a prototypical scale, a noise
prototype, and optionally a non-stationarity prototype. A prototype function is the concatenation
of one or more local shape prototypes. A multi-dimensional unit test is a composition of one-
dimensional unit tests, optionally with a rotation prototype or curl prototype.
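To make this structure concrete, the following Python sketch (with illustrative class and field names of our own choosing, not the library's actual interface) shows how a unit-test specification could be assembled from these parts:

from dataclasses import dataclass
from typing import Callable, Optional, Sequence

import numpy as np

# Illustrative containers for a unit-test specification; the field names are
# assumptions for this sketch, not the library's API.
@dataclass
class UnitTest1D:
    prototype_fn: Callable[[float], float]      # concatenation of local shape prototypes (2.1, 2.2)
    scale: float = 1.0                          # prototypical scale (2.2)
    noise: Optional[Callable] = None            # noise prototype (2.3)
    nonstationarity: Optional[Callable] = None  # optional non-stationarity prototype

@dataclass
class UnitTestND:
    components: Sequence[UnitTest1D]            # composition of one-dimensional unit tests
    rotation: Optional[np.ndarray] = None       # optional rotation (or curl) prototype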
2.1 Shape Prototypes
Shape prototypes are functions defined on an interval, and our collection includes linear slopes
(zero curvature), quadratic curves (fixed curvature), convex or concave curves (varying curvature),
and curves with exponentially increasing or decreasing slope. Further, there are a number of non-
differentiable local shape prototypes (absolute value, rectified-linear, cliff). All of these occur in
realistic learning scenarios: in logistic regression the loss surface is part concave and part convex;
an MSE loss yields the prototypical quadratic bowl; and regularization such as L1 introduces
non-differentiable bends (as do rectified-linear or maxout units in deep learning [15, 16]).
Steep cliffs in the loss surface are a common occurrence when training recurrent neural networks,
as discussed in [11]. See the top rows of Figure 1 for some examples of shape prototypes.
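As an illustration, a few such one-dimensional shape prototypes could be written as follows; the parameterizations are ours for illustration and need not match the library's definitions.

import numpy as np

def linear(x, slope=1.0):                  # zero curvature
    return slope * x

def quadratic(x, curvature=1.0):           # fixed curvature (quadratic bowl)
    return 0.5 * curvature * x ** 2

def exp_slope(x, rate=1.0):                # exponentially increasing (or, for rate < 0, decreasing) slope
    return np.exp(rate * x)

def absolute(x):                           # non-differentiable bend at zero (L1-like)
    return np.abs(x)

def rectified_linear(x):                   # non-differentiable, flat on one side
    return np.maximum(x, 0.0)

def cliff(x, height=10.0, steepness=50.0): # steep cliff, as in recurrent-network loss surfaces
    return height / (1.0 + np.exp(-steepness * x))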
2.2 One-dimensional Concatenation
In our framework, we can chain together a number of shape prototypes such that the resulting
function is continuous and differentiable at all junction points. We can thus produce many prototype
functions that closely mimic commonly encountered functions, e.g., the Laplace function, sinusoids,
saddle-points, and step-functions. See the bottom rows of Figure 1 for some examples.
A single scale parameter determines the scaling of a concatenated function across all its shapes using
the junction constraints. Varying the scales is an important aspect of testing robustness because it is
not possible to guarantee well-scaled gradients without substantial overhead. In many learning prob-
lems, effort is put into proper normalization [17], but that is insufficient to guarantee homogeneous
scaling, for example throughout all the layers of a deep neural network.
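The sketch below illustrates one way such a concatenation could be implemented; it assumes each local shape is defined on an interval starting at zero and uses finite differences to enforce value and slope continuity at the junctions, with a single scale parameter stretching the whole function. It is not the library's actual construction.

import numpy as np

def concatenate(shapes, widths, scale=1.0, eps=1e-6):
    """Chain shapes[i] over an interval of length widths[i], matching value and
    slope at every junction; `scale` stretches the whole concatenated function."""
    starts = np.concatenate([[0.0], np.cumsum(widths)[:-1]])

    # Per-piece affine corrections (gain, offset) derived from the junction constraints.
    gains, offsets = [1.0], [0.0]
    for i in range(1, len(shapes)):
        x_j = widths[i - 1]                       # junction, in the previous piece's local coordinates
        prev, cur = shapes[i - 1], shapes[i]
        prev_val = gains[-1] * prev(x_j) + offsets[-1]
        prev_slope = gains[-1] * (prev(x_j) - prev(x_j - eps)) / eps
        cur_slope = (cur(eps) - cur(0.0)) / eps   # right-hand slope of the new piece at 0
        gain = prev_slope / cur_slope if abs(cur_slope) > 1e-12 else 1.0
        gains.append(gain)
        offsets.append(prev_val - gain * cur(0.0))

    def f(x):
        i = int(np.clip(np.searchsorted(starts, x, side="right") - 1, 0, len(shapes) - 1))
        return scale * (gains[i] * shapes[i](x - starts[i]) + offsets[i])

    return f

Concatenating, for instance, a quadratic piece with a linear piece under these constraints yields a Huber-like curve, and the single scale argument rescales all pieces consistently.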
2.3 Noise Prototypes
The distinguishing feature of stochastic gradient optimization (compared to batch methods) is that it
relies on sample gradients (coming from a subset or even a single element of the dataset), which are
inherently noisy. In our unit tests, we model this with four types of stochasticity:
• Scale-independent additive Gaussian noise on the gradients, which is equivalent to random
translations of inputs in a linear model with MSE loss. Note that this type of noise flips the
sign of the gradient near the optimum and makes it difficult to approach precisely.
• Multiplicative (scale-dependent) Gaussian noise on the gradients, which multiplies the gra-
dients by a positive random number (signs are preserved). This corresponds to a learning
scenario where the loss curvature is different for different samples near the current point.
• Additive zero-median Cauchy noise, mimicking the presence of outliers in the dataset.
• Mask-out noise, which zeros the gradient (independently for each dimension) with a certain
probability. This mimics both training with drop-out [18], and scenarios with rectified
linear units where a unit will be inactive for some input samples, but not for others.
For the first three, we can vary the noise scale, while for mask-out we pick a drop-out frequency.
This noise is not necessarily unbiased (as in the Cauchy case), which breaks common assumptions
made in algorithm design (the modifications in section 2.5 violate these assumptions even more
strongly). See Figure 2 for an illustration of the first two noise prototypes. Noise prototypes and
prototype functions can be combined independently into one-dimensional unit tests.
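For illustration, the four noise prototypes could be applied to a gradient vector roughly as follows; this is a sketch with assumed parameter names and defaults, not the library's interface.

import numpy as np

rng = np.random.default_rng(0)

def additive_gaussian(grad, noise_scale=1.0):
    # scale-independent additive noise; can flip the gradient sign near the optimum
    return grad + noise_scale * rng.standard_normal(grad.shape)

def multiplicative_gaussian(grad, noise_scale=0.5):
    # scale-dependent noise: multiply by a positive random factor, so signs are preserved
    return grad * np.abs(1.0 + noise_scale * rng.standard_normal(grad.shape))

def additive_cauchy(grad, noise_scale=1.0):
    # zero-median, heavy-tailed noise, mimicking outliers in the dataset
    return grad + noise_scale * rng.standard_cauchy(grad.shape)

def mask_out(grad, drop_prob=0.5):
    # zero each dimension independently with probability drop_prob
    return grad * (rng.random(grad.shape) >= drop_prob)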