"基于数值优化方法的机器学习模型训练"

Continuous optimization is an essential part of training machine learning models: because these algorithms run on computers, the training problem must be expressed as a numerical optimization problem. The goal of training is to find a set of parameters that minimizes an objective (loss) function or maximizes the likelihood under a probabilistic model, yielding a model with high accuracy and good performance.

Several numerical methods are used for this purpose, including gradient descent, stochastic gradient descent (SGD), and Adam. All of them iteratively update the model parameters using the gradient of the objective function, moving step by step toward an optimum.

Gradient descent computes the gradient of the objective with respect to the parameters over the full training set and updates the parameters in the direction that decreases the function, i.e. opposite to the gradient. Stochastic (mini-batch) gradient descent instead estimates the gradient from a randomly chosen subset of the training data at each iteration, which makes each step much cheaper and the method far more practical for large datasets.

Adam builds on stochastic gradient descent by maintaining exponentially decaying estimates of the first and second moments of the gradients and using them to adapt the learning rate separately for each parameter. This adaptive step size often lets Adam converge faster and makes it more robust to poorly scaled or noisy objectives.

In summary, continuous optimization plays a crucial role in training machine learning models: it finds the parameter values that minimize the objective function or maximize the probabilistic model. Using numerical optimization methods such as gradient descent, stochastic gradient descent, and Adam, practitioners can train accurate, well-performing models. The sketches below illustrate these update rules on a simple least-squares problem.
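The following is a minimal sketch, not code from the PDF itself, contrasting full-batch gradient descent with mini-batch SGD on a synthetic least-squares objective. The learning rate, batch size, and iteration counts are illustrative assumptions.

```python
# Sketch: gradient descent vs. mini-batch SGD on f(w) = 1/(2n) * ||X w - y||^2.
import numpy as np

rng = np.random.default_rng(0)
n, d = 1000, 5
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.1 * rng.normal(size=n)

def grad(w, Xb, yb):
    """Gradient of the mean squared error over the batch (Xb, yb)."""
    return Xb.T @ (Xb @ w - yb) / len(yb)

lr = 0.1  # assumed step size

# Full-batch gradient descent: every step uses the entire dataset.
w_gd = np.zeros(d)
for _ in range(200):
    w_gd -= lr * grad(w_gd, X, y)

# Mini-batch SGD: each step uses a random subset, so iterations are cheaper.
w_sgd = np.zeros(d)
for _ in range(200):
    idx = rng.choice(n, size=32, replace=False)  # batch size 32 (assumed)
    w_sgd -= lr * grad(w_sgd, X[idx], y[idx])

print("GD  error:", np.linalg.norm(w_gd - w_true))
print("SGD error:", np.linalg.norm(w_sgd - w_true))
```

Both variants apply the same update rule; they differ only in how much data is used to estimate the gradient at each step.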
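The next sketch, again an illustrative assumption rather than material from the PDF, shows the Adam update on the same problem. The hyperparameters follow the commonly used defaults (lr = 0.01 here, beta1 = 0.9, beta2 = 0.999, eps = 1e-8).

```python
# Sketch: Adam update rule with per-parameter adaptive step sizes.
import numpy as np

rng = np.random.default_rng(0)
n, d = 1000, 5
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.1 * rng.normal(size=n)

def grad(w, Xb, yb):
    return Xb.T @ (Xb @ w - yb) / len(yb)

w = np.zeros(d)
m = np.zeros(d)  # first-moment (mean) estimate of the gradient
v = np.zeros(d)  # second-moment (uncentered variance) estimate
lr, beta1, beta2, eps = 0.01, 0.9, 0.999, 1e-8

for t in range(1, 501):
    idx = rng.choice(n, size=32, replace=False)
    g = grad(w, X[idx], y[idx])
    m = beta1 * m + (1 - beta1) * g          # update biased first moment
    v = beta2 * v + (1 - beta2) * g * g      # update biased second moment
    m_hat = m / (1 - beta1 ** t)             # bias correction
    v_hat = v / (1 - beta2 ** t)
    w -= lr * m_hat / (np.sqrt(v_hat) + eps) # per-parameter adaptive step

print("Adam error:", np.linalg.norm(w - w_true))
```

Dividing by the square root of the second-moment estimate is what gives each parameter its own effective learning rate, which is the adaptive behavior described above.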