Asynchronous Proximal Stochastic Gradient Algorithm for Composition Optimization Problems

Pengfei Wang,1,2 Risheng Liu,3 Nenggan Zheng,1∗ Zhefeng Gong4

1 Qiushi Academy for Advanced Studies, Zhejiang University, Hangzhou, Zhejiang, China
2 College of Computer Science and Technology, Zhejiang University, Hangzhou, Zhejiang, China
3 International School of Information Science & Engineering, Dalian University of Technology, Liaoning, China
4 Department of Neurobiology, Zhejiang University School of Medicine, Hangzhou, Zhejiang, China
{pfei, zng, zfgong}@zju.edu.cn, rsliu@dlut.edu.cn
Abstract
In machine learning research, many emerging applications can be (re)formulated as the composition optimization problem with a nonsmooth regularization penalty. To solve this problem, the traditional stochastic gradient descent (SGD) algorithm and its variants either have a low convergence rate or are computationally expensive. Recently, several stochastic composition gradient algorithms have been proposed; however, these methods are still inefficient and not scalable to large-scale composition optimization problem instances. To address these challenges, we propose an asynchronous parallel algorithm, named Async-ProxSCVR, which effectively combines asynchronous parallel implementation with a variance reduction method. We prove that the algorithm admits the fastest convergence rate for both the strongly convex and general nonconvex cases. Furthermore, we analyze the query complexity of the proposed algorithm and prove that linear speedup is achievable as the number of processors increases. Finally, we evaluate Async-ProxSCVR on two representative composition optimization problems, namely value function evaluation in reinforcement learning and the sparse mean-variance optimization problem. Experimental results show that the algorithm achieves significant speedups and is much faster than existing methods.
Introduction
The composition optimization problem, recently proposed by Wang et al. (Wang, Fang, and Liu 2014), arises in many important applications including reinforcement learning (Wang, Liu, and Tang 2017), statistical learning (Hinton and Roweis 2003), risk management (Shapiro, Dentcheva, and Ruszczyński 2009), and multi-stage stochastic programming (Shapiro, Dentcheva, and Ruszczyński 2009). In this paper, we study the finite-sum scenario of the regularized composition optimization problem, whose objective function is the composition of two finite-sum functions plus a possibly nonsmooth regularization term, i.e.,
$$\min_{x \in \mathbb{R}^N} \; H(x) = f(x) + h(x), \qquad (1)$$
∗ Corresponding author
where
$$f(x) = \frac{1}{n_1} \sum_{i=1}^{n_1} F_i\!\left( \frac{1}{n_2} \sum_{j=1}^{n_2} G_j(x) \right), \qquad (2)$$
where $G_j : \mathbb{R}^N \to \mathbb{R}^M$, $F_i : \mathbb{R}^M \to \mathbb{R}$, and $f : \mathbb{R}^N \to \mathbb{R}$ are continuously differentiable functions. We denote $G(x) := \frac{1}{n_2} \sum_{j=1}^{n_2} G_j(x)$, $y := G(x)$, and $F(y) := \frac{1}{n_1} \sum_{i=1}^{n_1} F_i(y)$, and we call $G(x)$ the inner function and $F(y)$ the outer function. Often $f(x)$ is used as the empirical risk approximation to the composition of two expected-value functions $\mathbb{E}_i F_i(\mathbb{E}_j G_j(x))$. The regularization $h : \mathbb{R}^N \to \mathbb{R} \cup \{+\infty\}$ is an extended real-valued, closed, convex, but possibly nonsmooth function, which is often used to constrain the capacity of the hypothesis space.
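To make the two-level finite-sum structure concrete, the following is a minimal sketch of how the objective in (1)-(2) could be evaluated; the names `composition_objective`, `G_list`, `F_list`, and `h` are illustrative placeholders and not part of the original formulation.

```python
import numpy as np

def composition_objective(x, G_list, F_list, h):
    """Evaluate H(x) = f(x) + h(x) for the two-level finite sum in (1)-(2).

    G_list: n2 callables, each mapping R^N -> R^M (the inner components G_j).
    F_list: n1 callables, each mapping R^M -> R (the outer components F_i).
    h:      the (possibly nonsmooth) regularizer, e.g. an l1 penalty.
    """
    # Inner finite sum: G(x) = (1/n2) * sum_j G_j(x)
    y = np.mean([G_j(x) for G_j in G_list], axis=0)
    # Outer finite sum: f(x) = (1/n1) * sum_i F_i(G(x))
    f_val = np.mean([F_i(y) for F_i in F_list])
    return f_val + h(x)
```

For example, an ℓ1 penalty `h = lambda x: lam * np.abs(x).sum()` (with `lam` a hypothetical penalty weight) would fit a sparse setting such as the sparse mean-variance problem mentioned in the abstract.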
In general, problem (1) is substantially more challenging than its non-composition counterpart, i.e., the empirical risk minimization (ERM) problem (Friedman, Hastie, and Tibshirani 2001) with the finite-sum form $f(x) = \frac{1}{n} \sum_{i=1}^{n} F_i(x)$. On the one hand, the composition objective is nonlinear with respect to the joint distribution of data indices $(i, j)$, which makes it difficult to obtain an unbiased sample estimate of the full gradient $\nabla f(x)$. On the other hand, the two-level finite-sum structure causes unprecedented computational challenges for traditional stochastic gradient methods in solving problem (1). For example, to apply the stochastic gradient descent (SGD) method, we need to compute $(\partial G_j(x))^{\mathsf T} \nabla F_i(G(x))$ in each iteration, and evaluating the inner function $G(x)$ exactly is computationally expensive, with per-iteration queries proportional to $n_2$.
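As a brief illustration of both points (an elaboration in the notation above, not taken from the original text): the full gradient requires the exact inner value $G(x)$, which costs $n_2$ component queries, while a single randomly sampled pair $(i, j)$ yields a biased estimator because $\nabla F_i$ is nonlinear:
$$\nabla f(x) = (\partial G(x))^{\mathsf T} \, \nabla F\big(G(x)\big), \qquad
\mathbb{E}_{i,j}\Big[(\partial G_j(x))^{\mathsf T} \, \nabla F_i\big(G_j(x)\big)\Big] \neq \nabla f(x) \ \text{in general},$$
since $\mathbb{E}_j\big[\nabla F_i(G_j(x))\big] \neq \nabla F_i\big(\mathbb{E}_j G_j(x)\big) = \nabla F_i(G(x))$ whenever $\nabla F_i$ is nonlinear.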
To solve problem (1), Wang et al. (Wang, Fang, and Liu 2014; Wang, Liu, and Tang 2017) first propose a class of stochastic compositional gradient descent methods, i.e., SCGD and ASC-PG, which are based on random evaluations of $G(x)$, $F(y)$, and their gradients at a low per-iteration cost. However, SCGD and ASC-PG suffer from low convergence rates because of the variance induced by the random samplings. In recent years, variance-reduction-based methods have been proposed to improve the convergence rate for the composition optimization problem (1). Some algorithms employ the stochastic variance reduced gradient (SVRG) method to improve SCGD, including Compositional-SVRG-1 (Lian, Wang, and Liu 2017), Compositional-SVRG-2 (Lian, Wang, and Liu 2017), and