Rie Johnson and Tong Zhang. Accelerating stochastic gradient descent using predictive variance reduction. In Advances
in Neural Information Processing Systems 26, 2013.
Peter Kairouz, H. Brendan McMahan, Brendan Avent, Aurélien Bellet, Mehdi Bennis, Arjun Nitin Bhagoji, Keith
Bonawitz, Zachary Charles, Graham Cormode, Rachel Cummings, Rafael G. L. D’Oliveira, Salim El Rouayheb,
David Evans, Josh Gardner, Zachary Garrett, Adrià Gascón, Badih Ghazi, Phillip B. Gibbons, Marco Gruteser, Zaid
Harchaoui, Chaoyang He, Lie He, Zhouyuan Huo, Ben Hutchinson, Justin Hsu, Martin Jaggi, Tara Javidi, Gauri
Joshi, Mikhail Khodak, Jakub Konečný, Aleksandra Korolova, Farinaz Koushanfar, Sanmi Koyejo, Tancrède Lepoint,
Yang Liu, Prateek Mittal, Mehryar Mohri, Richard Nock, Ayfer Özgür, Rasmus Pagh, Mariana Raykova, Hang Qi,
Daniel Ramage, Ramesh Raskar, Dawn Song, Weikang Song, Sebastian U. Stich, Ziteng Sun, Ananda Theertha
Suresh, Florian Tramèr, Praneeth Vepakomma, Jianyu Wang, Li Xiong, Zheng Xu, Qiang Yang, Felix X. Yu, Han
Yu, and Sen Zhao. Advances and Open Problems in Federated Learning. CoRR abs/1912.04977, 2019.
Sai Praneeth Karimireddy, Satyen Kale, Mehryar Mohri, Sashank J. Reddi, Sebastian U. Stich, and Ananda Theertha
Suresh. SCAFFOLD: Stochastic Controlled Averaging for Federated Learning. In Proceedings of the 37th International Conference on Machine Learning (ICML), 2020.
Ahmed Khaled, Konstantin Mishchenko, and Peter Richtárik. Tighter Theory for Local SGD on Identical and
Heterogeneous Data. In Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics, volume 108, 2020.
Anastasia Koloskova, Nicolas Loizou, Sadra Boreiri, Martin Jaggi, and Sebastian U. Stich. A Unified Theory of
Decentralized SGD with Changing Topology and Local Updates. In Proceedings of the 37th International Conference on Machine Learning (ICML), 2020.
Harold J. Kushner and George Yin. Stochastic Approximation and Recursive Algorithms and Applications. 2003.
Jason D. Lee, Qihang Lin, Tengyu Ma, and Tianbao Yang. Distributed stochastic variance reduced gradient methods
by sampling extra data with replacement. Journal of Machine Learning Research, 18, 2017.
Laurent Lessard, Benjamin Recht, and Andrew Packard. Analysis and Design of Optimization Algorithms via Integral
Quadratic Constraints. SIAM Journal on Optimization, 26(1), 2016.
Tian Li, Anit Kumar Sahu, Manzil Zaheer, Maziar Sanjabi, Ameet Talwalkar, and Virginia Smith. Federated
optimization in heterogeneous networks. In Proceedings of Machine Learning and Systems, 2020.
Xiang Li, Kaixuan Huang, Wenhao Yang, Shusen Wang, and Zhihua Zhang. On the convergence of FedAvg on non-iid
data. In International Conference on Learning Representations, 2020.
O. L. Mangasarian. Parallel Gradient Distribution in Unconstrained Optimization. SIAM Journal on Control and
Optimization, 33(6), 1995.
Horia Mania, Xinghao Pan, Dimitris Papailiopoulos, Benjamin Recht, Kannan Ramchandran, and Michael I. Jordan.
Perturbed Iterate Analysis for Asynchronous Stochastic Optimization. SIAM Journal on Optimization, 27(4), 2017.
Ryan McDonald, Mehryar Mohri, Nathan Silberman, Dan Walker, and Gideon S. Mann. Efficient large-scale distributed
training of conditional maximum entropy models. In Advances in Neural Information Processing Systems 22, 2009.
Brendan McMahan, Eider Moore, Daniel Ramage, Seth Hampson, and Blaise Aguera y Arcas. Communication-efficient
learning of deep networks from decentralized data. In Proceedings of the 20th International Conference on Artificial
Intelligence and Statistics, volume 54, 2017.
Konstantin Mishchenko, Eduard Gorbunov, Martin Takáč, and Peter Richtárik. Distributed Learning with Compressed
Gradient Differences. CoRR abs/1901.09269, 2019.
A. S. Nemirovski and D. B. Yudin. Problem Complexity and Method Efficiency in Optimization. 1983.
Yurii Nesterov. Lectures on Convex Optimization. 2018.
Reese Pathak and Martin J. Wainwright. FedSplit: An algorithmic framework for fast federated optimization. In Advances in Neural Information Processing Systems 33, 2020.