Rie Johnson and Tong Zhang. Accelerating stochastic gradient descent using predictive variance reduction. In Advances
in Neural Information Processing Systems 26, 2013.
Peter Kairouz, H. Brendan McMahan, Brendan Avent, Aurélien Bellet, Mehdi Bennis, Arjun Nitin Bhagoji, Keith
Bonawitz, Zachary Charles, Graham Cormode, Rachel Cummings, Rafael G. L. D’Oliveira, Salim El Rouayheb,
David Evans, Josh Gardner, Zachary Garrett, Adrià Gascón, Badih Ghazi, Phillip B. Gibbons, Marco Gruteser, Zaid
Harchaoui, Chaoyang He, Lie He, Zhouyuan Huo, Ben Hutchinson, Justin Hsu, Martin Jaggi, Tara Javidi, Gauri
Joshi, Mikhail Khodak, Jakub Konečný, Aleksandra Korolova, Farinaz Koushanfar, Sanmi Koyejo, Tancrède Lepoint,
Yang Liu, Prateek Mittal, Mehryar Mohri, Richard Nock, Ayfer Özgür, Rasmus Pagh, Mariana Raykova, Hang Qi,
Daniel Ramage, Ramesh Raskar, Dawn Song, Weikang Song, Sebastian U. Stich, Ziteng Sun, Ananda Theertha
Suresh, Florian Tramèr, Praneeth Vepakomma, Jianyu Wang, Li Xiong, Zheng Xu, Qiang Yang, Felix X. Yu, Han
Yu, and Sen Zhao. Advances and Open Problems in Federated Learning. CoRR abs/1912.04977, 2019.
Sai Praneeth Karimireddy, Satyen Kale, Mehryar Mohri, Sashank J. Reddi, Sebastian U. Stich, and Ananda Theertha
Suresh. SCAFFOLD: Stochastic Controlled Averaging for Federated Learning. In Proceedings of the 37th International Conference on Machine Learning (ICML), 2020.
Ahmed Khaled, Konstantin Mishchenko, and Peter Richtárik. Tighter Theory for Local SGD on Identical and
Heterogeneous Data. In Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics, volume 108, 2020.
Anastasia Koloskova, Nicolas Loizou, Sadra Boreiri, Martin Jaggi, and Sebastian U. Stich. A Unified Theory of
Decentralized SGD with Changing Topology and Local Updates. In Proceedings of the 37th International Conference on Machine Learning (ICML), 2020.
Harold J. Kushner and George Yin. Stochastic Approximation and Recursive Algorithms and Applications. 2003.
Jason D. Lee, Qihang Lin, Tengyu Ma, and Tianbao Yang. Distributed stochastic variance reduced gradient methods
by sampling extra data with replacement. Journal of Machine Learning Research, 18, 2017.
Laurent Lessard, Benjamin Recht, and Andrew Packard. Analysis and Design of Optimization Algorithms via Integral
Quadratic Constraints. SIAM Journal on Optimization, 26(1), 2016.
Tian Li, Anit Kumar Sahu, Manzil Zaheer, Maziar Sanjabi, Ameet Talwalkar, and Virginia Smith. Federated
optimization in heterogeneous networks. In Proceedings of Machine Learning and Systems, 2020.
Xiang Li, Kaixuan Huang, Wenhao Yang, Shusen Wang, and Zhihua Zhang. On the convergence of FedAvg on non-iid
data. In International Conference on Learning Representations, 2020.
O. L. Mangasarian. Parallel Gradient Distribution in Unconstrained Optimization. SIAM Journal on Control and
Optimization, 33(6), 1995.
Horia Mania, Xinghao Pan, Dimitris Papailiopoulos, Benjamin Recht, Kannan Ramchandran, and Michael I. Jordan.
Perturbed Iterate Analysis for Asynchronous Stochastic Optimization. SIAM Journal on Optimization, 27(4), 2017.
Ryan McDonald, Mehryar Mohri, Nathan Silberman, Dan Walker, and Gideon S. Mann. Efficient large-scale distributed
training of conditional maximum entropy models. In Advances in Neural Information Processing Systems 22, 2009.
Brendan McMahan, Eider Moore, Daniel Ramage, Seth Hampson, and Blaise Aguera y Arcas. Communication-efficient
learning of deep networks from decentralized data. In Proceedings of the 20th International Conference on Artificial
Intelligence and Statistics, volume 54, 2017.
Konstantin Mishchenko, Eduard Gorbunov, Martin Takáč, and Peter Richtárik. Distributed Learning with Compressed
Gradient Differences. CoRR abs/1901.09269, 2019.
A. S. Nemirovski and D. B. Yudin. Problem Complexity and Method Efficiency in Optimization. 1983.
Yurii Nesterov. Lectures on Convex Optimization. 2018.
Reese Pathak and Martin J. Wainwright. FedSplit: An algorithmic framework for fast federated optimization. In Advances in Neural Information Processing Systems 33, 2020.