In low-rank representation, one often does not need to assume global strong convexity; it suffices for strong convexity to hold on the set of points whose sparsity or rank does not exceed a given value. This property is called restricted strong convexity [28].
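As a rough illustration, one common form of this definition (which may differ in minor details from the exact condition used in [28]) is the following: a function f is restricted strongly convex with modulus μ > 0 over a set S (for example, the set of vectors with at most s nonzero entries, or the set of matrices of rank at most r) if
\[
f(y) \;\ge\; f(x) + \langle \nabla f(x),\, y - x \rangle + \frac{\mu}{2}\,\| y - x \|^2, \qquad \forall\, x, y \in S.
\]
That is, the usual strong-convexity inequality is only required to hold on the restricted set S rather than on the whole space, which is a much weaker assumption for sparse or low-rank problems.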
IV. Conclusion
Although the basic theory of optimization algorithms is already fairly complete, the demand at the application level for faster optimization algorithms with lower complexity is endless. This article has given only a very brief commentary on the recent progress of first-order algorithms. Future theoretical breakthroughs should come in nonconvex optimization, while on the technical side stochastic algorithms will have to be relied upon fully in order to handle high-dimensional and massive data effectively.
Acknowledgements
This work was supported by the National Natural Science Foundation of China (Grant Nos. 61625301 and 61731018). Cong Fang and Huan Li also contributed to this article.
References
[1] Naman Agarwal, Zeyuan Allen-Zhu, Brian Bullins, Elad Hazan, and
Tengyu Ma. Finding approximate local minima faster than gradient descent.
In Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of
Computing, pages 1195–1199. ACM, 2017.
[2] H. Attouch, J. Bolte, P. Redont, and A. Soubeyran. Proximal alternating
minimization and projection methods for nonconvex problems: An
approach based on the Kurdyka-Łojasiewicz inequality. Mathematics of
Operations Research, 35:438–457, 2010.
[3] Amir Beck and Marc Teboulle. A fast iterative shrinkage-thresholding
algorithm for linear inverse problems. SIAM Journal on Imaging Sciences,
2(1):183–202, 2009.
[4] Dimitri P. Bertsekas. Nonlinear Programming. Athena Scientific, 2nd edition,
1999.
[5] Dimitri P. Bertsekas and John N. Tsitsiklis. Parallel and Distributed
Computation: Numerical Methods. Prentice-Hall, 1989.
[6] Stephen Boyd and Lieven Vandenberghe. Convex Optimization.
Cambridge University Press, 2004.
[7] J. Cai, E.J. Candès, and Z. Shen. A singular value thresholding algorithm
for matrix completion. SIAM Journal on Optimization, 20(4):1956–1982,
2010.
[8] E.J. Candès, Michael B. Wakin, and Stephen P. Boyd. Enhancing sparsity by
reweighted ℓ1 minimization. Journal of Fourier Analysis and
Applications, 14(5-6):877–905, 2008.
[9] Tianyi Chen, Georgios B. Giannakis, Tao Sun, and Wotao Yin. LAG: Lazily
aggregated gradient for communication-efficient distributed learning.
arXiv preprint, arXiv:1805.09965, 2018.
[10] Edwin K. P. Chong and Stanislaw H. Zak. An Introduction to Optimization.
John Wiley & Sons, Inc., 4th edition, 2013.
[11] P. Domingos. A few useful things to know about machine learning.
Communications of the ACM, 55(10):78–87, 2012.
[12] John Duchi, Elad Hazan, and Yoram Singer. Adaptive subgradient methods
for online learning and stochastic optimization. Journal of Machine
Learning Research, 12(July):2121–2159, 2011.
[13] John Duchi and Yoram Singer. Efficient online and batch learning using
forward backward splitting. Journal of Machine Learning Research,
10(December):2899–2934, 2009.
[14] Cong Fang, Feng Cheng, and Zhouchen Lin. Faster and non-ergodic O(1/K)
stochastic alternating direction method of multipliers. In Advances in
Neural Information Processing Systems, 2017.
[15] Cong Fang and Zhouchen Lin. Parallel asynchronous stochastic variance
reduction for nonconvex optimization. In AAAI Conference on Artificial
Intelligence, 2017.
[16] S. Foucart and H. Rauhut. A Mathematical Introduction to Compressive
Sensing. Springer, 2013.
[17] Marguerite Frank and Philip Wolfe. An algorithm for quadratic
programming. Naval Research Logistics Quarterly, 3(1-2):95–110, 1956.
[18] Rong Ge, Furong Huang, Chi Jin, and Yang Yuan. Escaping from saddle
points–online stochastic gradient for tensor decomposition. In Conference
on Learning Theory, 2015.
[19] Robert Hannah and Wotao Yin. On unbounded delays in asynchronous
parallel fixed-point algorithms. Journal of Scientific Computing,
76(1):299–326, 2018.
[20] Bingsheng He, Min Tao, and Xiaoming Yuan. Alternating direction method
with Gaussian back substitution for separable convex programming. SIAM
Journal on Optimization, 22(2):313–340, 2012.
[21] M. Hong, Z.Q. Luo, and M. Razaviyayn. Convergence analysis of
alternating direction method of multipliers for a family of nonconvex
problems. SIAM Journal on Optimization, 26(1):337–364, 2016.
[22] Martin Jaggi. Revisiting Frank-Wolfe: Projection-free sparse convex
optimization. In International Conference on Machine Learning, pages
427–435, 2013.
[23] Martin Jaggi and Marek Sulovský. A simple algorithm for nuclear norm
regularized problems. In International Conference on Machine Learning,
pages 471–478, 2010.
[24] D. Jakovetic, J. Xavier, and J. Moura. Fast distributed gradient methods.
IEEE Transactions on Automatic Control, 59:1131–1146, 2014.
[25] Chi Jin, Rong Ge, Praneeth Netrapalli, Sham M Kakade, and Michael I
Jordan. How to escape saddle points efficiently. In International
Conference on Machine Learning, pages 1724–1732, 2017.
[26] R. Johnson and T. Zhang. Accelerating stochastic gradient descent using
predictive variance reduction. In Advances in Neural Information
Processing Systems, 2013.
[27] Diederik Kingma and Jimmy Ba. Adam: A method for stochastic
optimization. In ICLR, 2015.
[28] M.J. Lai and W. Yin. Augmented ℓ1 and nuclear-norm models with a
globally linearly convergent algorithm. SIAM Journal on Imaging Sciences,
6(2):1059–1091, 2013.
[29] Guanghui Lan, Soomin Lee, and Yi Zhou. Communication-efficient
algorithms for decentralized and stochastic optimization. arXiv preprint,
arXiv:1701.03961, 2017.
[30] Huan Li and Zhouchen Lin. Accelerated proximal gradient methods for
nonconvex programming. In Advances in Neural Information Processing
Systems, pages 612–620, 2015.
[31] Huan Li and Zhouchen Lin. Accelerated alternating direction method of
multipliers: an optimal O(1/K) nonergodic analysis. arXiv preprint,
arXiv:1608.06366, 2017.
[32] Z. Lin, R. Liu, and Z. Su. Linearized alternating direction method with
adaptive penalty for low rank representation. In Advances in Neural
Information Processing Systems, pages 612–620, 2011.
[33] Zhouchen Lin, Risheng Liu, and Huan Li. Linearized alternating direction
method with parallel splitting and adaptive penalty for separable convex
programs in machine learning. Machine Learning, 99(2):287–325, 2015.
[34] Zhouchen Lin and Hongyang Zhang. Low-Rank Models in Visual Analysis:
Theories, Algorithms, and Applications. Academic Press, 2017.
[35] C. Lu, Z. Lin, and S. Yan. Smoothed low rank and sparse matrix recovery
by iteratively reweighted least squared minimization. IEEE Transactions
on Image Processing, 24(2):646–654, 2015.
[36] Canyi Lu, Jiashi Feng, Shuicheng Yan, and Zhouchen Lin. A unified
alternating direction method of multipliers by majorization minimization.
submitted to IEEE Transactions on Pattern Analysis and Machine
Intelligence, 2017.
[37] Canyi Lu, Huan Li, Zhouchen Lin, and Shuicheng Yan. Fast proximal
linearized alternating direction method of multiplier with parallel splitting.
In AAAI Conference on Artificial Intelligence, pages 739–745, 2016.
[38] Michael W. Mahoney. Randomized algorithms for matrices and data.
Foundations and Trends in Machine Learning, 3(2):123–224, 2011.
[39] Julien Mairal. Incremental majorization-minimization optimization with
application to large-scale machine learning. SIAM Journal on Optimization,
25(2):829–855, 2015.
[40] Tomoya Murata and Taiji Suzuki. Doubly accelerated stochastic variance
reduced dual averaging method for regularized empirical risk minimization.
In Advances in Neural Information Processing Systems, pages 608–617,
2017.
[41] A. Nedic and A. Ozdaglar. Distributed subgradient methods for multi-agent
optimization. IEEE Transactions on Automatic Control, 54(1):48–61, 2009.
[42] Y. Nesterov. A method of solving a convex programming problem with
convergence rate O(1/k²). Soviet Mathematics Doklady, 27(2):372–376,
1983.
[43] Yurii Nesterov. Introductory Lectures on Convex Optimization: A Basic
Course. Springer, 2004.
[44] Hua Ouyang, Niao He, Long Q. Tran, and Alexander Gray. Stochastic
alternating direction method of multipliers. In International Conference on
Machine Learning, 2013.
[45] Y. Ouyang, Y. Chen, G. Lan, and E. Pasiliao. An accelerated linearized
alternating direction method of multipliers. SIAM Journal on Imaging