1 — 08:30 — HSODF: A Homogeneous Framework for Second-Order Methods
We introduce a homogeneous second-order descent framework (HSODF) for nonconvex and convex optimization based on the generalized homogeneous model (GHM). In comparison to the Newton steps, the GHM can be solved by extremal symmetric eigenvalue procedures and thus grant an advantage in ill-conditioned problems. We show that HSODF is able to recover some well-known second-order methods, such as trust-region methods and gradient regularized methods while maintaining comparable iteration complexity bounds. We propose specific variants in the framework. Finally, we show HSODM can be used in reinforcement learning, linear systems and interior point methods.
2 — 09:00 — Descent properties of an Anderson accelerated gradient method with restarting
Anderson Acceleration (AA) is a popular acceleration technique to enhance the convergence of fixed-point iterations. The analysis of AA approaches typically focuses on the convergence behavior of a corresponding fixed-point residual, while the behavior of the underlying objective function values along the accelerated iterates is currently not well understood. In this talk, we present new results showing that AA with restarting is a local descent method which can decrease the objective function faster than the gradient method. These insights provide a novel perspective on the convergence analysis of AA in nonconvex optimization. Numerical experiments are conducted to illustrate our theoretical findings.
3 — 09:30 — ** CANCELLED ** Online Learning Guided Quasi-Newton Methods: Improved Global Non-asymptotic Guarantees
Quasi-Newton (QN) methods are popular iterative algorithms known for their superior practical performance compared to Gradient Descent (GD)-type methods. However, the existing theoretical results for this class of algorithms do not sufficiently justify their advantage over GD-type methods. In this talk, we discuss our recent efforts to address this issue. Specifically, in the strongly convex setting, we propose the first “globally” convergent QN method that achieves an explicit “non-asymptotic superlinear” rate. We show that the rate presented for our method is provably faster than GD after at most
4 — 10:00 — Adam-family Methods for Nonsmooth Optimization with Convergence Guarantees
In this talk, we present a comprehensive study on the convergence properties of Adam-family methods for nonsmooth optimization, especially in the training of nonsmooth neural networks. We introduce a novel two-timescale framework that adopts a two-timescale updating scheme, and prove its convergence properties under mild assumptions. We prove that under mild conditions, any cluster point of the sequence generated by our proposed framework is a stationary point of the objective function in the sense of the conservative field. Furthermore, we establish that under mild conditions with randomly chosen initial points and stepsizes, almost surely, any cluster point of the sequence is a Clarke stationary point of the objective function, independent of the chosen conservative field. Our proposed framework encompasses various popular Adam-family methods, providing convergence guarantees for these methods in training nonsmooth neural networks. Furthermore, we develop stochastic subgradient methods that incorporate gradient clipping techniques for training nonsmooth neural networks with heavy-tailed noise. Through our framework, we prove that under heavy-tailed evaluation noises that are only assumed to be integrable, our proposed gradient clipping methods conform to the proposed framework. As a result, the convergence properties of these gradient clipping methods directly follow those established for our proposed framework under mild conditions. Extensive numerical experiments demonstrate the high efficiency and robustness of our proposed methods.