1. 08:30 — HSODF: A Homogeneous Framework for Second-Order Methods

  • Chang He, Shanghai University of Finance and Economics
  • Yuntian Jiang, Shanghai University of Finance and Economics
  • Chuwen Zhang, Shanghai University of Finance and Economics
  • Dongdong Ge, Shanghai Jiao Tong University
  • Bo Jiang, Shanghai University of Finance and Economics
  • Yinyu Ye, Stanford University

We introduce a homogeneous second-order descent framework (HSODF) for nonconvex and convex optimization based on the generalized homogeneous model (GHM). In comparison to Newton steps, the GHM can be solved by extremal symmetric eigenvalue procedures and thus grants an advantage on ill-conditioned problems. We show that HSODF recovers some well-known second-order methods, such as trust-region methods and gradient-regularized methods, while maintaining comparable iteration complexity bounds. We also propose specific variants within the framework. Finally, we show that HSODM can be applied to reinforcement learning, linear systems and interior-point methods.
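As a rough illustration of the eigenvalue-based step behind this line of work, the sketch below builds a homogenized matrix of the form [[H, g]; [g', -delta]] and de-homogenizes its leftmost eigenvector to obtain a direction. The matrix form, the parameter delta, and the hard-case fallback are assumptions drawn from the HSODM literature, not the exact procedure presented in the talk.

```python
import numpy as np
from scipy.sparse.linalg import eigsh

def ghm_direction(grad, hess, delta=1e-2, tol=1e-10):
    """Illustrative sketch: obtain a descent direction from the leftmost
    eigenpair of a homogenized (GHM-style) matrix
        F = [[H, g], [g^T, -delta]].
    Only an extremal symmetric eigenvalue computation is needed, which is
    attractive when H is ill-conditioned."""
    n = grad.size
    F = np.zeros((n + 1, n + 1))
    F[:n, :n] = hess
    F[:n, n] = grad
    F[n, :n] = grad
    F[n, n] = -delta
    # Leftmost (smallest algebraic) eigenpair of the symmetric matrix F.
    vals, vecs = eigsh(F, k=1, which='SA')
    v, t = vecs[:n, 0], vecs[n, 0]
    if abs(t) > tol:
        d = v / t          # de-homogenize the eigenvector
    else:
        d = v              # hard case: use the eigenvector directly
    if grad @ d > 0:       # ensure a descent direction
        d = -d
    return d

# Usage on a toy ill-conditioned quadratic model (illustrative only):
g = np.array([1.0, -2.0])
H = np.array([[1e-4, 0.0], [0.0, 3.0]])
print(ghm_direction(g, H))
```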

2. 09:00 — Descent properties of an Anderson accelerated gradient method with restarting

Anderson Acceleration (AA) is a popular technique for enhancing the convergence of fixed-point iterations. The analysis of AA approaches typically focuses on the convergence behavior of a corresponding fixed-point residual, while the behavior of the underlying objective function values along the accelerated iterates is currently not well understood. In this talk, we present new results showing that AA with restarting is a local descent method that can decrease the objective function faster than the gradient method. These insights provide a novel perspective on the convergence analysis of AA in nonconvex optimization. Numerical experiments are conducted to illustrate our theoretical findings.
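For concreteness, here is a minimal sketch of windowed Anderson Acceleration applied to the gradient-descent fixed-point map q(x) = x - alpha * grad f(x), with a simple periodic restart. The window size, restart rule, and regularization are illustrative choices, not the scheme analyzed in the talk.

```python
import numpy as np

def aa_gradient_restart(grad, x0, alpha=0.1, m=5, iters=100):
    """Illustrative sketch of Anderson Acceleration (window size m) on the
    gradient fixed-point map q(x) = x - alpha * grad(x), with periodic
    restarting that clears the history every m iterations."""
    q = lambda x: x - alpha * grad(x)
    x = x0.copy()
    X, R = [x], [q(x) - x]            # iterate and residual histories
    for k in range(1, iters + 1):
        if k % m == 0:                # restart: drop the history
            X, R = [x], [q(x) - x]
        # Solve min_c ||sum_i c_i r_i||^2  subject to  sum_i c_i = 1
        Rmat = np.column_stack(R)
        G = Rmat.T @ Rmat + 1e-12 * np.eye(Rmat.shape[1])
        c = np.linalg.solve(G, np.ones(Rmat.shape[1]))
        c /= c.sum()
        # Accelerated iterate: mix the fixed-point images with weights c
        x = sum(ci * q(xi) for ci, xi in zip(c, X))
        X.append(x)
        R.append(q(x) - x)
        if len(X) > m:                # keep at most m past pairs
            X.pop(0); R.pop(0)
    return x

# Usage on a quadratic f(x) = 0.5 x^T A x (illustrative):
A = np.diag([1.0, 10.0, 100.0])
print(aa_gradient_restart(lambda x: A @ x, np.ones(3)))
```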

3. 09:30 — ** CANCELLED ** Online Learning Guided Quasi-Newton Methods: Improved Global Non-asymptotic Guarantees

Quasi-Newton (QN) methods are popular iterative algorithms known for their superior practical performance compared to Gradient Descent (GD)-type methods. However, the existing theoretical results for this class of algorithms do not sufficiently justify their advantage over GD-type methods. In this talk, we discuss our recent efforts to address this issue. Specifically, in the strongly convex setting, we propose the first “globally” convergent QN method that achieves an explicit “non-asymptotic superlinear” rate. We show that the rate presented for our method is provably faster than GD after at most $O(d)$ iterations, where $d$ is the problem dimension. Additionally, in the convex setting, we present an accelerated variant of our proposed method that provably outperforms the accelerated gradient method and converges at a rate of $O(\min\{1/k^2, \sqrt{d \log k }/ k^{2.5}\})$, where $k$ is the number of iterations. To attain these results, we diverge from conventional approaches and construct our QN methods based on the Hybrid Proximal Extragradient (HPE) framework and its accelerated variants. Furthermore, a pivotal algorithmic concept underpinning our methodologies is an online learning framework for updating the Hessian approximation matrices. Specifically, we relate our method's convergence rate to the regret of a specific online convex optimization problem in the matrix space and choose the sequence of Hessian approximation matrices to minimize its overall regret.
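To make the online-learning viewpoint concrete, the sketch below updates a Hessian approximation by an online gradient step on a per-iteration secant loss and projects the result back onto a bounded set of symmetric matrices. The specific loss, stepsize, and projection are illustrative assumptions; the talk's method uses a tailored online algorithm inside the HPE framework rather than this generic update.

```python
import numpy as np

def online_hessian_update(B, s, y, eta=0.1, mu=1e-3, L=1e3):
    """Illustrative sketch of the online-learning view of Hessian
    approximation: take a gradient step on the per-iteration loss
        loss_k(B) = 0.5 * ||B s_k - y_k||^2,
    then project onto {B symmetric : mu*I <= B <= L*I}.  The loss, the
    step size eta, and the projection bounds are assumptions."""
    r = B @ s - y
    G = 0.5 * (np.outer(r, s) + np.outer(s, r))   # symmetrized gradient of the loss
    B_new = B - eta * G
    # Projection: clip the eigenvalues of the symmetric update into [mu, L].
    w, V = np.linalg.eigh(0.5 * (B_new + B_new.T))
    return V @ np.diag(np.clip(w, mu, L)) @ V.T

# Usage: refine an identity initialization against curvature pairs (s, y).
rng = np.random.default_rng(0)
H_true = np.diag([1.0, 4.0, 9.0])
B = np.eye(3)
for _ in range(200):
    s = rng.standard_normal(3)
    B = online_hessian_update(B, s, H_true @ s)
print(np.round(B, 2))
```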

4. 10:00 — Adam-family Methods for Nonsmooth Optimization with Convergence Guarantees

  • Nachuan Xiao, National University of Singapore
  • Xiaoyin Hu, Hangzhou City University
  • Xin Liu, Academy of Mathematics and Systems Science, Chinese Academy of Sciences
  • Kim-Chuan Toh, National University of Singapore

In this talk, we present a comprehensive study on the convergence properties of Adam-family methods for nonsmooth optimization, especially in the training of nonsmooth neural networks. We introduce a novel framework that adopts a two-timescale updating scheme and prove its convergence properties under mild assumptions: any cluster point of the sequence generated by our framework is a stationary point of the objective function in the sense of the conservative field. We further establish that, under mild conditions with randomly chosen initial points and stepsizes, almost surely any cluster point of the sequence is a Clarke stationary point of the objective function, independent of the chosen conservative field. Our framework encompasses various popular Adam-family methods, providing convergence guarantees for these methods in training nonsmooth neural networks. In addition, we develop stochastic subgradient methods that incorporate gradient clipping techniques for training nonsmooth neural networks under heavy-tailed noise. We prove that, when the evaluation noise is only assumed to be integrable, our gradient clipping methods fit within the proposed framework, so their convergence properties follow directly from those established for the framework. Extensive numerical experiments demonstrate the high efficiency and robustness of our proposed methods.
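As a rough illustration of the two-timescale idea combined with clipping, the sketch below runs an Adam-style update in which the moment estimators move on a fast stepsize schedule and the iterate on a slower one, with the stochastic subgradient clipped before use. The stepsize schedules, clipping rule, and subgradient oracle are assumptions for illustration, not the authors' exact framework.

```python
import numpy as np

def two_timescale_adam_clip(subgrad, x0, iters=1000, clip=10.0, eps=1e-8):
    """Illustrative sketch of a two-timescale Adam-style method with gradient
    clipping for nonsmooth objectives.  The moment estimators move on the fast
    timescale (beta_k), the iterate on the slow timescale (eta_k << beta_k)."""
    x = x0.copy()
    m = np.zeros_like(x)      # first-moment estimator
    v = np.ones_like(x)       # second-moment estimator
    for k in range(1, iters + 1):
        g = subgrad(x)
        # Clip heavy-tailed (sub)gradient estimates.
        norm = np.linalg.norm(g)
        if norm > clip:
            g = g * (clip / norm)
        beta = 1.0 / k**0.6   # fast timescale
        eta = 1.0 / k**0.9    # slow timescale
        m = (1 - beta) * m + beta * g
        v = (1 - beta) * v + beta * g**2
        x = x - eta * m / (np.sqrt(v) + eps)
    return x

# Usage on a nonsmooth toy problem f(x) = ||x||_1 with heavy-tailed noise:
rng = np.random.default_rng(1)
sg = lambda x: np.sign(x) + rng.standard_t(df=2, size=x.size)
print(np.round(two_timescale_adam_clip(sg, np.array([3.0, -2.0])), 3))
```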