1. 08:30 — HSODF: A Homogeneous Framework for Second-Order Methods

  • Chang He, Shanghai University of Finance and Economics
  • Yuntian Jiang, Shanghai University of Finance and Economics
  • Chuwen Zhang, Shanghai University of Finance and Economics
  • Dongdong Ge, Shanghai Jiao Tong University
  • Bo Jiang, Shanghai University of Finance and Economics
  • Yinyu Ye, Stanford University

We introduce a homogeneous second-order descent framework (HSODF) for nonconvex and convex optimization based on the generalized homogeneous model (GHM). In comparison to Newton steps, the GHM can be solved by extremal symmetric eigenvalue procedures and thus grants an advantage on ill-conditioned problems. We show that HSODF recovers some well-known second-order methods, such as trust-region methods and gradient-regularized methods, while maintaining comparable iteration complexity bounds. We also propose specific variants within the framework. Finally, we show that HSODM can be applied to reinforcement learning, linear systems and interior-point methods.
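As a rough illustration of the eigenvalue-based step behind this line of work, the sketch below builds a homogenized matrix of the form [[H, g]; [g', -delta]] and de-homogenizes its leftmost eigenvector to obtain a direction. The matrix form, the parameter delta, and the hard-case fallback are assumptions drawn from the HSODM literature, not the exact procedure presented in the talk.

```python
import numpy as np
from scipy.sparse.linalg import eigsh

def ghm_direction(grad, hess, delta=1e-2, tol=1e-10):
    """Illustrative sketch: obtain a descent direction from the leftmost
    eigenpair of a homogenized (GHM-style) matrix
        F = [[H, g], [g^T, -delta]].
    Only an extremal symmetric eigenvalue computation is needed, which is
    attractive when H is ill-conditioned."""
    n = grad.size
    F = np.zeros((n + 1, n + 1))
    F[:n, :n] = hess
    F[:n, n] = grad
    F[n, :n] = grad
    F[n, n] = -delta
    # Leftmost (smallest algebraic) eigenpair of the symmetric matrix F.
    vals, vecs = eigsh(F, k=1, which='SA')
    v, t = vecs[:n, 0], vecs[n, 0]
    if abs(t) > tol:
        d = v / t          # de-homogenize the eigenvector
    else:
        d = v              # hard case: use the eigenvector directly
    if grad @ d > 0:       # ensure a descent direction
        d = -d
    return d

# Usage on a toy ill-conditioned quadratic model (illustrative only):
g = np.array([1.0, -2.0])
H = np.array([[1e-4, 0.0], [0.0, 3.0]])
print(ghm_direction(g, H))
```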

2. 09:00 — Descent properties of an Anderson accelerated gradient method with restarting

Anderson Acceleration (AA) is a popular technique for enhancing the convergence of fixed-point iterations. The analysis of AA approaches typically focuses on the convergence behavior of a corresponding fixed-point residual, while the behavior of the underlying objective function values along the accelerated iterates is currently not well understood. In this talk, we present new results showing that AA with restarting is a local descent method that can decrease the objective function faster than the gradient method. These insights provide a novel perspective on the convergence analysis of AA in nonconvex optimization. Numerical experiments are conducted to illustrate our theoretical findings.
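For concreteness, here is a minimal sketch of windowed Anderson Acceleration applied to the gradient-descent fixed-point map q(x) = x - alpha * grad f(x), with a simple periodic restart. The window size, restart rule, and regularization are illustrative choices, not the scheme analyzed in the talk.

```python
import numpy as np

def aa_gradient_restart(grad, x0, alpha=0.1, m=5, iters=100):
    """Illustrative sketch of Anderson Acceleration (window size m) on the
    gradient fixed-point map q(x) = x - alpha * grad(x), with periodic
    restarting that clears the history every m iterations."""
    q = lambda x: x - alpha * grad(x)
    x = x0.copy()
    X, R = [x], [q(x) - x]            # iterate and residual histories
    for k in range(1, iters + 1):
        if k % m == 0:                # restart: drop the history
            X, R = [x], [q(x) - x]
        # Solve min_c ||sum_i c_i r_i||^2  subject to  sum_i c_i = 1
        Rmat = np.column_stack(R)
        G = Rmat.T @ Rmat + 1e-12 * np.eye(Rmat.shape[1])
        c = np.linalg.solve(G, np.ones(Rmat.shape[1]))
        c /= c.sum()
        # Accelerated iterate: mix the fixed-point images with weights c
        x = sum(ci * q(xi) for ci, xi in zip(c, X))
        X.append(x)
        R.append(q(x) - x)
        if len(X) > m:                # keep at most m past pairs
            X.pop(0); R.pop(0)
    return x

# Usage on a quadratic f(x) = 0.5 x^T A x (illustrative):
A = np.diag([1.0, 10.0, 100.0])
print(aa_gradient_restart(lambda x: A @ x, np.ones(3)))
```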

3. 09:30 — ** CANCELLED ** Online Learning Guided Quasi-Newton Methods: Improved Global Non-asymptotic Guarantees

Quasi-Newton (QN) methods are popular iterative algorithms known for their superior practical performance compared to Gradient Descent (GD)-type methods. However, the existing theoretical results for this class of algorithms do not sufficiently justify their advantage over GD-type methods. In this talk, we discuss our recent efforts to address this issue. Specifically, in the strongly convex setting, we propose the first “globally” convergent QN method that achieves an explicit “non-asymptotic superlinear” rate. We show that the rate presented for our method is provably faster than GD after at most $O(d)$ iterations, where $d$ is the problem dimension. Additionally, in the convex setting, we present an accelerated variant of our proposed method that provably outperforms the accelerated gradient method and converges at a rate of $O(\min\{1/k^2, \sqrt{d \log k }/ k^{2.5}\})$, where $k$ is the number of iterations. To attain these results, we diverge from conventional approaches and construct our QN methods based on the Hybrid Proximal Extragradient (HPE) framework and its accelerated variants. Furthermore, a pivotal algorithmic concept underpinning our methodologies is an online learning framework for updating the Hessian approximation matrices. Specifically, we relate our method's convergence rate to the regret of a specific online convex optimization problem in the matrix space and choose the sequence of Hessian approximation matrices to minimize its overall regret.
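To make the online-learning viewpoint concrete, the sketch below updates a Hessian approximation by an online gradient step on a per-iteration secant loss and projects the result back onto a bounded set of symmetric matrices. The specific loss, stepsize, and projection are illustrative assumptions; the talk's method uses a tailored online algorithm inside the HPE framework rather than this generic update.

```python
import numpy as np

def online_hessian_update(B, s, y, eta=0.1, mu=1e-3, L=1e3):
    """Illustrative sketch of the online-learning view of Hessian
    approximation: take a gradient step on the per-iteration loss
        loss_k(B) = 0.5 * ||B s_k - y_k||^2,
    then project onto {B symmetric : mu*I <= B <= L*I}.  The loss, the
    step size eta, and the projection bounds are assumptions."""
    r = B @ s - y
    G = 0.5 * (np.outer(r, s) + np.outer(s, r))   # symmetrized gradient of the loss
    B_new = B - eta * G
    # Projection: clip the eigenvalues of the symmetric update into [mu, L].
    w, V = np.linalg.eigh(0.5 * (B_new + B_new.T))
    return V @ np.diag(np.clip(w, mu, L)) @ V.T

# Usage: refine an identity initialization against curvature pairs (s, y).
rng = np.random.default_rng(0)
H_true = np.diag([1.0, 4.0, 9.0])
B = np.eye(3)
for _ in range(200):
    s = rng.standard_normal(3)
    B = online_hessian_update(B, s, H_true @ s)
print(np.round(B, 2))
```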

4. 10:00 — Adam-family Methods for Nonsmooth Optimization with Convergence Guarantees

  • Nachuan Xiao, National University of Singapore
  • Xiaoyin Hu, Hangzhou City University
  • Xin Liu, Academy of Mathematics and Systems Science, Chinese Academy of Sciences
  • Kim-Chuan Toh, National University of Singapore

In this talk, we present a comprehensive study on the convergence properties of Adam-family methods for nonsmooth optimization, especially in the training of nonsmooth neural networks. We introduce a novel framework that adopts a two-timescale updating scheme and prove its convergence properties under mild assumptions: any cluster point of the sequence generated by our framework is a stationary point of the objective function in the sense of the conservative field. We further establish that, under mild conditions with randomly chosen initial points and stepsizes, almost surely any cluster point of the sequence is a Clarke stationary point of the objective function, independent of the chosen conservative field. Our framework encompasses various popular Adam-family methods, providing convergence guarantees for these methods in training nonsmooth neural networks. In addition, we develop stochastic subgradient methods that incorporate gradient clipping techniques for training nonsmooth neural networks under heavy-tailed noise. We prove that, when the evaluation noise is only assumed to be integrable, our gradient clipping methods fit within the proposed framework, so their convergence properties follow directly from those established for the framework. Extensive numerical experiments demonstrate the high efficiency and robustness of our proposed methods.
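As a rough illustration of the two-timescale idea combined with clipping, the sketch below runs an Adam-style update in which the moment estimators move on a fast stepsize schedule and the iterate on a slower one, with the stochastic subgradient clipped before use. The stepsize schedules, clipping rule, and subgradient oracle are assumptions for illustration, not the authors' exact framework.

```python
import numpy as np

def two_timescale_adam_clip(subgrad, x0, iters=1000, clip=10.0, eps=1e-8):
    """Illustrative sketch of a two-timescale Adam-style method with gradient
    clipping for nonsmooth objectives.  The moment estimators move on the fast
    timescale (beta_k), the iterate on the slow timescale (eta_k << beta_k)."""
    x = x0.copy()
    m = np.zeros_like(x)      # first-moment estimator
    v = np.ones_like(x)       # second-moment estimator
    for k in range(1, iters + 1):
        g = subgrad(x)
        # Clip heavy-tailed (sub)gradient estimates.
        norm = np.linalg.norm(g)
        if norm > clip:
            g = g * (clip / norm)
        beta = 1.0 / k**0.6   # fast timescale
        eta = 1.0 / k**0.9    # slow timescale
        m = (1 - beta) * m + beta * g
        v = (1 - beta) * v + beta * g**2
        x = x - eta * m / (np.sqrt(v) + eps)
    return x

# Usage on a nonsmooth toy problem f(x) = ||x||_1 with heavy-tailed noise:
rng = np.random.default_rng(1)
sg = lambda x: np.sign(x) + rng.standard_t(df=2, size=x.size)
print(np.round(two_timescale_adam_clip(sg, np.array([3.0, -2.0])), 3))
```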