114:00 — Adaptivity in convex optimization beyond minimization

Bilevel optimization is a comprehensive framework that bridges single- and multi-objective optimization. It encompasses many general formulations, such as, but not limited to, standard nonlinear programs. This work demonstrates how elementary proximal gradient iterations can be used to solve a wide class of convex bilevel optimization problems without involving subroutines. Compared to and improving upon existing methods, ours (1) can handle a much wider class of problems, including both constraints and nonsmooth terms, (2) requires mere local Lipschitz gradient continuity of the differentiable terms without imposing any strong convexity assumptions, and (3) provides a systematic adaptive stepsize selection strategy that leads to large and nonmonotonic stepsize sequences while being insensitive to the choice of parameters.

214:30 — The Anytime Convergence of Stochastic Gradient Descent with Momentum: From a Continuous-Time Perspective

In this talk, we focus on the stochastic optimization problem and introduce a continuous-time perspective. We propose a stochastic first-order algorithm, called Stochastic Gradient Descent with Momentum (SGDM), and show that the trajectory of SGDM, despite its stochastic nature, converges to a deterministic second-order Ordinary Differential Equation (ODE) in L2-norm, as the stepsize goes to zero. The connection between the ODE and the algorithm results in delightful patterns in the discrete-time convergence analysis. More specifically, we develop convergence results for the ODE through a Lyapunov function, and translate the whole argument to the discrete-time case. This approach yields a novel anytime convergence guarantee for stochastic gradient methods. Our contribution is significant in that it better captures the convergence behavior across the entire trajectory of the algorithm, rather than at a single iterate.

315:00 — Efficient sparse probability measures recovery via Bregman gradient

  • Jianting Pan, The Chinese University of Hong Kong, Shenzhen
  • Ming Yan, The Chinese University of Hong Kong, Shenzhen

This paper presents an algorithm tailored for the efficient recovery of sparse probability measures incorporating l0-sparse regularization within the probability simplex constraint. Employing the Bregman proximal gradient method, our algorithm achieves sparsity by explicitly solving underlying subproblems. We rigorously establish the convergence properties of the algorithm, showcasing its capacity to converge to a local minimum with a convergence rate of O(1/k) under mild assumptions. To substantiate the efficacy of our algorithm, we conduct numerical experiments, offering a compelling demonstration of its efficiency in recovering sparse probability measure.

415:30 — Universal heavy-ball method for nonconvex optimization under Hölder continuous Hessians

We propose a new first-order method for minimizing nonconvex functions with Lipschitz continuous gradients and Hölder continuous Hessians. The proposed algorithm is a heavy-ball method equipped with two particular restart mechanisms. The algorithm enjoys an oracle complexity bound for finding a solution where the gradient norm is less than ε; the exponent for ε in the complexity bound depends on the Hölder exponent ν of Hessians. Our complexity result covers the classical bound for ν=0 (i.e., Lipschitz continuous gradients) and a state-of-the-art bound for ν=1 (i.e., Lipschitz continuous gradients and Hessians). Our algorithm is ν-independent and thus universal; it automatically achieves the complexity bound with the optimal ν without knowledge of the Hölder continuity. In addition, the algorithm does not require other problem-dependent parameters as input, including the gradient's Lipschitz constant or the target accuracy ϵ. Numerical results illustrate that the proposed method is promising.