1. 14:00 — Accelerating Gradient Descent

This talk will discuss several recent works establishing provably faster convergence rates for gradient descent in smooth convex optimization via semidefinite programming and computer-assisted analysis techniques (i.e., performance estimation problems). First, we will discuss recent advances based on nonconstant stepsize policies that include frequent long steps, potentially violating descent. We manage this in the analysis by considering the overall effect of many iterations at once rather than the typical one-iteration inductions used in most first-order method analyses. Doing so, we prove an $O(1/T^{1.0564})$ convergence rate, beating the classic $O(1/T)$ rate simply by periodically including longer steps (no momentum needed).
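A purely schematic sketch of this idea follows; the cycle length and long-step multiplier below are placeholder choices for illustration, not the certified stepsize pattern from the works being discussed.

```python
import numpy as np

def gd_with_long_steps(grad, x0, L, T, cycle=8, long_mult=5.0):
    """Gradient descent with a periodic long step.

    grad      : gradient oracle of an L-smooth convex function
    L         : smoothness constant
    cycle     : a long step is taken every `cycle` iterations (placeholder choice)
    long_mult : long-step length relative to 1/L (placeholder choice)

    Long steps may break monotone descent on individual iterations; the analysis
    described above bounds the combined effect of a whole block of iterations
    rather than each step separately.
    """
    x = np.asarray(x0, dtype=float)
    for t in range(T):
        h = (long_mult / L) if (t + 1) % cycle == 0 else 1.0 / L
        x = x - h * grad(x)
    return x

# Example on a simple smooth convex quadratic f(x) = 0.5 * x^T A x
A = np.diag([1.0, 10.0, 100.0])
x_T = gd_with_long_steps(lambda x: A @ x, x0=np.ones(3), L=100.0, T=200)
```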
Second, we will present additional results that exactly characterize the benefits of iterate averaging and extrapolation for gradient descent. We will show that iterate averaging provably offers no benefit in terms of worst-case guarantees. In contrast, we show that a trivially simple extrapolation step at the end of gradient descent (costing a single scalar-vector multiply) can improve the worst-case rate of convergence by the same amount as $O(\sqrt{T/\log(T)})$ additional gradient descent iterations. Numerical results supporting the benefit of this simple, cheap extrapolation trick for a range of gradient methods will be presented as well.
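One plausible form of such a final extrapolation is sketched below; the coefficient `gamma` and the exact extrapolation used in the work presented are not reproduced here, so treat them as hypothetical placeholders.

```python
import numpy as np

def gd_with_final_extrapolation(grad, x0, L, T, gamma=0.5):
    """Plain gradient descent followed by a single extrapolation at the end.

    The final step y = x_T + gamma * (x_T - x_{T-1}) is a generic stand-in for
    the extrapolation described above; its cost is negligible next to the T
    gradient evaluations.
    """
    x_prev = np.asarray(x0, dtype=float)
    x = x_prev - (1.0 / L) * grad(x_prev)
    for _ in range(T - 1):
        x_prev, x = x, x - (1.0 / L) * grad(x)
    return x + gamma * (x - x_prev)
```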

2. 14:30 — Performance estimation for SDP hierarchies for polynomial optimization on the hypercube

We present a way to estimate the worst-case performance of an SDP hierarchy for polynomial optimization on the hypercube. The SDP hierarchy in question is based on the Schmüdgen Positivstellensatz. Some error bounds for this hierarchy are known from recent work by Laurent and Slot [Optimization Letters 17:515-530, 2023]. We show how to obtain similar error bounds in a novel way, and also how to obtain error bounds that may be computed a priori via SDP.
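For readers unfamiliar with performance estimation, the sketch below poses the exact worst case of a single gradient step as a small SDP, using CVXPY and the standard interpolation conditions for L-smooth convex functions (Taylor, Hendrickx, and Glineur). It only illustrates the general SDP mechanism; the Schmüdgen-based hierarchy analyzed in the talk is not implemented here.

```python
# Worst case of one gradient step x1 = x0 - (1/L) grad f(x0) over L-smooth
# convex f with ||x0 - x*|| <= R, computed as a semidefinite program.
import numpy as np
import cvxpy as cp

L, R = 1.0, 1.0                    # smoothness constant and initial radius (assumed)
h = 1.0 / L                        # stepsize of the single gradient step

G = cp.Variable((3, 3), PSD=True)  # Gram matrix of (x0 - x*, g0, g1)
f = cp.Variable(2)                 # f[0] = f(x0) - f*,  f[1] = f(x1) - f*

e = np.eye(3)
X = {"*": np.zeros(3), 0: e[0], 1: e[0] - h * e[1]}  # iterates in the Gram basis
g = {"*": np.zeros(3), 0: e[1], 1: e[2]}             # gradients (g* = 0)
F = {"*": 0.0, 0: f[0], 1: f[1]}                     # function values (f* = 0)

def ip(a, b):
    """Inner product <a, b> expressed through the Gram matrix: a^T G b."""
    return cp.sum(cp.multiply(np.outer(a, b), G))

pts = ["*", 0, 1]
cons = [ip(X[0] - X["*"], X[0] - X["*"]) <= R ** 2]
# Interpolation conditions for L-smooth convex functions:
# f_i >= f_j + <g_j, x_i - x_j> + (1/(2L)) ||g_i - g_j||^2 for all i != j.
for i in pts:
    for j in pts:
        if i != j:
            cons.append(F[j] - F[i] + ip(g[j], X[i] - X[j])
                        + ip(g[i] - g[j], g[i] - g[j]) / (2 * L) <= 0)

prob = cp.Problem(cp.Maximize(f[1]), cons)
prob.solve()
print("worst-case f(x1) - f*:", prob.value)  # the known exact value is L*R^2/6
```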

3. 15:00 — ** CANCELLED ** The exact worst-case convergence rate of the alternating direction method of multipliers

Recently, semidefinite programming performance estimation has been employed as a powerful tool for the worst-case performance analysis of first-order methods. In this talk, I derive new non-ergodic convergence rates for the alternating direction method of multipliers (ADMM) by using performance estimation. I also study the linear and R-linear convergence of ADMM in terms of the dual objective. I establish that ADMM enjoys a global linear convergence rate if and only if the dual objective satisfies the Polyak–Łojasiewicz (PŁ) inequality in the presence of strong convexity. In addition, I give an explicit formula for the linear convergence rate factor. Moreover, I study the R-linear convergence of ADMM under two scenarios.
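To fix notation, a minimal scaled-form ADMM loop for the textbook splitting $\min_{x,z} \tfrac12\|Ax-b\|^2 + \lambda\|z\|_1$ subject to $x = z$ is sketched below; this is only a generic instance to recall the iteration, not the specific setting analyzed in the talk.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t*||.||_1 (elementwise soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def admm_lasso(A, b, lam, rho=1.0, iters=200):
    """Scaled-form ADMM for min 0.5*||Ax - b||^2 + lam*||z||_1  s.t.  x = z."""
    n = A.shape[1]
    x = z = u = np.zeros(n)
    M = np.linalg.inv(A.T @ A + rho * np.eye(n))  # cached for the x-update
    for _ in range(iters):
        x = M @ (A.T @ b + rho * (z - u))         # x-update (ridge-type solve)
        z = soft_threshold(x + u, lam / rho)      # z-update (prox of the l1 term)
        u = u + x - z                             # scaled dual update
    return z

# Tiny usage example with random data
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 5))
b = rng.standard_normal(20)
print(admm_lasso(A, b, lam=0.1))
```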

4. 15:30 — Acceleration by Stepsize Hedging

Can we accelerate the convergence of gradient descent without changing the algorithm — just by optimizing stepsizes? Surprisingly, we show that the answer is yes. Our proposed Silver Stepsize Schedule optimizes strongly convex functions in $k^{\log_p 2} \approx k^{0.7864}$ iterations, where $p=1+\sqrt{2}$ is the silver ratio and $k$ is the condition number. This is intermediate between the textbook unaccelerated rate $k$ and the accelerated rate $\sqrt{k}$ due to Nesterov in 1983. The non-strongly convex setting is conceptually identical and leads to an analogously accelerated rate $\epsilon^{-\log_p 2} \approx \epsilon^{-0.7864}$. We conjecture and provide partial evidence that these rates are optimal among all possible stepsize schedules.
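To make the exponents concrete, a short computation using only the quantities quoted above:

```python
import math

p = 1 + math.sqrt(2)                  # the silver ratio
exponent = math.log(2) / math.log(p)  # log_p(2) ~ 0.7864
print(exponent)

# Iteration-count scaling for a condition number kappa = 10**4:
kappa = 10 ** 4
print(kappa,               # unaccelerated gradient descent: O(kappa)
      kappa ** exponent,   # Silver Stepsize Schedule: O(kappa^0.7864)
      math.sqrt(kappa))    # Nesterov acceleration: O(sqrt(kappa))
```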

The Silver Stepsize Schedule is an explicit non-monotonic fractal. Why should such stepsizes help? The core intuition is “hedging” between individually suboptimal strategies — short steps and long steps — since bad cases for the former are good cases for the latter, and vice versa. Properly combining these stepsizes yields faster convergence due to the misalignment of worst-case functions. This talk is based on a line of work with Pablo Parrilo that originates from my 2018 Master’s Thesis — which established for the first time that judiciously chosen stepsizes can enable accelerated convex optimization. Prior to this thesis, the only such result was for the special case of quadratics, due to Young in 1953.