1 — 14:00 — Robustly Stable Accelerated Gradient Methods
We consider the problem of minimizing a strongly convex smooth function where the gradients are subject to additive worst-case deterministic errors that are square-summable. We study the trade-offs between the convergence rate and robustness to gradient errors when designing the parameters of a first-order algorithm. We focus on a general class of momentum methods (GMM) with constant stepsize and two momentum parameters, which can recover gradient descent (GD), Nesterov's accelerated gradient (NAG), the heavy-ball (HB), and the triple momentum method (TMM) as special cases. We measure the robustness of an algorithm in terms of the cumulative suboptimality over the iterations, normalized by the squared ℓ2 norm of the gradient errors. This quantity can be interpreted as the (squared) ℓ2 gain of a dynamical system that represents the GMM iterations, where the input is the gradient error sequence and the output is a weighted distance to the optimum. For quadratic objectives, we compute the ℓ2 gain explicitly by leveraging its representation as the H∞ norm of the GMM system in the frequency domain, and we explicitly construct gradient errors that lead to worst-case performance. We also study the stability of GMM with respect to multiplicative errors by characterizing the structured real stability radius of the GMM system through its connection to the H∞ norm. This allows us to compare the GD, HB, and NAG methods in terms of robustness, and to argue that HB is not as robust as NAG despite being the fastest in terms of the rate. We then develop the robustly stable heavy-ball method, which can be faster than NAG while achieving the best possible robustness level, and the robustly stable gradient descent, which is the fastest version of GD with constant stepsize at the best possible robustness level. We also extend our framework to general strongly convex smooth objectives, providing non-asymptotic rate results for inexact GMM methods and bounds on the ℓ2 gain, where the GMM parameters can be chosen to systematically trade off rate against robustness in a computationally tractable framework. Finally, we discuss extensions of our results and algorithms to the stochastic optimization setting, where gradient errors are stochastic, and to min-max optimization settings.
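As a concrete illustration, below is a minimal sketch of one common two-momentum-parameter iteration of the kind the abstract describes; the exact GMM parameterization used in the talk is an assumption here, and the stepsize alpha and momentum parameters beta, gamma are illustrative. Setting beta = gamma = 0 recovers GD, gamma = 0 recovers HB, and beta = gamma recovers NAG.

```python
import numpy as np

def gmm(grad, x0, alpha, beta, gamma, n_iters=100, noise=None):
    """Assumed two-momentum-parameter iteration:
        y_k     = x_k + gamma * (x_k - x_{k-1})
        x_{k+1} = x_k + beta * (x_k - x_{k-1}) - alpha * grad(y_k)
    `noise`, if provided, models the additive gradient error sequence e_k."""
    x_prev, x = x0.copy(), x0.copy()
    for k in range(n_iters):
        y = x + gamma * (x - x_prev)
        g = grad(y) + (noise(k) if noise is not None else 0.0)  # inexact gradient
        x_prev, x = x, x + beta * (x - x_prev) - alpha * g
    return x

# Example: strongly convex quadratic f(x) = 0.5 * x^T A x with mu = 1, L = 10
A = np.diag([1.0, 10.0])
x_out = gmm(lambda x: A @ x, np.array([5.0, 5.0]), alpha=0.1, beta=0.5, gamma=0.5)
```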
2 — 14:30 — Advanced, Adaptive and Flexible Algorithms for Decentralized Optimization
The problem of optimizing an objective function via a decentralized procedure, using multiple agents over a connected network, has gained significant attention over the past few decades. This is due to the wide applicability of decentralized optimization to many important science and engineering applications, such as optimal control, machine learning, robotics, sensor networks, and smart grids. Decentralized optimization problems come in diverse shapes and forms and can have very different characteristics. In this talk, we discuss novel flexible approaches for solving decentralized optimization problems that adapt to problem characteristics. We present two unifying algorithmic frameworks that recover popular algorithms as special cases. We discuss the rationale behind our proposed techniques, establish convergence-in-expectation and complexity guarantees for our algorithms, and present encouraging numerical results.
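For context, the following is a sketch of decentralized gradient descent (DGD), a classical baseline that unifying decentralized frameworks typically recover as a special case; this is not the specific algorithms from the talk, and the mixing matrix and stepsize below are illustrative.

```python
import numpy as np

def dgd(grads, W, X0, alpha, n_iters=200):
    """Decentralized gradient descent for min (1/n) sum_i f_i(x).
    grads: list of per-agent gradient oracles for the local f_i
    W:     doubly stochastic mixing matrix matching the network topology
    X0:    (n_agents, dim) array of initial local iterates"""
    X = X0.copy()
    for _ in range(n_iters):
        G = np.stack([g(x) for g, x in zip(grads, X)])  # local gradients
        X = W @ X - alpha * G  # consensus (mixing) step + local gradient step
    return X.mean(axis=0)      # average of the local iterates

# Example: 3 agents on a small network, each with a local quadratic f_i
W = np.array([[0.50, 0.25, 0.25],
              [0.25, 0.50, 0.25],
              [0.25, 0.25, 0.50]])
targets = [np.array([1.0]), np.array([2.0]), np.array([3.0])]
grads = [lambda x, t=t: x - t for t in targets]  # f_i(x) = 0.5 * ||x - t_i||^2
x_bar = dgd(grads, W, np.zeros((3, 1)), alpha=0.1)  # -> near the mean of targets
```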
3 — 15:00 — Stochastic Interior-Point Method for Inequality-Constrained Optimization
An interior-point algorithm is proposed for solving inequality-constrained optimization problems in which the objective and constraint functions are continuously differentiable. Such problems appear, for example, in machine learning applications with fairness constraints or physics constraints. It is assumed that only a stochastic estimate of the objective gradient can be evaluated. The algorithm is unique in its use of inner neighborhoods of the feasible region, defined by a positive and vanishing neighborhood-parameter sequence, in which the iterates are forced to remain. For bound-constrained problems, it is shown that, with a careful balance between the barrier, step-size, and neighborhood sequences, the proposed algorithm satisfies convergence guarantees in both deterministic and stochastic settings and shows promising experimental results. The convergence results are also extended to nonlinearly constrained problems under specific assumptions.
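The sketch below illustrates the kind of iteration the abstract describes for the bound-constrained case min f(x) subject to x >= 0: a log-barrier stochastic gradient step followed by projection into a shrinking inner neighborhood of the feasible region. The particular vanishing sequences mu_k, alpha_k, theta_k are illustrative assumptions; the talk's analysis concerns precisely how such sequences must be balanced.

```python
import numpy as np

def stochastic_ipm(sgrad, x0, n_iters=1000):
    """Sketch of a stochastic interior-point iteration for min f(x) s.t. x >= 0.
    sgrad: stochastic gradient oracle for f.
    The sequences below are illustrative, not the talk's prescribed choices."""
    x = x0.copy()
    for k in range(1, n_iters + 1):
        mu_k = 1.0 / k             # barrier parameter -> 0
        alpha_k = 0.1 / k**0.75    # step size -> 0
        theta_k = 0.5 / k**0.5     # neighborhood parameter -> 0
        g = sgrad(x) - mu_k / x    # gradient of the barrier f(x) - mu * sum(log x_i)
        x = np.maximum(x - alpha_k * g, theta_k)  # stay in the inner neighborhood
    return x

# Example: f(x) = 0.5 * ||x - c||^2 with noisy gradients; the bound is active
c = np.array([-1.0, 2.0])
rng = np.random.default_rng(0)
sgrad = lambda x: (x - c) + 0.1 * rng.standard_normal(x.shape)
print(stochastic_ipm(sgrad, np.ones(2)))  # approximately [0, 2]
```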
4 — 15:30 — ** CANCELLED ** Nonsmooth Optimization on a Finer Scale: Bounded Local Subgradient Variation Perspective
Nonsmooth optimization problems are ubiquitous in industrial and machine learning applications, arising from the need to address tasks such as resource allocation, threat detection, and model training. Within the realm of mathematical optimization, nonsmooth problems stand among the most challenging, yet they often admit efficient algorithms with provable guarantees. The complexity of these problems, encompassing both lower and upper bounds, has historically been examined under generic assumptions that bound the growth of the objective function by assuming Lipschitz continuity of either the objective itself or its gradient. In this talk, I will argue that these traditional assumptions, which define the standard classes of nonsmooth optimization problems, inadequately capture local properties that may make problems amenable to more efficient algorithms. I will introduce a notion of bounded local variation of the (sub)gradient and discuss how, under this notion, we can obtain a more fine-grained characterization of the complexity of nonsmooth problems. One consequence of these results is that, contrary to prior belief, the complexity of nonsmooth optimization problems, such as those with piecewise linear objectives with polynomially many pieces, can be improved using parallelization even in high dimensions, provided either that the problem is unconstrained or that the function may be queried outside the feasible set.
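To make the problem class concrete, the sketch below runs the classical subgradient method on a piecewise linear objective f(x) = max_i (a_i . x + b_i), the family with polynomially many pieces mentioned in the abstract. This is the standard sequential baseline, not the improved parallel scheme discussed in the talk; note that the subgradient changes only when the active piece changes, the kind of local structure a finer-scale analysis can exploit.

```python
import numpy as np

def subgradient_method(a, b, x0, n_iters=500):
    """Subgradient descent on f(x) = max_i (a_i . x + b_i).
    At each iterate, a row a_i of the active (maximizing) piece is a
    valid subgradient; a 1/sqrt(k) stepsize rule is used."""
    x, best_x, best_f = x0.copy(), x0.copy(), np.inf
    for k in range(1, n_iters + 1):
        vals = a @ x + b
        i = np.argmax(vals)  # index of the active linear piece
        if vals[i] < best_f:
            best_f, best_x = vals[i], x.copy()
        x = x - (1.0 / np.sqrt(k)) * a[i]  # subgradient step
    return best_x, best_f

# Example: f(x) = max(|x1|, |x2|) written as a max of 4 affine pieces
a = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
b = np.zeros(4)
x_best, f_best = subgradient_method(a, b, np.array([3.0, -2.0]))  # f -> near 0
```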