1 — 08:30 — An asynchronous parallel bundle method
In this talk we present a fully asynchronous proximal bundle method for solving non-smooth, convex optimization problems. The algorithm can be used as a drop-in replacement for classic bundle methods; that is, each function must be given by a first-order oracle computing function values and subgradients. The algorithm allows for an arbitrary number of master problem processes computing new candidate points and oracle processes evaluating the functions at those candidate points. These processes share information by communicating with a single supervisor process that resembles the main loop of a classic bundle method. All processes run in parallel and no explicit synchronization step is required. Instead, the asynchronous and possibly outdated results of the oracle computations can be viewed as an inexact function oracle. We therefore prove convergence of our method under weak assumptions very similar to those of inexact and incremental bundle methods. In particular, we show how the algorithm automatically learns important structural properties of the functions to control the inaccuracy induced by the asynchronicity, so that overall convergence can be guaranteed. Our tests on relaxations of multi-commodity flow problems show the advantage of our approach when some processes are slower than others.
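To make the process architecture concrete, here is a minimal Python sketch assuming a single supervisor that also solves the master problem itself, plus a pool of asynchronous oracle workers. The polyhedral test function, the random delays, and the SLSQP-based master problem are illustrative choices, not details from the talk.

```python
# Illustrative sketch (not the authors' algorithm): one supervisor/master
# thread and several asynchronous oracle workers, on a polyhedral convex
# test function. Delays, step size t, and the solver are toy assumptions.
import time, random
import numpy as np
from concurrent.futures import ThreadPoolExecutor
from scipy.optimize import minimize

rng = np.random.default_rng(0)
A = rng.normal(size=(12, 2))          # f(x) = max_i (A[i] @ x + b[i]) is convex
b = rng.normal(size=12)

def oracle(y):
    """First-order oracle with a random delay (simulates a slow process)."""
    time.sleep(random.uniform(0.0, 0.2))
    i = int(np.argmax(A @ y + b))
    return y, A[i] @ y + b[i], A[i]   # point, value, subgradient

def master_step(cuts, center, t=1.0):
    """Proximal master problem: min_x max-of-cuts(x) + ||x - center||^2/(2t),
    solved in epigraph form over z = (x, v) with SLSQP."""
    def obj(z):
        x, v = z[:2], z[2]
        return v + np.dot(x - center, x - center) / (2 * t)
    cons = [{'type': 'ineq',
             'fun': (lambda z, y=y, fy=fy, g=g:
                     z[2] - (fy + g @ (z[:2] - y)))} for y, fy, g in cuts]
    z0 = np.append(center, max(fy for _, fy, _ in cuts))
    return minimize(obj, z0, method='SLSQP', constraints=cons).x[:2]

center = np.array([2.0, -2.0])
y0, f_center, g0 = oracle(center)
cuts = [(y0, f_center, g0)]

with ThreadPoolExecutor(max_workers=4) as pool:
    pending = []
    for k in range(30):
        candidate = master_step(cuts, center)
        pending.append(pool.submit(oracle, candidate))
        # Harvest whatever oracle results have arrived -- possibly stale, but
        # for convex f every returned cut is still a valid lower bound, so
        # staleness only acts like an inexact oracle.
        done, pending = ([p for p in pending if p.done()],
                         [p for p in pending if not p.done()])
        for fut in done:
            y, fy, g = fut.result()
            cuts.append((y, fy, g))
            if fy < f_center - 1e-3:  # crude descent test at the supervisor
                center, f_center = y, fy
    print("best value found:", f_center)
```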
2 — 09:00 — Geometric characterizations of Lipschitz stability for convex optimization problems
In this paper, we mainly study tilt stability and Lipschitz stability of convex optimization problems. Our characterizations are geometric and fully computable in many important cases. As a result, we apply our theory to the group Lasso problem and the nuclear norm minimization problem and reveal that the Lipschitz stability of the solution mapping in these problems is automatic whenever the solution mapping is single-valued.
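The claimed stability can be probed numerically. The sketch below solves a small group Lasso instance with proximal gradient and checks that the solution moves proportionally to a perturbation of the data; the problem sizes, regularization weight, and perturbation model are assumptions for illustration, not the paper's setting.

```python
# Numerical probe of Lipschitz stability for group Lasso (illustrative only):
# solve min_x 0.5*||A x - b||^2 + lam * sum_g ||x_g||_2 by proximal gradient
# and check ||x*(b + d) - x*(b)|| <= L * ||d|| for small perturbations d.
import numpy as np

rng = np.random.default_rng(1)
n, p = 40, 12
groups = [np.arange(0, 4), np.arange(4, 8), np.arange(8, 12)]
A = rng.normal(size=(n, p))
b = A @ rng.normal(size=p) + 0.1 * rng.normal(size=n)
lam = 2.0
step = 1.0 / np.linalg.norm(A, 2) ** 2     # 1/L for the smooth part

def solve_group_lasso(b, iters=5000):
    x = np.zeros(p)
    for _ in range(iters):
        z = x - step * A.T @ (A @ x - b)   # gradient step on 0.5||Ax-b||^2
        for g in groups:                   # block soft-thresholding (prox)
            nz = np.linalg.norm(z[g])
            z[g] = max(0.0, 1.0 - step * lam / nz) * z[g] if nz > 0 else 0.0
        x = z
    return x

x_star = solve_group_lasso(b)
for eps in [1e-2, 1e-3, 1e-4]:
    d = eps * rng.normal(size=n)
    ratio = np.linalg.norm(solve_group_lasso(b + d) - x_star) / np.linalg.norm(d)
    print(f"eps = {eps:.0e}: ||x*(b+d) - x*(b)|| / ||d|| = {ratio:.3f}")
# A ratio that stays bounded as eps shrinks is consistent with (though it does
# not prove) Lipschitz stability of the single-valued solution mapping.
```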
3 — 09:30 — First-Order Methods for Constrained Optimization Problems Under Distributional Uncertainties: Empirical Evaluations and Theory
Nonconvex models, such as deep neural networks, have excelled in recent machine learning tasks. Practical training of such models involves meeting multiple requirements, including fairness, robustness, and accuracy-at-top. To address these requirements, training can be performed by solving a constrained optimization problem, with each requirement represented as a constraint. Moreover, distributional shifts from the training phase to the deployment (test) phase might occur. Practitioners commonly tackle this constrained optimization problem by solving the Lagrangian Relaxation and fine-tuning the Lagrange multipliers. Although easily implemented in training frameworks by altering the loss function, Lagrangian Relaxation for nonconvex problems may fail to converge, even when initialized close to the optimal solution. In this work, we investigate existing alternative approaches for training large models in the presence of constraints (and possible distribution shifts). We specifically assess the performance of Exact Penalty (EP), Dual Ascent (DA), and the Method of Multipliers (MM). Like Lagrangian Relaxation, these methods integrate seamlessly into existing frameworks, such as PyTorch. We investigate convergence to stationarity, stability around optimal solutions, and hyperparameter tuning for these approaches, both in theory and through empirical evaluations. Our extensive numerical experiments demonstrate that, across a diverse range of scenarios, these alternative approaches consistently outperform the Lagrangian Relaxation commonly used by practitioners.
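As an illustration of one of the assessed alternatives, the following is a minimal PyTorch sketch of the Method of Multipliers (augmented Lagrangian) on a toy inequality-constrained problem; the objective, constraint, penalty parameter, and optimizer settings are assumptions, not the experimental setup of the talk.

```python
# Toy Method of Multipliers (augmented Lagrangian) loop in PyTorch, for
# min_theta f(theta) s.t. g(theta) <= 0. All problem data are illustrative;
# the point is the structure: inner unconstrained solves alternating with
# multiplier updates, a drop-in change to an ordinary training loop.
import torch

theta = torch.zeros(5, requires_grad=True)
target = torch.arange(5, dtype=torch.float32)

def f(theta):                      # objective: fit theta to a target
    return ((theta - target) ** 2).sum()

def g(theta):                      # inequality constraint: sum(theta) - 1 <= 0
    return theta.sum() - 1.0

lam, rho = 0.0, 10.0               # multiplier and penalty parameter
for outer in range(20):
    opt = torch.optim.SGD([theta], lr=0.01)
    for inner in range(200):       # approximately minimize the aug. Lagrangian
        opt.zero_grad()
        # augmented Lagrangian for an inequality constraint:
        # f + (rho/2) * max(0, lam/rho + g)^2  (theta-independent term dropped)
        viol = torch.clamp(lam / rho + g(theta), min=0.0)
        loss = f(theta) + 0.5 * rho * viol ** 2
        loss.backward()
        opt.step()
    with torch.no_grad():          # dual (multiplier) update
        lam = max(0.0, lam + rho * g(theta).item())
print("theta:", theta.detach(), " constraint value:", g(theta).item())
```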
4 — 10:00 — Proximal Algorithms for Gaussian variational inference via JKO in the Bures-Wasserstein Space
Variational inference (VI) has emerged as a tractable alternative to computationally demanding Markov chain Monte Carlo (MCMC) methods. VI seeks to approximate a target distribution by an element of a tractable family of distributions. Of key interest in statistics and machine learning is Gaussian VI, which approximates the target by minimizing the Kullback-Leibler (KL) divergence to it over the space of Gaussians. In this work, we develop the (Stochastic) Forward-Backward Gaussian Variational Inference (FB-GVI) algorithm to solve Gaussian VI. Our approach exploits the composite structure of the KL divergence, which can be written as the sum of a smooth term (the potential) and a non-smooth term (the entropy) over the Bures-Wasserstein (BW) space of Gaussians endowed with the Wasserstein distance. FB-GVI hence leverages celebrated ideas from the literature on composite and non-smooth optimization. A key insight in this work is that the JKO operator for the entropy, when restricted to the BW space, admits a closed form, and hence leads to an implementable (stochastic) forward-backward (or proximal gradient) algorithm for Gaussian VI. In turn, it yields new computational guarantees for Gaussian VI under a variety of standard assumptions. Specifically, we obtain state-of-the-art convergence guarantees when the target is log-smooth and log-concave, as well as the first convergence guarantees to first-order stationary solutions when the target is only log-smooth.
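The following minimal sketch illustrates how such a forward-backward iteration could look for a Gaussian target, where the expected gradient and Hessian of the potential are exact; the step size and the closed-form entropy-proximal step (derived here from the per-eigenvalue JKO optimality condition s = l - 2h + h^2/l) are the sketch's own assumptions, consistent with but not copied from the paper.

```python
# Minimal sketch of a forward-backward Gaussian VI iteration on the
# Bures-Wasserstein space, for a Gaussian target pi = N(mu, S_target), so the
# potential V(x) = 0.5 (x-mu)^T S_target^{-1} (x-mu) has exact expected
# gradient and Hessian. Step size h and the entropy-prox formula below are
# illustrative assumptions consistent with the splitting in the abstract.
import numpy as np
from scipy.linalg import sqrtm

d = 3
rng = np.random.default_rng(2)
Q = rng.normal(size=(d, d))
S_target = Q @ Q.T + np.eye(d)         # target covariance
mu_target = np.array([1.0, -1.0, 0.5])
P = np.linalg.inv(S_target)            # E[grad^2 V] = P for this quadratic V

m, S = np.zeros(d), np.eye(d)          # current Gaussian iterate N(m, S)
h = 0.1
for k in range(500):
    # Forward (gradient) step on the potential part E[V]:
    #   E[grad V(X)] = P (m - mu) for X ~ N(m, S)
    m = m - h * P @ (m - mu_target)
    M = np.eye(d) - h * P              # linearized transport map
    S_half = M @ S @ M
    # Backward step: closed-form JKO / prox of negative entropy on BW space.
    # Per eigenvalue it solves  s = l - 2h + h^2 / l  for l >= h, i.e.
    #   S <- ( S_half + 2h I + (S_half^2 + 4h S_half)^{1/2} ) / 2.
    S = 0.5 * (S_half + 2 * h * np.eye(d)
               + np.real(sqrtm(S_half @ S_half + 4 * h * S_half)))

print("mean error:", np.linalg.norm(m - mu_target))
print("cov error :", np.linalg.norm(S - S_target))
```

As a sanity check on the backward step used here, to first order in h it maps S to S + 2hI, which matches the heat-flow behavior of the entropy on Gaussians.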