1 — 16:20 — Data-Driven Performance Guarantees for Classical and Learned Optimizers
We introduce a data-driven approach to analyzing the performance of continuous optimization algorithms using generalization guarantees from statistical learning theory. We study classical and learned optimizers for solving families of parametric optimization problems. We build generalization guarantees for classical optimizers using a sample convergence bound, and for learned optimizers using the Probably Approximately Correct (PAC)-Bayes framework. To train learned optimizers, we use a gradient-based algorithm to directly minimize the PAC-Bayes upper bound. Numerical experiments in signal processing, control, and meta-learning showcase the ability of our framework to provide strong generalization guarantees for both classical and learned optimizers given a fixed budget of iterations. For classical optimizers, our bounds are much tighter than those that worst-case guarantees provide. For learned optimizers, our generalization bounds outperform the empirical performance observed with their non-learned counterparts.
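As a concrete illustration of the training procedure mentioned in the abstract (a gradient-based algorithm that directly minimizes the PAC-Bayes upper bound), the following is a minimal sketch, not the authors' implementation: the learned optimizer is reduced to a single learned step size for gradient descent on random least-squares problems, the posterior over that step size is Gaussian with learnable mean and log-standard-deviation, the prior is a fixed standard Gaussian, and a McAllester-style bound with the loss clipped to [0, 1] stands in for the paper's exact bound. All sizes and hyperparameters are illustrative assumptions.

```python
import torch

torch.manual_seed(0)
n_problems, dim, n_iters = 64, 10, 20
delta = 0.05  # bound holds with probability at least 1 - delta

# A family of parametric least-squares problems min_x 0.5 * ||A_i x - b_i||^2.
A = torch.randn(n_problems, dim, dim) / dim ** 0.5
b = torch.randn(n_problems, dim)

# Posterior q = N(mu, sigma^2) over the log step size of the learned optimizer;
# the prior is the fixed standard Gaussian N(0, 1).
mu = torch.tensor([-2.0], requires_grad=True)
log_sigma = torch.tensor([-1.0], requires_grad=True)

def empirical_risk(log_step):
    """Mean final objective (clipped to [0, 1]) after n_iters gradient steps."""
    step = torch.exp(log_step)
    x = torch.zeros(n_problems, dim)
    for _ in range(n_iters):
        residual = torch.einsum("nij,nj->ni", A, x) - b
        x = x - step * torch.einsum("nji,nj->ni", A, residual)  # x - step * A^T r
    final = 0.5 * ((torch.einsum("nij,nj->ni", A, x) - b) ** 2).sum(dim=1)
    return final.clamp(max=1.0).mean()  # clip so the PAC-Bayes loss lies in [0, 1]

opt = torch.optim.Adam([mu, log_sigma], lr=0.05)
for _ in range(200):
    opt.zero_grad()
    sigma = torch.exp(log_sigma)
    log_step = mu + sigma * torch.randn(1)  # reparameterized posterior sample
    risk = empirical_risk(log_step)
    kl = 0.5 * (sigma ** 2 + mu ** 2 - 1.0 - 2.0 * log_sigma).sum()
    # McAllester-style PAC-Bayes upper bound on the true (out-of-sample) risk.
    bound = risk + torch.sqrt(
        (kl + torch.log(torch.tensor(2.0 * n_problems ** 0.5 / delta)))
        / (2.0 * n_problems)
    )
    bound.backward()
    opt.step()

print(f"empirical risk {risk.item():.3f}, PAC-Bayes bound {bound.item():.3f}")
```

The reparameterized posterior sample makes the bound differentiable in the posterior parameters, so the empirical risk term and the KL complexity term can be minimized jointly with a standard stochastic optimizer.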
2 — 16:50 — Verification of First-Order Methods for Parametric Quadratic Optimization
We introduce a numerical framework to verify the finite-step convergence of first-order methods for parametric convex quadratic optimization. We formulate the verification problem as a mathematical optimization problem in which we maximize a performance metric (e.g., the fixed-point residual at the last iteration) subject to constraints representing proximal algorithm steps (e.g., linear system solutions, projections, or gradient steps). Our framework is highly modular because we encode a wide range of proximal algorithms as variations of two primitive steps: affine steps and element-wise maximum steps. Compared to standard convergence analysis and performance estimation techniques, we can explicitly quantify the effects of warm-starting by directly representing the sets where the initial iterates and parameters live. We show that the verification problem is NP-hard, and we construct strong semidefinite programming relaxations using various constraint-tightening techniques. Numerical examples in nonnegative least squares, network utility maximization, Lasso, and optimal control show that our framework is significantly less pessimistic than standard worst-case convergence analysis techniques.
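To make the two primitive steps concrete, here is a minimal sketch (illustrative assumptions, not the paper's code): projected gradient descent for nonnegative least squares is written as an affine step followed by an element-wise maximum step, and the performance metric (the fixed-point residual at the last iteration) is evaluated by sampling parameters and warm-started initial iterates from boxes rather than by solving the semidefinite programming relaxation that the framework actually constructs.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, K = 20, 10, 30
A = rng.standard_normal((m, n))
t = 1.0 / np.linalg.norm(A.T @ A, 2)  # step size 1/L

def affine_step(x, b):
    """Affine primitive: x - t * (A^T A x - A^T b) is affine in (x, b)."""
    return x - t * (A.T @ (A @ x - b))

def elementwise_max_step(z):
    """Element-wise maximum primitive: projection onto the nonnegative orthant."""
    return np.maximum(z, 0.0)

def fixed_point_residual(x0, b):
    """||x_K - x_{K-1}|| after K projected-gradient iterations."""
    x_prev, x = x0, x0
    for _ in range(K):
        x_prev, x = x, elementwise_max_step(affine_step(x, b))
    return np.linalg.norm(x - x_prev)

# Parameters b live in a box around a nominal value; initial iterates are
# warm-started near a nominal solution (both sets are illustrative).
b_nom = rng.standard_normal(m)
x_nom = np.maximum(np.linalg.lstsq(A, b_nom, rcond=None)[0], 0.0)
worst = max(
    fixed_point_residual(x_nom + 0.1 * rng.uniform(-1, 1, n),
                         b_nom + 0.1 * rng.uniform(-1, 1, m))
    for _ in range(1000)
)
print(f"sampled worst-case fixed-point residual after {K} iterations: {worst:.2e}")
```

Sampling only lower-bounds the true worst case over the parameter and initial-iterate sets; the semidefinite programming relaxation in the framework is what provides a certified upper bound.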
3 — 17:20 — Amortized Optimization for Optimal Transport and LLM Attacks
Amortized optimization methods provide fast solvers by predicting approximate solutions to optimization problems. This talk covers two recent advances that use amortization to significantly speed up solvers for non-trivial optimization problems arising in optimal transport (OT) and large language model (LLM) attacks. Computational optimal transport problems may involve solving three nested optimization problems, each of which amortization can help with: 1) the solution map from the measures to the primal/dual OT solution (Meta OT: https://arxiv.org/abs/2206.05262), 2) the computation of the c-transform or Fenchel conjugate (amortized conjugates: https://arxiv.org/abs/2210.12153), and 3) the computation of geodesics and Lagrangian (minimum-action) paths/costs (Lagrangian OT: https://openreview.net/pdf?id=myb0FKB8C9). Adding amortization to the standard solvers in these OT settings significantly improves the runtime and deployment time of the methods. These faster amortized solutions to the Fenchel conjugate and to geodesic/Lagrangian paths are of potentially broader interest in other settings bottlenecked by computing them numerically. Beyond these optimal transport applications, we will also discuss the prompt optimization problems arising in adversarial attacks on LLMs (AdvPrompter: https://arxiv.org/abs/2404.16873). Here, amortization enables us to attain state-of-the-art results on the standard AdvBench dataset, which also transfer to closed-source black-box LLM APIs. The fast amortized predictions then let us generate a synthetic dataset of adversarial examples on which an LLM can be fine-tuned to make it more robust against jailbreaking attacks while maintaining performance.
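As a small, self-contained illustration of the amortization pattern discussed in the talk, the sketch below trains a network to predict the maximizer of the Fenchel conjugate f*(y) = max_x <x, y> - f(x) directly from y, using the conjugate objective itself as the training loss, and then refines the prediction with a few gradient ascent steps at deployment. The choice f(x) = log-sum-exp(x), the network architecture, and all sizes are assumptions for the demo; this is not the code from the referenced papers.

```python
import torch

torch.manual_seed(0)
dim = 5

def f(x):
    return torch.logsumexp(x, dim=-1)

def conjugate_objective(x, y):
    """Objective of the Fenchel conjugate: <x, y> - f(x)."""
    return (x * y).sum(dim=-1) - f(x)

# Amortization model: predicts the maximizer of <x, y> - f(x) from y.
model = torch.nn.Sequential(
    torch.nn.Linear(dim, 64), torch.nn.ReLU(), torch.nn.Linear(64, dim)
)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

for step in range(2000):
    # y must lie in the domain of f* (the simplex for log-sum-exp).
    y = torch.softmax(torch.randn(128, dim), dim=-1)
    loss = -conjugate_objective(model(y), y).mean()  # maximize the objective
    opt.zero_grad()
    loss.backward()
    opt.step()

# Deployment: amortized prediction, optionally fine-tuned by gradient ascent.
y = torch.softmax(torch.randn(1, dim), dim=-1)
x_hat = model(y).detach().requires_grad_(True)
fine_tune = torch.optim.SGD([x_hat], lr=0.1)
for _ in range(10):
    obj = -conjugate_objective(x_hat, y).sum()
    fine_tune.zero_grad()
    obj.backward()
    fine_tune.step()
print("amortized conjugate estimate:", conjugate_objective(x_hat, y).item())
```

The same broad pattern (predict a solution from the problem description and train against the objective rather than against precomputed ground-truth solutions) is what the OT and LLM-attack applications above build on.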