1 — 08:30 — Moreau envelope based reformulation and algorithm for bilevel programs
Bilevel programming has emerged as a valuable tool for hyperparameter selection, a central concern in machine learning. In a recent study, Ye et al. (2023) introduced a value function-based difference of convex algorithm for bilevel programs. That approach is powerful when the lower-level problem is convex in both the upper-level and lower-level variables. In this work, we broaden the range of applications by requiring convexity only in the lower-level variables of the lower-level program. We present a new single-level difference of weakly convex reformulation based on the Moreau envelope of the lower-level problem. We further develop a sequentially convergent Inexact Proximal Difference of Weakly Convex Algorithm (iP-DwCA). Additionally, for scenarios where all problem functions are smooth, we develop a single-loop gradient-based algorithm.
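For orientation, the following is a hedged sketch of the kind of Moreau envelope construction the abstract refers to; the symbols F, f, gamma and the exact constraint form are generic placeholders rather than the talk's own formulation. For a bilevel program min_{x,y} F(x,y) subject to y solving the lower-level problem min_theta f(x,theta), the Moreau envelope of the lower-level objective,
\[
v_\gamma(x,y) \;:=\; \min_{\theta}\Big\{ f(x,\theta) \;+\; \tfrac{1}{2\gamma}\,\|\theta-y\|^{2} \Big\}, \qquad \gamma>0,
\]
yields a single-level reformulation of the form
\[
\min_{x,y}\; F(x,y) \quad\text{s.t.}\quad f(x,y) \;-\; v_\gamma(x,y) \;\le\; 0 .
\]
Under suitable weak convexity assumptions on f, both f and v_gamma are weakly convex in (x,y), so the constraint is a difference of weakly convex functions, which is the structure that a proximal DC-type scheme such as iP-DwCA exploits.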
2 — 09:00 — On first-order methods for stochastic bilevel optimization
We study first-order algorithms for solving bilevel optimization with access to stochastic gradient oracles. The problem has been extensively studied in recent years; the main technical challenge is to keep track of lower-level solutions in response to changes in the upper-level variables. Consequently, all existing approaches tie their analyses to a genie algorithm that knows the lower-level solutions and therefore need not query any points far from them. We consider a question dual to such approaches: suppose we have an oracle, which we call y*-aware, that returns an estimate of the lower-level solution alongside first-order gradient estimators that are locally unbiased within a reliable region. We study the complexity of finding stationary points with the proposed y*-aware oracle and derive matching upper and lower bounds on the iteration complexities with access to first-order y*-aware oracles, with and without additional stochastic smoothness assumptions. Our upper bound extends naturally to standard unbiased oracles without any loss in complexity, improving the best-known upper bounds by a polynomial factor. Furthermore, our upper bound suggests that first-order methods are not necessarily slower than second-order methods under a fair stochastic smoothness comparison. On the other hand, our lower bound implies that any approach that simulates a y*-aware oracle must suffer the same lower bounds, and we propose a conjecture on the tightness of our results.
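For context, below is a hedged sketch of the standard stochastic bilevel setup and the implicit-function form of its hypergradient; the notation (f, g, y*) is generic and not taken from the talk. The problem is
\[
\min_{x}\; F(x) \;:=\; f\big(x, y^{*}(x)\big), \qquad y^{*}(x) \in \arg\min_{y}\; g(x,y),
\]
with access only to stochastic gradient estimates of f and g. When g(x, ·) is strongly convex and twice differentiable, the hypergradient can be written as
\[
\nabla F(x) \;=\; \nabla_x f\big(x,y^{*}(x)\big) \;-\; \nabla^{2}_{xy} g\big(x,y^{*}(x)\big)\, \big[\nabla^{2}_{yy} g\big(x,y^{*}(x)\big)\big]^{-1} \nabla_y f\big(x,y^{*}(x)\big),
\]
which makes clear why tracking y*(x) is the central difficulty: gradient estimates of F are reliable only near the current lower-level solution, which is roughly what the y*-aware oracle described above formalizes.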
3 — 09:30 — Relaxation methods for pessimistic bilevel optimization
When the lower-level optimal solution set-valued mapping of a bilevel optimization problem is not single-valued, the problem is ill-posed, which gives rise to the optimistic and pessimistic bilevel optimization problems as tractable algorithmic frameworks. Solving the pessimistic bilevel optimization problem, however, is far more challenging than solving the optimistic one. Relaxation methods have proven to be among the most efficient techniques for solving the optimistic bilevel optimization problem in its Karush-Kuhn-Tucker (KKT) reformulation or the corresponding, more general mathematical program with complementarity constraints. Inspired by this success, this work studies their potential in the context of pessimistic bilevel optimization problems. To proceed, we consider the KKT reformulation of a pessimistic bilevel optimization problem in which all functions involved are at least continuously differentiable, the lower-level problem is convex, and the Slater constraint qualification is satisfied. We then establish theoretical results ensuring that a sequence of global/local optimal solutions (resp. stationary points) of the relaxations converges to the corresponding points of the KKT reformulation of the pessimistic bilevel problem. The results are accompanied by technical proofs ensuring that the relaxation algorithms are well-defined. In this talk, we will provide a brief overview of these results, as well as some numerical results from the implementation of the algorithms.
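To fix ideas, here is a hedged sketch of the pessimistic bilevel problem and of a Scholtes-type relaxation of the complementarity conditions in its KKT reformulation; the relaxation shown is a generic one and may differ in detail from the scheme studied in the talk. The pessimistic problem reads
\[
\min_{x}\; \max_{y \in S(x)} F(x,y), \qquad S(x) \;:=\; \arg\min_{y} \big\{\, f(x,y) \;:\; g(x,y) \le 0 \,\big\}.
\]
Under convexity of the lower-level problem and the Slater condition, membership y in S(x) can be replaced by the KKT system
\[
\nabla_y f(x,y) + \nabla_y g(x,y)^{\top} \lambda = 0, \qquad \lambda \ge 0, \quad g(x,y) \le 0, \quad \lambda^{\top} g(x,y) = 0,
\]
and a Scholtes-type relaxation replaces the complementarity condition by \(-\lambda_i\, g_i(x,y) \le t\) for each i, with a parameter t > 0 driven to zero; the convergence results described above concern the solutions and stationary points of such a family of relaxed problems as t decreases to zero.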
4 — 10:00 — Theoretical smoothing frameworks for general nonsmooth bilevel problems
The optimal value function approach provides a useful reformulation of the bilevel optimization problem, but its utility is often hindered by the nonsmoothness of the value function, even when the associated lower-level function is smooth. In this talk, we present two smoothing strategies for the value function associated with lower-level functions that are not necessarily smooth but are Lipschitz continuous. The first method employs a quadratic regularization approach for partially convex lower-level functions, while the second utilizes entropic regularization for general lower-level objective functions. Meanwhile, the property known as gradient consistency is crucial in ensuring that a smoothing algorithm built from such approximations is globally subsequentially convergent to stationary points of the equivalent reformulation of the bilevel problem. With this in mind, we show that the proposed smooth approximations satisfy the gradient consistency property under certain conditions on the lower-level function.
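As a concrete illustration of the second strategy, the following is a hedged sketch of an entropic (soft-min) smoothing of the optimal value function; the precise regularizations and constants used in the talk may differ. With the value function
\[
v(x) \;:=\; \min_{y \in Y} f(x,y),
\]
an entropic approximation is
\[
v_{\mu}(x) \;:=\; -\mu \,\log \int_{Y} \exp\!\big(-f(x,y)/\mu\big)\, dy, \qquad \mu > 0,
\]
which approximates v as mu decreases to zero (by Laplace's principle, under suitable compactness and integrability assumptions) and is typically much better behaved than v itself; the first strategy instead adds a quadratic regularization term to the lower-level objective. Gradient consistency then asks, roughly, that the limits of \(\nabla v_{\mu_k}(x_k)\) along sequences \(\mu_k \downarrow 0\), \(x_k \to x\) recover (up to convex hull) the Clarke subdifferential of v at x, which is the property verified for these approximations.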