1 — 14:00 — Generalized-Smooth Nonconvex Optimization is As Efficient As Smooth Nonconvex Optimization
Various optimal gradient-based algorithms have been developed for smooth nonconvex optimization. However, many nonconvex machine learning problems do not belong to the class of smooth functions and therefore the existing algorithms are sub-optimal. Instead, these problems have been shown to satisfy certain generalized-smooth conditions, which have not been well understood in the existing literature. In this paper, we propose a notion of
2 — 14:30 — Stochastic Optimization under Hidden Convexity
In this work, we consider constrained stochastic optimization problems under hidden convexity, i.e., those that admit a convex reformulation via a non-linear (but invertible) map c(⋅). A number of non-convex problems, ranging from optimal control and revenue and inventory management to convex reinforcement learning, admit such a hidden convex structure. Unfortunately, in the majority of applications considered, the map c(⋅) is unavailable or implicit; therefore, directly solving the convex reformulation is not possible. On the other hand, stochastic gradients with respect to the original variable are often easy to obtain. Motivated by these observations, we examine the basic projected stochastic (sub-)gradient methods for solving such problems under hidden convexity. We provide the first sample complexity guarantees for global convergence in smooth and non-smooth settings. Additionally, in the smooth setting, we strengthen our results to last-iterate convergence in terms of the function value gap using the momentum variant of projected stochastic gradient descent.
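For concreteness, here is a minimal sketch of the basic projected stochastic gradient iteration the abstract refers to: step along a noisy gradient in the original variable, then project back onto the feasible set. The Euclidean-ball constraint, the toy quadratic objective, and the noise model are all illustrative assumptions and are not taken from the talk.

import numpy as np

def project_ball(x, radius=1.0):
    # Euclidean projection onto {x : ||x|| <= radius}
    # (a hypothetical constraint set chosen for illustration).
    norm = np.linalg.norm(x)
    return x if norm <= radius else x * (radius / norm)

def projected_sgd(grad_oracle, x0, steps=1000, lr=0.01, radius=1.0, seed=0):
    # Basic projected stochastic (sub-)gradient method: take a step along
    # a stochastic gradient in the original variable, then project.
    rng = np.random.default_rng(seed)
    x = project_ball(np.asarray(x0, dtype=float), radius)
    for _ in range(steps):
        g = grad_oracle(x, rng)               # noisy gradient at x
        x = project_ball(x - lr * g, radius)  # gradient step + projection
    return x

# Toy usage: f(x) = ||x - target||^2 with additive gradient noise
# standing in for the stochastic oracle.
target = np.array([0.6, -0.3])
noisy_grad = lambda x, rng: 2.0 * (x - target) + 0.1 * rng.standard_normal(x.shape)
print(projected_sgd(noisy_grad, x0=np.zeros(2)))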
3 — 15:00 — Convergence of Douglas-Rachford Splitting and Primal-Dual Hybrid Gradient for Non-Monotone, Non-Smooth Problems
Despite the widespread use of splitting methods, their convergence analysis has predominantly been limited to convex/monotone settings. This presentation delves into the convergence properties of two such methods, Douglas-Rachford splitting (DRS) and the primal-dual hybrid gradient (PDHG) method, in the absence of monotonicity and smoothness. Specifically, we introduce the notion of semimonotonicity and establish conditions under which DRS and PDHG exhibit global convergence, even when dealing with the sum of two semimonotone operators. Through illustrative examples, we showcase the versatility of our theoretical framework in addressing problems emerging in optimization and reinforcement learning.
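To recall the iteration being analyzed, the sketch below implements the standard Douglas-Rachford splitting step (x = prox_{γf}(z), y = prox_{γg}(2x − z), z = z + y − x) on a simple convex toy problem, least squares plus an ℓ1 penalty. The talk concerns non-monotone, non-smooth settings; this example only illustrates the iteration itself, and all problem data and parameters are assumptions made for the sketch.

import numpy as np

def prox_quadratic(z, A, b, gamma):
    # prox of f(x) = 0.5 * ||A x - b||^2: solve (I + gamma A^T A) x = z + gamma A^T b.
    n = A.shape[1]
    return np.linalg.solve(np.eye(n) + gamma * A.T @ A, z + gamma * A.T @ b)

def prox_l1(z, gamma, lam):
    # prox of g(x) = lam * ||x||_1: soft-thresholding.
    return np.sign(z) * np.maximum(np.abs(z) - gamma * lam, 0.0)

def douglas_rachford(A, b, lam=0.1, gamma=1.0, steps=500):
    # DRS for f + g: x = prox_{gamma f}(z); y = prox_{gamma g}(2x - z); z = z + y - x.
    z = np.zeros(A.shape[1])
    for _ in range(steps):
        x = prox_quadratic(z, A, b, gamma)
        y = prox_l1(2 * x - z, gamma, lam)
        z = z + y - x
    return y

# Toy usage on a small least-squares-plus-l1 instance.
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 5))
b = A @ np.array([1.0, 0.0, -2.0, 0.0, 0.5]) + 0.01 * rng.standard_normal(20)
print(douglas_rachford(A, b))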
4 — 15:30 — On the Lower Bound of Minimizing Polyak-Lojasiewicz Functions
The Polyak-Lojasiewicz (PL) condition (Polyak, 1963) is weaker than strong convexity but suffices to ensure global convergence of the gradient descent algorithm. In this paper, we study the lower bound for algorithms that use first-order oracles to find an approximately optimal solution. We show that any first-order algorithm requires at least