1. 14:00 — Activity Identification in Non-smooth Optimization: A Set-Valued Operator Perspective

In the last ten years, the concept of partial smoothness has played a crucial role in areas such as local convergence analysis, sensitivity analysis, and algorithm acceleration. One significant limitation, however, is that partial smoothness applies only to functions in finite dimensions. In this presentation, I address this limitation by introducing an extension of partial smoothness from functions to set-valued operators.
This extension gives an intuitive geometric interpretation of finite activity identification in proximal operator splitting algorithms. Specifically, we can identify the subsets of the domain on which the set-valued operator exhibits finite activity, which has direct implications for the convergence behavior and computational efficiency of these algorithms.
The extension also allows us to provide an upper bound for the finite activity property, whereas existing results have only considered the lower bound. With both upper and lower bounds, we obtain a more complete characterization of finite activity in proximal operator splitting algorithms, which can inform algorithm design and improve performance on practical optimization problems.
Furthermore, the extension lets us derive identification results for degenerate problems. Degeneracy often arises when certain constraints or variables become redundant or highly correlated; incorporating partial smoothness into the analysis of degenerate problems yields insight into the identification of critical points and the behavior of optimization algorithms in such scenarios.
In summary, the extension of partial smoothness from functions to set-valued operators opens up new avenues for analysis and understanding in optimization. It provides a geometric interpretation of finite activity identification, enables the derivation of upper bounds, and facilitates the analysis of degenerate problems. These insights can contribute to the development of more efficient and effective optimization algorithms in both theory and practice.
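
As a rough numerical illustration of activity identification (separate from the set-valued operator framework of the talk), the sketch below runs proximal gradient on a small synthetic lasso problem: the prox of the l1 norm is soft-thresholding, and the support of the iterates typically stabilizes after finitely many iterations. The problem data, regularization weight, and iteration budget are illustrative choices only.

```python
# Minimal sketch, assuming a synthetic lasso instance: proximal gradient
# iterations whose support ("active set") typically freezes after finitely
# many steps -- the classical instance of finite activity identification.
import numpy as np

rng = np.random.default_rng(0)
n, d = 20, 10
A = rng.standard_normal((n, d))
x_true = np.zeros(d)
x_true[:3] = [1.0, -2.0, 0.5]                     # sparse ground truth
b = A @ x_true + 0.01 * rng.standard_normal(n)
lam = 0.5
step = 1.0 / np.linalg.norm(A, 2) ** 2            # 1/L for the smooth part

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (componentwise shrinkage)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

x = np.zeros(d)
prev_support = None
for k in range(200):
    grad = A.T @ (A @ x - b)                      # gradient of 0.5 * ||Ax - b||^2
    x = soft_threshold(x - step * grad, step * lam)
    support = tuple(map(int, np.flatnonzero(x)))
    if support != prev_support:
        print(f"iteration {k:3d}: support changed to {support}")
        prev_support = support
# Once the printed support stops changing, the iterates stay on the manifold
# {x : supp(x) = supp(x_final)} -- the "activity" that gets identified.
```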

2. 14:30 — Goldstein modulus in Lipschitz Minimization

In 1977, Goldstein proposed a simple conceptual procedure for minimizing Lipschitz functions. From each current iterate x, we move a fixed step length in the direction opposite to the "Goldstein subgradient" - the shortest convex combination of gradients at points within that step length of x. How to choose the step length is unclear, and in any case, the Goldstein subgradient is not computable in practice. However, recent clever approximation schemes have resulted in practical versions with complexity bounds, and even near-linear convergence on large classes of objectives. We introduce a robust measure of the slope of a Lipschitz function, based on Goldstein subgradients. By exploring and leveraging linear growth of this “Goldstein modulus” around minimizers, we propose a conceptual subgradient algorithm with adapted radius selection, and thus explain the phenomenon of near-linear convergence of subgradient methods.
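
To make the conceptual procedure concrete, here is a hedged sketch of one Goldstein-style step: the Goldstein subgradient is approximated by sampling gradients in the ball of radius r around x and (approximately) computing the minimum-norm element of their convex hull via a Frank-Wolfe iteration. The test function, sample size, and radius are illustrative choices, not the approximation schemes with complexity guarantees mentioned above.

```python
# Hedged sketch of a sampled approximation to one Goldstein step; the exact
# Goldstein subgradient (over the whole ball) is not computable in practice.
import numpy as np

def f(x):                       # simple nonsmooth Lipschitz test function
    return abs(x[0]) + 2.0 * abs(x[1])

def grad_f(x):                  # gradient, defined almost everywhere
    return np.array([np.sign(x[0]), 2.0 * np.sign(x[1])])

def min_norm_in_hull(P, iters=200):
    """Frank-Wolfe iteration for the shortest vector in conv{rows of P}."""
    g = P[0].copy()
    for _ in range(iters):
        i = int(np.argmin(P @ g))            # vertex most opposed to g
        d = P[i] - g
        if d @ d < 1e-16:
            break
        t = float(np.clip(-(g @ d) / (d @ d), 0.0, 1.0))
        g = g + t * d                        # exact line search on the segment
    return g

def goldstein_step(x, r, n_samples=50, rng=np.random.default_rng(1)):
    """Move a step of length r opposite an approximate Goldstein subgradient."""
    u = rng.standard_normal((n_samples, x.size))
    u /= np.linalg.norm(u, axis=1, keepdims=True)
    radii = r * rng.uniform(0.0, 1.0, n_samples) ** (1.0 / x.size)
    grads = np.array([grad_f(x + rho * ui) for rho, ui in zip(radii, u)])
    g = min_norm_in_hull(grads)
    if np.linalg.norm(g) < 1e-12:            # (near-)stationarity certificate
        return x, g
    return x - r * g / np.linalg.norm(g), g

x = np.array([1.3, -0.7])
for k in range(10):
    x, g = goldstein_step(x, r=0.1)
    print(k, round(f(x), 4), round(float(np.linalg.norm(g)), 4))
```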

3. 15:00 — Convergence of heavy-ball SGD in the nonconvex case: a novel time window-based analysis

  • Junwen Qiu, The Chinese University of Hong Kong, Shenzhen
  • Bohao Ma, The Chinese University of Hong Kong, Shenzhen
  • Andre Milzarek, The Chinese University of Hong Kong, Shenzhen

In this talk, we propose a novel time window-based analysis technique to investigate the convergence behavior of the stochastic heavy-ball method (SHB) in nonconvex settings. Despite its popularity, the asymptotic convergence properties of SHB remain less understood in nonconvex scenarios. This is primarily due to the absence of sufficient descent and challenges in controlling stochastic errors in an almost sure sense. To address these challenges, we analyze the potential approximate descent of SHB across specific time windows - rather than examining the behavior between consecutive iterates as done in classical studies. This time window-based approach simplifies the convergence analysis and enables us to establish the first iterate convergence result for SHB under the Kurdyka-Łojasiewicz (KL) property. Depending on the underlying step size dynamics and the KL exponent, we further characterize the local convergence rates of SHB.
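
For reference, here is a minimal sketch of the SHB update in its standard momentum form, x_{k+1} = x_k - alpha_k g_k + beta (x_k - x_{k-1}), on a toy nonconvex objective; the objective, noise model, and step-size schedule are illustrative assumptions and not the time window-based analysis itself.

```python
# Minimal sketch of stochastic heavy-ball (SHB) on the nonconvex objective
# f(x) = sum_i (x_i^2 - 1)^2, with additive Gaussian gradient noise.
import numpy as np

rng = np.random.default_rng(0)

def grad(x):
    return 4.0 * x * (x ** 2 - 1.0)           # gradient of sum (x_i^2 - 1)^2

def stochastic_grad(x, sigma=0.1):
    return grad(x) + sigma * rng.standard_normal(x.shape)

beta = 0.9                                    # heavy-ball momentum parameter
x_prev = x = rng.standard_normal(5)
for k in range(2000):
    alpha = 0.05 / (1.0 + k) ** 0.6           # diminishing step sizes
    g = stochastic_grad(x)
    x_next = x - alpha * g + beta * (x - x_prev)
    x_prev, x = x, x_next

print("final iterate:", np.round(x, 3))       # entries typically settle near +/- 1
print("gradient norm:", float(np.linalg.norm(grad(x))))
```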

4. 15:30 — Error bounds, PL condition, and quadratic growth for weakly convex functions, and linear convergences of proximal point methods

  • Feng Yi Liao, University of California, San Diego, Electrical And Computer Engineering Department
  • Lijun Ding, University of Wisconsin Madison
  • Yang Zheng, University of California, San Diego, Electrical And Computer Engineering Department

Many machine learning problems lack strong convexity properties. Fortunately, recent studies have revealed that first-order algorithms also enjoy linear convergence under various weaker regularity conditions. While the relationship among different regularity conditions for convex and smooth functions is well understood, this is not the case in the nonsmooth setting. In this talk, we go beyond convexity and smoothness, and clarify the connections among common regularity conditions (including strong convexity, the restricted secant inequality, the subdifferential error bound, the Polyak-Łojasiewicz inequality, and quadratic growth) in the class of weakly convex functions. Our analysis is simple and easy to follow. We use the notion of slope and Ekeland's variational principle as the key tools to establish the relationships among these regularity conditions. In addition, we present a simple and modular proof for the linear convergence of the proximal point method (PPM) for convex (possibly nonsmooth) optimization using these regularity conditions. Our convergence results require weaker conditions while having simpler proofs than many existing characterizations. Linear convergence also holds when the subproblems of PPM are solved inexactly with proper control of the inexactness. We expect these fast linear convergence results for PPM to find practical applications in large-scale, sparse conic optimization through the connection to the augmented Lagrangian method.
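
As a companion sketch (an illustration under stated assumptions, not the analysis in the talk), the proximal point method with inexactly solved subproblems can be written as follows; the piecewise-linear objective f(x) = ||Ax - b||_1, the synthetic data, and the inner subgradient budget are hypothetical choices.

```python
# Hedged sketch of inexact PPM: x_{k+1} ~ argmin_y f(y) + ||y - x_k||^2 / (2*lam),
# with each strongly convex subproblem solved by a fixed budget of subgradient steps.
import numpy as np

rng = np.random.default_rng(0)
m, n = 30, 10
A = rng.standard_normal((m, n))
x_star = rng.standard_normal(n)
b = A @ x_star                                # minimum value of f is 0 at x_star

def f(x):
    return np.abs(A @ x - b).sum()            # convex, nonsmooth objective

def subgrad_f(x):
    return A.T @ np.sign(A @ x - b)           # a subgradient of f at x

lam = 1.0
x = np.zeros(n)
for k in range(30):
    # inexact prox: subgradient steps on y -> f(y) + ||y - x||^2 / (2*lam),
    # which is (1/lam)-strongly convex
    y = x.copy()
    for t in range(1, 201):
        g = subgrad_f(y) + (y - x) / lam
        y -= g * lam / (t + 1.0)              # step ~ 1/(mu*t) with mu = 1/lam
    x = y
    print(f"outer iteration {k:2d}: f(x) = {f(x):.3e}")
```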