1. 16:20 — A Sampling-based Progressive Hedging Algorithm for Stochastic Programming

The progressive hedging algorithm (PHA) is a cornerstone in tackling large-scale stochastic programming challenges. However, its traditional implementation is hindered by several limitations, including the requirement to solve all scenario subproblems in each iteration, reliance on an explicit probability distribution, and a convergence process that is highly sensitive to the choice of penalty parameters. This paper introduces a sampling-based PHA that aims to overcome these limitations. Our approach employs a dynamic selection process for the number of scenario subproblems solved per iteration. It incorporates adaptive sequential sampling for determining sample sizes, stochastic conjugate subgradient methods for direction finding, and a line-search technique to update the dual variables. Experimental results demonstrate that this novel algorithm not only addresses the bottlenecks of the conventional PHA but also offers potentially superior scalability, representing a substantial improvement in the field of stochastic programming.
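For context, here is a minimal sketch of the classical PHA iteration that the abstract contrasts against, on a toy quadratic instance with closed-form scenario subproblems (all data, probabilities, and the penalty parameter are hypothetical):

```python
import numpy as np

# Hypothetical toy instance: scenario cost f_s(x) = 0.5 * (x - a[s])**2, so the
# optimal nonanticipative decision is the probability-weighted mean p @ a.
a = np.array([1.0, 4.0, 10.0])   # scenario data
p = np.array([0.5, 0.3, 0.2])    # explicit scenario probabilities
rho = 1.0                        # penalty parameter

x_s = a.copy()                   # per-scenario decisions
w = np.zeros_like(a)             # dual multipliers (kept with p @ w == 0)
x_bar = p @ x_s                  # implementable (averaged) decision

for _ in range(100):
    # Solve every scenario subproblem; closed form for this quadratic:
    #   argmin_x 0.5*(x - a_s)**2 + w_s*x + (rho/2)*(x - x_bar)**2
    x_s = (a - w + rho * x_bar) / (1.0 + rho)
    x_bar = p @ x_s
    w = w + rho * (x_s - x_bar)  # dual update enforcing nonanticipativity
```

Note that every scenario subproblem is solved in every iteration and the probabilities `p` must be known explicitly; these are exactly the bottlenecks the sampling-based variant targets.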

2. 16:50 — Generalized adaptive partition-based method for two-stage stochastic linear programs: geometric oracle and analysis

Adaptive Partition-based Methods (APM) are numerical methods that solve, in particular, two-stage stochastic linear programs (2SLP).
We say that a partition of the uncertainty space is \emph{adapted} to the current first-stage control $\check x$ if we can aggregate scenarios while preserving the true value of the expected recourse cost at $\check x$.
The core idea of APM is to iteratively construct a partition \emph{adapted} to all past tentative first-stage controls.

Relying on the normal fan of the dual admissible set, we give a necessary and sufficient condition for a partition to be adapted, valid even for non-finitely supported distributions, and provide a geometric method to obtain an adapted partition.
Further, by showing the connection between APM and the L-shaped algorithm, we prove convergence and complexity bounds for APM. We discuss the fixed recourse case and give elements to relax this assumption.
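A toy illustration of the adaptedness condition (all data hypothetical): on a one-dimensional recourse problem, scenarios whose optimal dual solutions lie in the same cell of the normal fan of the dual admissible set can be replaced by their conditional mean without changing the expected recourse cost at $\check x$:

```python
import numpy as np

# Toy 1-D recourse:
#   Q(x, h) = max_{pi in [l, u]} pi * (h - x),  so pi* = u if h > x else l.
l, u = -1.0, 2.0
x_check = 3.0                               # current first-stage control
h = np.array([1.0, 2.5, 4.0, 6.0])          # scenario right-hand sides
p = np.array([0.25, 0.25, 0.25, 0.25])      # scenario probabilities

def Q(x, h):
    return np.maximum(l * (h - x), u * (h - x))

# Group scenarios by the active dual vertex at x_check: scenarios in the same
# cell of the normal fan of [l, u] can be aggregated exactly.
cells = {}
for s in range(len(h)):
    key = u if h[s] > x_check else l
    cells.setdefault(key, []).append(s)

# Aggregated expected recourse cost: one conditional-mean scenario per cell.
agg = 0.0
for idx in cells.values():
    w = p[idx].sum()
    h_bar = p[idx] @ h[idx] / w             # conditional mean scenario
    agg += w * Q(x_check, h_bar)

true_val = p @ Q(x_check, h)                # full expected recourse cost
```

Because $Q(x,\cdot)$ is linear on each cell, `agg` matches `true_val` exactly at $\check x$, which is the aggregation property the adapted partition is built to preserve.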

3. 17:20 — Stochastic Bregman Proximal Gradient Method Revisited: Kernel Conditioning and Painless Variance Reduction

We investigate Bregman proximal gradient (BPG) methods for solving nonconvex composite stochastic optimization problems. Instead of the standard gradient Lipschitz continuity (GLC) assumption, the objective function is only assumed to be smooth-adaptable w.r.t. some kernel function. We first conduct an in-depth analysis of the stationarity measure, revealing that the Bregman proximal gradient mapping widely adopted in existing works may not correctly capture the near-stationarity of solutions. To resolve this issue, we propose and analyze a new Bregman proximal gradient mapping. Second, we thoroughly analyze the sample complexities of stochastic BPG methods under both the old and the newly proposed gradient mappings. Because the absence of GLC disables the standard analysis of stochastic variance reduction techniques, existing stochastic BPG methods only attain an $O(\epsilon^{-2})$ sample complexity under the old gradient mapping; we show that this limitation mainly stems from insufficient exploitation of the kernel's properties. By introducing a new kernel-conditioning regularity assumption on the kernel, we show that a simple epoch bound mechanism suffices to enable all existing variance reduction techniques for stochastic BPG methods. Combined with a novel probabilistic argument, we show that there is a high-probability event conditioned on which an $O(\sqrt{n}\epsilon^{-1})$ sample complexity can be derived for the finite-sum stochastic setting under the old gradient mapping. Moreover, with a novel adaptive step-size control mechanism, we also establish an $O(\sqrt{n}L_h(X)\epsilon^{-1})$ complexity under the new gradient mapping. Worst-case instances are designed to show the tightness of the analysis.
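This is not the paper's method, but a minimal sketch of a plain stochastic BPG step in the special case of the entropy kernel on the simplex, where the Bregman step has the closed form of exponentiated gradient; the objective, data, and step sizes are all hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical finite-sum least-squares objective restricted to the simplex:
#   f(x) = (1/n) * sum_i 0.5 * (A[i] @ x - b[i])**2
n, d = 50, 5
A = rng.normal(size=(n, d))
x_true = np.array([0.5, 0.2, 0.1, 0.1, 0.1])   # interior point of the simplex
b = A @ x_true                                  # so min f = 0

def f(x):
    return 0.5 * np.mean((A @ x - b) ** 2)

def grad_i(x, i):                               # single-component stochastic gradient
    return (A[i] @ x - b[i]) * A[i]

def bpg_step(x, g, t):
    # Bregman step with the entropy kernel h(x) = sum_j x_j * log(x_j):
    #   x+ = argmin_{y in simplex} <g, y> + (1/t) * D_h(y, x),
    # which reduces to an exponentiated-gradient update.
    y = x * np.exp(-t * g)
    return y / y.sum()

x0 = np.full(d, 1.0 / d)
x, f0 = x0, f(x0)
for k in range(5000):
    i = rng.integers(n)                         # sample one component
    x = bpg_step(x, grad_i(x, i), t=0.5 / np.sqrt(k + 1))
```

Without variance reduction, such a scheme relies on a decreasing step size; the epoch-bound and adaptive step-size mechanisms described in the abstract are what allow variance-reduced variants beyond this baseline.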