1 — 08:30 — Is building first-order simulation oracles really worth it?
Stochastic simulations or black-box noisy (non-convex) problems often do not provide direct gradient observations. However, conventional wisdom asserts that having gradients should lead to better convergence and complexity results. This knowledge has motivated work towards building first-order simulation oracles that compute function sample path derivatives by employing infinitesimal perturbation analysis or likelihood ratio methods. These first-order simulation oracles can be costly and inflexible, adding an eligibility requirement to deem a function suitable for derivative-based models. The natural question is, is it worth the trouble building the first-order capability in simulation oracles and will that guarantee better complexity? We find that the answer is not necessarily. Concretely, we show that unless the function sample paths are themselves smooth with uniformly bounded Lipschitz constants, there is no gain in having a first-order simulation oracle. That is, stochastic derivative-free algorithms that do not need or use direct gradients will lead to the same complexity results that one would attain with stochastic derivative-based algorithms. This finding is important in many cases where the smoothness of the stochastic process is clearly lost. An example is when the objective functions are quantiles or conditional expectations rendering the sample path functions piecewise linear or step functions. In such settings, our results imply that a derivative-free solver will be just as efficient as a derivative-based solver. The main vehicle for our analysis and result is common random numbers and the consequent reduced variance in gradient and function estimates.
2 — 09:00 — Adaptive stochastic subspace descent
Many optimization problems, such as hyper-parameter learning and PDE constrained optimization, have costly objective functions for which the gradient is unavailable. We propose a novel descent method, Adaptive Stochastic Subspace Descent (A-SSD), which uses randomized directional derivative approximates combined with an adaptive Polyak step-size policy. We show that the step-size policy is the optimal solution to a truncated linear model over the random subspace. Finally, we demonstrate empirically that A-SSD is competitive against commonly used derivative-free methods such as BOBYQA and conjugate directions for functions in dimensions exceeding 100.
3 — 09:30 — Self-Correcting DFO Algorithm: An Attempt at a Convergence Rate
We will discuss steps complexity analysis (in terms of function evaluations) of model-based trust region methods. The one method for which complex has been established previously is from the paper "Trust-Region Methods Without Using Derivatives: Worst Case Complexity and the NonSmooth Case" by Garmanjani, Judice and Vicente. It employs a criticality step where instead of trying to make progress the algorithm's focus is shifted to generating a fully linear model. In contrast we study alternative methods where the algorithm always tries to make progress and accepts steps if such progress happens to occur. One of the methods, makes on model improvement iteration, when necessary, following methodology by Powell. The other method maintains geometry via self-correcting property following the paper "Self-correcting geometry in model-based algorithms for derivative-free unconstrained optimization" by Scheinberg and Toint. We will also provide numerical comparison of these methods.
4 — 10:00 — ** CANCELLED ** Derivative-free stochastic optimization via adaptive sampling strategies
We present a novel derivative-free optimization framework for solving unconstrained stochastic optimization problems. Many problems in fields ranging from simulation optimization to reinforcement learning involve settings where only stochastic function values are obtained via an oracle with no available gradient information, necessitating the usage of derivative-free optimization methodologies. Our approach involves estimating gradients using stochastic function evaluations and integrating adaptive sampling techniques to control the accuracy in these stochastic approximations. We consider various gradient estimation techniques including finite difference, Gaussian smoothing, randomized coordinate finite difference, and randomized subspace finite difference methods. We provide theoretical convergence guarantees for our framework and analyze the worst-case iteration and sample complexities associated with each gradient estimation method. Finally, we demonstrate the empirical performance of the methods on logistic regression and nonlinear least squares problems.