1 — 08:30 — Distributed Composite Stochastic Mirror Descent for Stochastic Optimization
We study statistical estimation of high-dimensional sparse parameters over a network. The estimation problem is formulated as a stochastic approximation problem. Assuming agents only have access to unbiased estimates of the gradients of their local cost functions, we propose solving a composite stochastic optimization problem at each stage of a multistage algorithm, with each stage solved to a prescribed accuracy by the non-Euclidean composite distributed stochastic mirror descent method (DSMD) in the adapt-then-combine (ATC) and combine-then-adapt (CTA) forms. We show that, with high probability, the iterates generated by each agent converge linearly during the first “preliminary” phase of the routine and sublinearly during the second “asymptotic” phase. Further, we characterize the transient time needed for DSMD to approach the asymptotic convergence rate. Numerical experiments demonstrate the tightness of the theoretical results.
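As a rough illustration of the ATC and CTA update patterns mentioned above (not the authors' implementation), the sketch below runs a Euclidean composite stochastic mirror descent with an ℓ1 term over a small network. The mixing matrix, quadratic local costs, noise model, step size, and the names `dsmd` and `soft_threshold` are all illustrative assumptions.

```python
# A minimal sketch of distributed composite stochastic mirror descent in
# ATC (adapt-then-combine) and CTA (combine-then-adapt) forms.
# Assumptions (not from the abstract): Euclidean mirror map, an l1 composite
# term handled by soft-thresholding, noisy gradients of local least-squares
# costs, and a doubly stochastic mixing matrix W.
import numpy as np

def soft_threshold(x, tau):
    """Prox of tau*||.||_1, i.e. the composite step for the l1 term."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def noisy_grad(A, b, x, sigma, rng):
    """Unbiased estimate of the local least-squares gradient."""
    return A.T @ (A @ x - b) + sigma * rng.standard_normal(x.shape)

def dsmd(W, As, bs, lam, step, iters, form="ATC", sigma=0.1, seed=0):
    rng = np.random.default_rng(seed)
    n_agents, d = W.shape[0], As[0].shape[1]
    X = np.zeros((n_agents, d))                  # one iterate per agent
    for _ in range(iters):
        if form == "CTA":
            X = W @ X                            # combine first
        G = np.stack([noisy_grad(As[i], bs[i], X[i], sigma, rng)
                      for i in range(n_agents)])
        X = soft_threshold(X - step * G, step * lam)   # adapt (composite step)
        if form == "ATC":
            X = W @ X                            # then combine
    return X

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    d, n_agents = 20, 4
    x_true = np.zeros(d); x_true[:3] = 1.0                # sparse ground truth
    As = [rng.standard_normal((30, d)) for _ in range(n_agents)]
    bs = [A @ x_true for A in As]
    W = np.full((n_agents, n_agents), 1.0 / n_agents)     # complete-graph averaging
    X = dsmd(W, As, bs, lam=0.1, step=0.01, iters=500, form="ATC")
    print("per-agent estimation error:", np.linalg.norm(X - x_true, axis=1))
```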
2 — 09:00 — Stochastic Adaptive Regularization Method with Cubics - A High Probability Complexity Bound
We present a high probability complexity bound for a stochastic adaptive regularization method with cubics, also known as the cubically regularized Newton method. The method makes use of stochastic zeroth-, first-, and second-order oracles that satisfy certain accuracy and reliability assumptions. Such oracles have been used in the literature by other stochastic adaptive methods, such as trust region and line search. These oracles capture many settings, such as expected risk minimization, stochastic zeroth-order optimization, and others. In this paper, we give the first high probability iteration bound for stochastic cubic regularization, and show that, just as in the deterministic case, it is superior to other stochastic adaptive methods.
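The adaptive-cubic-regularization mechanism the talk builds on can be sketched as follows. The subproblem solver, Gaussian oracle noise, and parameter-update rules below are standard textbook choices, not the specific oracles or constants analyzed in the talk, and the function names (`cubic_subproblem`, `stochastic_arc`) are illustrative.

```python
# A minimal sketch of an adaptively regularized cubic (ARC-type) step with
# stochastic zeroth-, first-, and second-order oracles.  Assumptions (not
# from the abstract): Gaussian oracle noise, a bisection-based solver for
# the cubic subproblem, and a standard sufficient-decrease test on noisy
# function estimates.
import numpy as np

def cubic_subproblem(g, H, sigma, tol=1e-10):
    """Minimizer of m(s) = g.s + 0.5 s'Hs + (sigma/3)||s||^3 via the secular
    equation (H + sigma*r I) s = -g with r = ||s|| (hard case ignored)."""
    lam_min = np.linalg.eigvalsh(H)[0]
    I = np.eye(len(g))
    lo = max(0.0, -lam_min / sigma) + 1e-12
    hi = lo + 1.0
    while np.linalg.norm(np.linalg.solve(H + sigma * hi * I, -g)) > hi:
        hi *= 2.0                                 # grow until the root is bracketed
    for _ in range(200):
        r = 0.5 * (lo + hi)
        s = np.linalg.solve(H + sigma * r * I, -g)
        lo, hi = (r, hi) if np.linalg.norm(s) > r else (lo, r)
        if hi - lo < tol:
            break
    return s

def stochastic_arc(f, grad, hess, x0, noise=1e-3, sigma=1.0, eta=0.1,
                   gamma=2.0, iters=50, seed=0):
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    est = lambda v: v + noise * rng.standard_normal(np.shape(v))   # noisy oracle
    for _ in range(iters):
        g, H = est(grad(x)), est(hess(x))
        H = 0.5 * (H + H.T)                       # keep the Hessian estimate symmetric
        s = cubic_subproblem(g, H, sigma)
        model_decrease = -(g @ s + 0.5 * s @ H @ s
                           + sigma / 3 * np.linalg.norm(s) ** 3)
        rho = (est(f(x)) - est(f(x + s))) / max(model_decrease, 1e-16)
        if rho >= eta:                            # "successful" step
            x, sigma = x + s, max(sigma / gamma, 1e-8)
        else:                                     # "unsuccessful": inflate sigma
            sigma *= gamma
    return x

if __name__ == "__main__":
    # Toy nonconvex quartic with minimizers at coordinates +/- sqrt(2).
    f = lambda x: 0.25 * np.sum(x ** 4) - np.sum(x ** 2)
    grad = lambda x: x ** 3 - 2 * x
    hess = lambda x: np.diag(3 * x ** 2 - 2)
    print("approx. stationary point:", stochastic_arc(f, grad, hess, [2.0, -1.5]))
```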
3 — 09:30 — Design Guidelines of Noise-Tolerant Optimization Methods
The development of nonlinear optimization algorithms capable of performing reliably in the presence of noise has garnered considerable attention lately. This work advocates for strategies to create noise-tolerant nonlinear optimization algorithms by adapting classical deterministic methods. These adaptations follow certain design guidelines described here, which make use of estimates of the noise level in the problem. The application of our methodology is illustrated by the development of a line search gradient projection method and a robust quasi-Newton method. The algorithms are tested on an engineering design problem, where the propagation of noise is unknown. It is shown that a new self-calibrated line search, noise-aware finite-difference techniques, and a lengthening L-BFGS-B method can be effective even in the high noise regime. Numerical experiments investigate the resiliency of key algorithmic components. Finally, convergence analysis of the proposed methods establishes convergence to a neighborhood of the solution.
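To make two of the listed building blocks concrete, here is a small sketch of a noise-aware finite-difference gradient and a noise-relaxed Armijo line search. The particular step-size rule (h proportional to the square root of the noise level) and the relaxed acceptance test are common textbook choices and may differ from the self-calibrated rules in the talk; the names `fd_gradient` and `noise_tolerant_armijo` are illustrative.

```python
# A minimal sketch, assuming a known noise level eps_f: a finite-difference
# gradient whose step is scaled to eps_f, and a backtracking line search whose
# sufficient-decrease test is relaxed by 2*eps_f to absorb the noise.
import numpy as np

def fd_gradient(f, x, eps_f):
    """Forward differences with a noise-aware step h ~ sqrt(eps_f)."""
    h = 2.0 * np.sqrt(eps_f)             # balances noise against truncation error
    fx = f(x)
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x); e[i] = h
        g[i] = (f(x + e) - fx) / h
    return g

def noise_tolerant_armijo(f, x, p, g, eps_f, c1=1e-4, alpha=1.0,
                          shrink=0.5, max_iter=30):
    """Backtracking with the relaxed Armijo test
    f(x + a p) <= f(x) + c1 * a * g.p + 2*eps_f."""
    fx = f(x)
    for _ in range(max_iter):
        if f(x + alpha * p) <= fx + c1 * alpha * (g @ p) + 2.0 * eps_f:
            return alpha
        alpha *= shrink
    return alpha

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    eps_f = 1e-4                                          # assumed noise level
    f = lambda x: np.sum((x - 1.0) ** 2) + eps_f * rng.uniform(-1, 1)  # noisy quadratic
    x = np.zeros(5)
    for _ in range(100):                                  # noise-tolerant steepest descent
        g = fd_gradient(f, x, eps_f)
        a = noise_tolerant_armijo(f, x, -g, g, eps_f)
        x = x - a * g
    print("distance to solution:", np.linalg.norm(x - 1.0))
```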
4 — 10:00 — Robustly Learning Single-Index Models via Alignment Sharpness
We study the problem of learning Single-Index Models under the $L_2^2$ loss in the agnostic model. We give an efficient learning algorithm, achieving a constant factor approximation to the optimal loss, that succeeds under a range of distributions (including log-concave distributions) and a broad class of monotone and Lipschitz link functions. This is the first efficient constant factor approximate agnostic learner, even for Gaussian data and for any nontrivial class of link functions. Prior work for the case of unknown link function either works in the realizable setting or does not attain constant factor approximation. The main technical ingredient enabling our algorithm and analysis is a novel notion of a local error bound in optimization that we term alignment sharpness and that may be of broader interest.
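For readers unfamiliar with the setting, the toy sketch below only illustrates the single-index model class and the alignment quantity of interest; it is not the paper's agnostic learner and does not implement its alignment-sharpness analysis. The link function, the Stein's-lemma initialization, projected gradient descent on the squared loss, and the name `fit_single_index` are all illustrative assumptions.

```python
# A generic single-index illustration: data y = g(w*. x) + noise with a
# monotone Lipschitz link g, fit by projected gradient descent on the
# empirical squared loss, tracking the alignment <w_hat, w*>.
# This is NOT the paper's algorithm; it only shows the model class.
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))       # a monotone, Lipschitz link

def fit_single_index(X, y, step=0.5, iters=200):
    # Initialize with the average-derivative direction (1/n) X^T y, which
    # aligns with w* for Gaussian data and a monotone link (Stein's lemma).
    w = X.T @ y / len(y)
    w /= max(np.linalg.norm(w), 1e-12)
    for _ in range(iters):
        z = X @ w
        p = sigmoid(z)
        # Gradient of 0.5 * mean (sigmoid(z) - y)^2 with respect to w.
        grad = X.T @ ((p - y) * p * (1.0 - p)) / len(y)
        w -= step * grad
        w /= max(np.linalg.norm(w), 1e-12)   # keep the index direction unit-norm
    return w

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n, d = 5000, 10
    w_star = np.zeros(d); w_star[0] = 1.0
    X = rng.standard_normal((n, d))          # Gaussian (hence log-concave) data
    y = sigmoid(X @ w_star) + 0.1 * rng.standard_normal(n)
    w_hat = fit_single_index(X, y)
    print("alignment <w_hat, w_star>:", w_hat @ w_star)
```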