1. 16:20 — The Price of Adaptivity in Stochastic Convex Optimization

We prove impossibility results for adaptivity in non-smooth stochastic convex optimization.
Given a set of problem parameters we wish to adapt to, we define a “price of adaptivity” (PoA)
that, roughly speaking, measures the multiplicative increase in suboptimality due to uncertainty
in these parameters. When the initial distance to the optimum is unknown but a gradient norm
bound is known, we show that the PoA is at least logarithmic for expected suboptimality, and
double-logarithmic for median suboptimality. When there is uncertainty in both distance and
gradient norm, we show that the PoA must be polynomial in the level of uncertainty. Our lower
bounds nearly match existing upper bounds, and establish that there is no parameter-free lunch.
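
As a rough illustration of the quantity being bounded (an assumed reading for exposition, not necessarily the paper's exact definition), one can think of the PoA of an adaptive algorithm A as the worst-case ratio between its suboptimality and the minimax rate attainable when the problem parameters are known:

```latex
\mathrm{PoA}(\mathcal{A})
  \;=\; \sup_{(D,G)\in\mathcal{U}}\;
  \frac{\sup_{f\in\mathcal{F}(D,G)} \mathbb{E}\bigl[f(x_{\mathcal{A}}) - f^{\star}\bigr]}
       {\inf_{\mathcal{A}'}\,\sup_{f\in\mathcal{F}(D,G)} \mathbb{E}\bigl[f(x_{\mathcal{A}'}) - f^{\star}\bigr]}
```

where x_A is the output of algorithm A, D is the initial distance to the optimum, G the gradient norm bound, U the uncertainty set over these parameters, F(D, G) the corresponding class of non-smooth stochastic convex problems, and the denominator is the minimax rate (of order DG/sqrt(T) after T stochastic gradient evaluations) achievable when D and G are known.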

2. 16:50 — Coin Sampling: Learning-Rate-Free Optimisation on the Space of Probability Measures

The task of sampling from a target probability distribution whose density is only known up to a normalisation constant is of fundamental importance to computational statistics and machine learning. There are various popular approaches to this task, including Markov chain Monte Carlo (MCMC) and variational inference (VI). More recently, there has been growing interest in developing hybrid sampling methods which combine the non-parametric nature of MCMC with the parametric approach used in VI. In particular, particle-based variational inference (ParVI) methods approximate the target distribution using an ensemble of interacting particles, which are deterministically updated by iteratively minimising a metric such as the Kullback-Leibler divergence. Unfortunately, such methods invariably depend on hyperparameters such as the learning rate, which must be carefully tuned by practitioners in order to ensure convergence to the target distribution at a suitable rate.
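
As background for the kind of update described above, here is a minimal sketch of one standard ParVI method, Stein variational gradient descent (SVGD), run on an illustrative standard-Gaussian target. The target, kernel bandwidth, and in particular the step_size are assumptions made for this toy example; step_size is exactly the kind of tuning parameter the talk aims to eliminate.

```python
import numpy as np

def grad_log_target(x):
    # Illustrative target: standard Gaussian, so grad log p(x) = -x.
    return -x

def svgd_step(particles, step_size=0.1, bandwidth=1.0):
    """One SVGD update: deterministically move particles to decrease KL(q || p)."""
    n = particles.shape[0]
    diffs = particles[:, None, :] - particles[None, :, :]      # (n, n, d): x_i - x_j
    sq_dists = np.sum(diffs ** 2, axis=-1)                     # (n, n)
    kernel = np.exp(-sq_dists / (2 * bandwidth ** 2))          # RBF kernel k(x_i, x_j)
    scores = grad_log_target(particles)                        # (n, d)
    # Driving term: pulls particles towards high-density regions of the target.
    attraction = (kernel[..., None] * scores[None, :, :]).sum(axis=1)
    # Repulsive term: grad_{x_j} k(x_j, x_i) = (x_i - x_j) k(x_i, x_j) / bandwidth^2,
    # which keeps the particles spread out.
    repulsion = (diffs * kernel[..., None]).sum(axis=1) / bandwidth ** 2
    return particles + step_size * (attraction + repulsion) / n

rng = np.random.default_rng(0)
particles = rng.normal(loc=5.0, scale=1.0, size=(200, 2))      # initialise far from target
for _ in range(500):
    particles = svgd_step(particles)
print(particles.mean(axis=0))                                  # roughly (0, 0)
```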

In this talk, we introduce a suite of new sampling algorithms which are entirely learning-rate-free. Our approach leverages the perspective of sampling as an optimisation problem over the space of probability measures, and existing ideas from parameter-free optimisation. We discuss how to establish the convergence of our algorithms under suitable assumptions on the target distribution. We then illustrate the performance of our approach on a range of numerical examples, including several high-dimensional models and datasets, demonstrating comparable performance to existing state-of-the-art sampling algorithms, but with no need to tune a learning rate.
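
To give a flavour of the parameter-free machinery involved, below is a minimal sketch of a coin-betting update (a Krichevsky-Trofimov bettor, in the style of Orabona and Pál) for ordinary convex minimisation with no learning rate. The toy objective, gradient norm bound, and initial wealth are illustrative assumptions; the talk's algorithms adapt this style of update to the space of probability measures rather than to a single iterate.

```python
import numpy as np

def coin_betting_minimise(grad, x0, n_iters=5000, init_wealth=1.0, grad_bound=1.0):
    """Learning-rate-free convex minimisation via coin betting (KT bettor).

    Each negated, normalised (sub)gradient is treated as a coin-flip outcome;
    the next iterate is a bet proportional to the accumulated wealth.
    """
    x0 = np.asarray(x0, dtype=float)
    x, avg = x0.copy(), np.zeros_like(x0)
    coin_sum = np.zeros_like(x0)                # running sum of coins c_s = -g_s / grad_bound
    wealth = init_wealth                        # epsilon + sum_s <c_s, x_s - x0>
    for t in range(1, n_iters + 1):
        avg += x                                # online-to-batch: the averaged iterate carries the guarantee
        c = -np.asarray(grad(x)) / grad_bound   # coin outcome, norm <= 1
        wealth += float(c @ (x - x0))           # gain or loss of the current bet
        coin_sum += c
        x = x0 + coin_sum / (t + 1) * wealth    # KT betting fraction times current wealth
    return avg / n_iters

# Toy usage: minimise f(x) = ||x - (3, -2)||, whose subgradients have norm <= 1.
target = np.array([3.0, -2.0])
grad_f = lambda x: (x - target) / (np.linalg.norm(x - target) + 1e-12)
print(coin_betting_minimise(grad_f, x0=np.zeros(2)))            # close to (3, -2)
```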

3. 17:20 — Parameter-free projected gradient descent with application to contextual bandits with knapsacks

I will describe a fully parameter-free version of AdaGrad (norm), which is adaptive to the distance between the initialization and the optimum, and to the sum of the squared norms of the subgradients. I will then describe an extension of our approach to stochastic optimization, and highlight the utility of the approach for the problem of contextual bandits with knapsacks.
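
For reference, here is a minimal sketch of the standard (non-parameter-free) projected AdaGrad-norm baseline, whose single adaptive step size divides a scale parameter eta by the square root of the running sum of squared subgradient norms. The toy objective, projection set, and the choice eta = 1 are illustrative assumptions; eta is precisely the distance-dependent quantity that the fully parameter-free version described in the talk removes.

```python
import numpy as np

def adagrad_norm(grad, project, x0, n_iters=2000, eta=1.0, eps=1e-12):
    """Projected AdaGrad (norm version): one adaptive scalar step size
    eta / sqrt(sum of squared subgradient norms), followed by a projection."""
    x = np.asarray(x0, dtype=float)
    avg = np.zeros_like(x)
    sq_grad_sum = 0.0
    for _ in range(n_iters):
        g = np.asarray(grad(x))
        sq_grad_sum += float(g @ g)                         # accumulate ||g_s||^2
        x = project(x - eta / np.sqrt(sq_grad_sum + eps) * g)
        avg += x                                            # average iterate for the guarantee
    return avg / n_iters

# Toy usage: minimise f(x) = ||x - (3, -2)|| over the Euclidean ball of radius 5.
target = np.array([3.0, -2.0])
grad_f = lambda x: (x - target) / (np.linalg.norm(x - target) + 1e-12)
project_ball = lambda x: x if np.linalg.norm(x) <= 5.0 else 5.0 * x / np.linalg.norm(x)
print(adagrad_norm(grad_f, project_ball, x0=np.zeros(2)))   # close to (3, -2)
```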