1 — 14:00 — ** CANCELLED ** Stochastic Programming Without Assuming a Probability Distribution
The presentation will begin by describing the smooth bootstrap applied to confidence intervals for the optimality gap associated with a solution to a stochastic program. This extends earlier work in which the stochastic program is formulated from a sample drawn from an unknown probability distribution. If there is time, the second part of the talk will describe a software architecture for computing solutions with upper and lower bounds in an asynchronous parallel environment.
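As a rough illustration of the gap-estimation idea (not the speaker's method), the sketch below applies a smooth bootstrap to the optimality gap of a candidate solution in a toy newsvendor problem; the cost parameters, bandwidth rule, grid of candidate orders, and the candidate solution x_hat are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Newsvendor cost: order x units at unit cost c, sell min(x, demand) at price p.
c, p = 1.0, 3.0

def cost(x, demand):
    """Per-scenario cost of ordering x units given realized demand."""
    return c * x - p * np.minimum(x, demand)

def saa_gap(x_hat, demands, grid):
    """Optimality-gap estimate of x_hat on one demand sample: sample-average
    cost of x_hat minus the best sample-average cost over the candidate grid."""
    avg = lambda x: cost(x, demands).mean()
    return avg(x_hat) - min(avg(x) for x in grid)

# Observed sample from the unknown demand distribution (hidden lognormal here).
demands = rng.lognormal(mean=3.0, sigma=0.4, size=200)
grid = np.linspace(0, 60, 121)          # candidate order quantities
x_hat = 25.0                            # candidate solution whose gap we certify

# Smooth bootstrap: resample with replacement and perturb with Gaussian kernel noise.
B = 500
h = 1.06 * demands.std() * len(demands) ** (-1 / 5)   # Silverman-style bandwidth
gaps = np.empty(B)
for b in range(B):
    resample = rng.choice(demands, size=len(demands), replace=True)
    smoothed = np.maximum(resample + h * rng.standard_normal(len(demands)), 0.0)
    gaps[b] = saa_gap(x_hat, smoothed, grid)

lo, hi = np.percentile(gaps, [2.5, 97.5])
print(f"95% smooth-bootstrap CI for the optimality gap: [{lo:.3f}, {hi:.3f}]")
```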
2 — 14:30 — Bayesian Reinforcement Learning with Limited Evaluations and Nonlinear Objectives
This talk studies Bayesian Reinforcement Learning problems where the number of evaluations is constrained. We consider a versatile framework where (1) the utility function can exhibit nonlinearity with respect to the actions and (2) each action is defined by a set of (continuous or categorical) features within a solution space constrained by equalities and inequalities. This framework accommodates a wide array of practical scenarios across diverse application domains. We present efficient methods for computing a forward-looking measurement policy, known as the knowledge-gradient (KG) policy, for this sequential learning setting. We evaluate the performance of our computational approach and optimal learning policy against popular benchmark policies from sequential Bayesian ranking and selection and Bayesian bandits.
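For reference only, the sketch below implements the classical knowledge-gradient policy for independent normal beliefs over a finite set of alternatives under a fixed evaluation budget; the talk's setting with nonlinear utilities and constrained continuous/categorical features is considerably more general, and the prior, noise variance, budget, and true values here are illustrative assumptions.

```python
import numpy as np
from scipy.stats import norm

def knowledge_gradient(mu, sigma2, noise_var):
    """One-step knowledge-gradient values for independent normal beliefs over a
    finite set of alternatives, with known Gaussian measurement noise."""
    sigma_tilde = sigma2 / np.sqrt(sigma2 + noise_var)      # predictive std. reduction
    best_other = np.array([np.max(np.delete(mu, i)) for i in range(len(mu))])
    zeta = -np.abs(mu - best_other) / sigma_tilde
    return sigma_tilde * (zeta * norm.cdf(zeta) + norm.pdf(zeta))

# Toy example: five alternatives, a limited evaluation budget.
rng = np.random.default_rng(1)
true_values = np.array([0.2, 0.5, 0.45, 0.1, 0.3])   # hidden; used only to simulate
noise_var = 0.25
mu, sigma2 = np.zeros(5), np.ones(5)                  # prior beliefs

for _ in range(20):                                   # limited number of evaluations
    x = int(np.argmax(knowledge_gradient(mu, sigma2, noise_var)))
    y = true_values[x] + rng.normal(0, np.sqrt(noise_var))
    # Conjugate normal update of the measured alternative.
    precision = 1 / sigma2[x] + 1 / noise_var
    mu[x] = (mu[x] / sigma2[x] + y / noise_var) / precision
    sigma2[x] = 1 / precision

print("believed best alternative:", int(np.argmax(mu)))
```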
3 — 15:00 — Planning Adaptive Experiments with Model-Predictive Control
Real-world implementations of adaptive experimentation methods often encounter a multitude of operational difficulties, including batched/delayed feedback, non-stationary environments, and constraints on treatment allocations. To improve the flexibility of adaptive experimentation, we propose a Bayesian, optimization-based framework founded on model-predictive control (MPC) for the linear contextual bandit setting. While we focus on simple regret minimization, the framework can flexibly incorporate multiple objectives along with constraints, batches, personalized and non-personalized policies, as well as predictions of future context arrivals. Most importantly, it maintains this flexibility while guaranteeing improvement over non-adaptive A/B testing across all time horizons, and outperforms standard policies such as Thompson Sampling in an empirical benchmark study with non-stationary rewards. Overall, this framework offers a way to guide adaptive designs across the varied demands of modern large-scale experiments.
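To give a flavor of the receding-horizon idea only (a heavily simplified, non-contextual two-arm version, not the authors' framework), the sketch below re-plans the next batch's allocation at every step so as to minimize the predicted variance of the treatment-effect estimate over the remaining horizon, subject to a minimum-allocation constraint; the batch size, horizon, noise variance, and planning objective are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
noise_var = 1.0
batch, horizon = 50, 6            # batch size and number of batches
q_min = 0.2                       # constraint: each arm gets >= 20% of every batch

def post_var(s2, n_alloc):
    """Predicted posterior variance of an arm mean after n_alloc further observations."""
    return 1.0 / (1.0 / s2 + n_alloc / noise_var)

def plan_first_allocation(s2, batches_left):
    """MPC step: plan a constant split over the remaining horizon to minimize the
    predicted variance of the effect estimate, then return only the first allocation."""
    best_q, best_obj = None, np.inf
    for q in np.linspace(q_min, 1 - q_min, 25):
        n1 = q * batch * batches_left
        n0 = (1 - q) * batch * batches_left
        obj = post_var(s2[1], n1) + post_var(s2[0], n0)
        if obj < best_obj:
            best_q, best_obj = q, obj
    return best_q

true_means = np.array([0.0, 0.3])                    # hidden; used only to simulate
mu, s2 = np.zeros(2), np.ones(2) * 10.0              # Gaussian posteriors on arm means

for t in range(horizon):
    q = plan_first_allocation(s2, horizon - t)       # re-plan every batch (receding horizon)
    n = [int(round((1 - q) * batch)), int(round(q * batch))]
    for arm in (0, 1):
        y = rng.normal(true_means[arm], np.sqrt(noise_var), size=n[arm])
        prec = 1 / s2[arm] + n[arm] / noise_var
        mu[arm] = (mu[arm] / s2[arm] + y.sum() / noise_var) / prec
        s2[arm] = 1 / prec

print("effect estimate:", mu[1] - mu[0], "+/-", np.sqrt(s2.sum()))
```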
4 — 15:30 — Sample efficient estimation of the transition kernels of controlled Markov chains
We will present estimation bounds on non-parametric estimates of the transition kernels of controlled Markov chains (CMCs). CMCs are a natural choice for modelling various industrial and medical processes, and are also relevant to reinforcement learning (RL). Learning the transition dynamics of CMCs in a sample-efficient manner is therefore an important question. We will address this question both when the underlying state-control space is finite and when it is infinite. In the finite setting, we will develop a Probably Approximately Correct (PAC) bound for the estimation error. We will then explore the additional challenges that arise when the state-control space is infinite, and tackle them using techniques from adaptive estimation. At the end, we will also posit some open questions.
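A minimal sketch of the finite state-control case is below: it forms the count-based nonparametric estimate of P(s' | s, a) from a single trajectory under a uniform-random control policy and reports the worst-case L1 error over (state, action) pairs, the kind of quantity a PAC bound controls; the chain, the policy, and the sample sizes are illustrative assumptions, not the speakers' experimental setup.

```python
import numpy as np

rng = np.random.default_rng(3)
n_states, n_actions = 4, 2

# Hidden ground-truth kernel P(s' | s, a), used only to simulate data.
P_true = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))

def simulate(T):
    """Roll out T steps of the controlled chain under a uniform-random control policy."""
    s, data = 0, []
    for _ in range(T):
        a = rng.integers(n_actions)
        s_next = rng.choice(n_states, p=P_true[s, a])
        data.append((s, a, s_next))
        s = s_next
    return data

def estimate_kernel(data):
    """Count-based nonparametric estimate of the transition kernel; unvisited
    (state, action) pairs default to the uniform distribution."""
    counts = np.zeros((n_states, n_actions, n_states))
    for s, a, s_next in data:
        counts[s, a, s_next] += 1
    visits = counts.sum(axis=2, keepdims=True)
    return np.where(visits > 0, counts / np.maximum(visits, 1), 1.0 / n_states)

for T in (200, 2000, 20000):
    err = np.abs(estimate_kernel(simulate(T)) - P_true).sum(axis=2).max()
    print(f"T = {T:6d}   worst-case L1 error over (s, a): {err:.3f}")
```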