1. 14:00 — Improved lower bounds for stochastic optimization under Markovian sampling

Stochastic optimization with Markovian sampling allows the sampling scheme to follow a Markov chain, generalizing the usual i.i.d. sampling assumption. This problem encompasses applications ranging from asynchronous distributed optimization to online reinforcement learning. Given any Markovian sampling process, we give a lower bound on the sample complexity of finding $\epsilon$-approximate stationary points for first-order methods. In particular, for samples drawn from a countable Markov process, any algorithm that accesses smooth, non-convex functions through queries to a stochastic gradient oracle with bounded noise requires at least $\epsilon^{-2}$ samples to find an $\epsilon$-approximate stationary point. The lower bound matches the upper bound of a recently proposed algorithm, MC-SAG, hence establishing its minimax optimality.
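To make the sampling model concrete, here is a minimal sketch (not material from the talk) contrasting SGD driven by a two-state Markov chain with the usual i.i.d. sampling; the toy objective, per-state means `mu`, transition matrix `P`, and step size are all illustrative assumptions.

```python
# Minimal sketch: SGD under Markovian vs. i.i.d. sampling (illustrative assumptions only).
import numpy as np

rng = np.random.default_rng(0)

# Toy objective: f(x) = 0.5 * E_s[(x - mu_s)^2] over states s of a Markov chain.
mu = np.array([-1.0, 2.0])             # per-state means (assumed)
P = np.array([[0.9, 0.1],              # row-stochastic transition matrix (assumed)
              [0.1, 0.9]])

def stochastic_grad(x, s):
    # Gradient oracle for the sampled state s: d/dx 0.5 * (x - mu_s)^2
    return x - mu[s]

def sgd_markov(T=10_000, eta=1e-2):
    x, s = 0.0, 0
    for _ in range(T):
        x -= eta * stochastic_grad(x, s)
        s = rng.choice(2, p=P[s])       # next sample depends on the current state
    return x

def sgd_iid(T=10_000, eta=1e-2):
    x = 0.0
    for _ in range(T):
        s = rng.choice(2, p=[0.5, 0.5])  # stationary distribution of P (uniform here)
        x -= eta * stochastic_grad(x, s)
    return x

print(sgd_markov(), sgd_iid())  # both drift toward the stationary mean 0.5
```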

2. 14:30 — Designing surrogate functions for reinforcement learning

We present a series of theoretical results related to designing surrogate functions for reinforcement learning. We cover links between reinforcement learning and maximum likelihood, discuss choices of Bregman divergences when doing mirror descent, and introduce accelerated methods in function space.

This talk aims to explain the various possible choices and their consequences when designing off-policy RL algorithms, for instance for reinforcement learning from human feedback.
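As one concrete instance of the Bregman-divergence choice (a sketch under our own assumptions, not the speaker's algorithm), taking the KL divergence as the Bregman divergence in a mirror descent step over the probability simplex yields the familiar multiplicative-weights / softmax policy update:

```python
# Minimal sketch: one mirror descent step on the simplex with a KL Bregman divergence.
# The action-value estimates q and step size eta are illustrative assumptions.
import numpy as np

def mirror_descent_step_kl(pi, q_values, eta):
    """Solve argmax_p <p, q> - (1/eta) * KL(p || pi) over the simplex in closed form."""
    logits = np.log(pi) + eta * q_values
    logits -= logits.max()                 # numerical stabilization
    new_pi = np.exp(logits)
    return new_pi / new_pi.sum()

# Toy bandit-style usage with made-up action values.
pi = np.full(3, 1.0 / 3.0)                 # uniform initial policy over 3 actions
q = np.array([1.0, 0.5, -0.2])             # illustrative action-value estimates
for _ in range(20):
    pi = mirror_descent_step_kl(pi, q, eta=0.5)
print(pi)                                   # mass concentrates on the best action
```

A different Bregman divergence (e.g., squared Euclidean distance) would give a projected-gradient-style update instead, which is exactly the kind of design choice the talk discusses.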

3. 15:00 — The Exactness of the L1 Penalty Function for a Class of Mathematical Programs with Generalized Complementarity Constraints

  • Xin Liu, Academy of Mathematics and Systems Science, Chinese Academy of Sciences

In a Mathematical Program with Generalized Complementarity Constraints (MPGCC), complementarity relationships are imposed between each pair of variable blocks. MPGCC includes the traditional Mathematical Program with Complementarity Constraints (MPCC) as a special case. On account of the disjunctive feasible region, MPCC and MPGCC are generally difficult to handle. The L1 penalty method, often adopted in computation, offers a way of circumventing the difficulty. Yet it remains unclear whether the L1 penalty function is exact, namely, whether there exists a sufficiently large penalty parameter such that the penalty problem shares its optimal solution set with the original one. In this paper, we consider a class of MPGCCs with multi-affine objective functions. This problem class finds applications in various fields, e.g., multi-marginal optimal transport problems in many-body quantum physics and pricing problems in network transportation. We first provide an instance from this class for which the exactness of the L1 penalty function cannot be derived with existing tools. We then establish exactness results under rather mild conditions. Our results cover the existing ones for MPCC and apply to multi-block settings.
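For readers less familiar with the setup, the following sketch (in our own notation, not necessarily the paper's) shows the classical two-block MPCC and its L1 penalization; exactness asks for a finite penalty parameter $\rho$ such that both problems share the same optimal solution set.

```latex
% Classical MPCC and its L1 penalization (illustrative notation).
% For x, y >= 0 the complementarity residual |x^T y| equals x^T y,
% so the penalty term below is the usual L1 penalty of the constraint x^T y = 0.
\begin{align*}
\text{(MPCC)}\quad
  &\min_{x,y}\; f(x,y)
  \quad\text{s.t.}\quad x \ge 0,\; y \ge 0,\; x^{\top} y = 0,\\[4pt]
\text{(L1 penalty)}\quad
  &\min_{x,y}\; f(x,y) + \rho\, x^{\top} y
  \quad\text{s.t.}\quad x \ge 0,\; y \ge 0.
\end{align*}
```

An MPGCC generalizes this by imposing such complementarity relationships between each pair of several variable blocks.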

4. 15:30 — Low-rank optimization on Tucker tensor varieties

In the realm of tensor optimization, low-rank tensor decomposition, particularly Tucker decomposition, stands as a pivotal technique for reducing the number of parameters and for saving storage. We embark on an exploration of Tucker tensor varieties—the set of tensors with bounded Tucker rank—whose geometry is notably more intricate than the well-explored geometry of matrix varieties. We give an explicit parametrization of the tangent cone of Tucker tensor varieties and leverage its geometry to develop provable gradient-related line-search methods for optimization on Tucker tensor varieties. The search directions are computed from approximate projections of the antigradient onto the tangent cone, which circumvents the calculation of intractable metric projections. To the best of our knowledge, this is the first work concerning geometry and optimization on Tucker tensor varieties. In practice, low-rank tensor optimization suffers from the difficulty of choosing a reliable rank parameter. To this end, we incorporate the established geometry and propose a Tucker rank-adaptive method that is capable of identifying an appropriate rank during the iterations while convergence is still guaranteed. Numerical experiments on tensor completion with synthetic and real-world datasets show that the proposed methods achieve better recovery performance than other state-of-the-art methods. Moreover, the rank-adaptive method performs the best across various rank parameter selections and is indeed able to find an appropriate rank.
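To give a feel for the problem setting (a simplified sketch, not the authors' tangent-cone method), the script below runs one projected-gradient-type step for tensor completion on the set of tensors with bounded Tucker rank: a Euclidean gradient step on the observed entries followed by a retraction via truncated HOSVD. The rank bound, mask, data, and step size are illustrative assumptions.

```python
# Minimal sketch: gradient step + truncated-HOSVD retraction for Tucker-rank-bounded
# tensor completion (illustrative; not the tangent-cone projection method of the talk).
import numpy as np

def unfold(T, mode):
    """Mode-`mode` matricization of tensor T."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def truncated_hosvd(T, ranks):
    """Retract T to Tucker rank <= ranks: truncate HOSVD factors, then reassemble."""
    factors = []
    for mode, r in enumerate(ranks):
        U, _, _ = np.linalg.svd(unfold(T, mode), full_matrices=False)
        factors.append(U[:, :r])
    core = T
    for mode, U in enumerate(factors):                    # project onto factor subspaces
        core = np.moveaxis(np.tensordot(U.T, core, axes=(1, mode)), 0, mode)
    out = core
    for mode, U in enumerate(factors):                    # map back to the full space
        out = np.moveaxis(np.tensordot(U, out, axes=(1, mode)), 0, mode)
    return out

def completion_step(X, data, mask, ranks, step):
    """One step on 0.5 * ||mask * (X - data)||^2, followed by the rank retraction."""
    grad = mask * (X - data)
    return truncated_hosvd(X - step * grad, ranks)

# Illustrative usage with synthetic data and an assumed Tucker rank bound (2, 2, 2).
rng = np.random.default_rng(0)
data = rng.standard_normal((6, 7, 8))
mask = rng.random(data.shape) < 0.5
X = np.zeros_like(data)
for _ in range(10):
    X = completion_step(X, data, mask, ranks=(2, 2, 2), step=1.0)
```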