1. 08:30 — On Foundations of Distributionally Robust Reinforcement Learning

Motivated by the need for a robust policy in the face of environment shifts between training and deployment, we contribute to the theoretical foundations of distributionally robust reinforcement learning (DRRL). This is accomplished through a comprehensive modeling framework centered on distributionally robust Markov decision processes (DRMDPs), in which the decision maker must choose an optimal policy under the worst-case distributional shift orchestrated by an adversary. By unifying and extending existing formulations, we rigorously construct DRMDPs that embrace a range of modeling attributes for both the decision maker and the adversary. These attributes include the granularity of adaptability, covering history-dependent, Markov, and Markov time-homogeneous dynamics for both the decision maker and the adversary. Additionally, we delve into the flexibility of the shifts the adversary may induce, examining SA- and S-rectangularity. Within this DRMDP framework, we investigate conditions for the existence or absence of the dynamic programming principle (DPP). From an algorithmic standpoint, the existence of a DPP has significant implications, since the vast majority of existing data- and computationally efficient RL algorithms rely on it. To study its existence, we comprehensively examine combinations of controller and adversary attributes, providing streamlined proofs grounded in a unified methodology. We also offer counterexamples for settings in which a DPP with full generality is absent.
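
For concreteness, when the DPP does hold under SA-rectangularity, it takes the familiar robust Bellman form sketched below; the notation is illustrative rather than taken from the paper, with $\mathcal{P}_{s,a}$ denoting the adversary's ambiguity set for the state-action pair $(s,a)$ and $\gamma$ the discount factor:

$$ V(s) \;=\; \max_{a \in \mathcal{A}} \Big\{ r(s,a) \,+\, \gamma \inf_{P \in \mathcal{P}_{s,a}} \mathbb{E}_{s' \sim P}\big[ V(s') \big] \Big\}. $$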

2. 09:00 — Regularization for Adversarial Robust Learning

  • Jie Wang, Georgia Institute of Technology
  • Rui Gao, University of Texas at Austin
  • Yao Xie, Georgia Institute of Technology

Despite the growing prevalence of artificial neural networks in real-world applications, their vulnerability to adversarial attacks remains a significant concern, which motivates us to investigate the robustness of machine learning models. While various heuristics aim to optimize the distributionally robust risk defined through the $\infty$-Wasserstein metric, this notion of robustness frequently suffers from computational intractability. To tackle this challenge, we develop a novel approach to adversarial training that integrates f-divergence regularization into the distributionally robust risk function, yielding a notable computational improvement over the original formulation. We develop stochastic gradient methods with biased gradient oracles to solve the regularized problem efficiently, and our convergence analysis shows that the algorithm achieves near-optimal sample complexity. Moreover, we establish the regularization effects of the formulation and show that it is asymptotically equivalent to a regularized empirical risk minimization (ERM) problem under various scaling regimes of the regularization parameter $\eta$ and the robustness level $\rho$. These regimes yield gradient-norm regularization, variance regularization, or a smoothed gradient-norm regularization that interpolates between these extremes. We numerically validate the proposed method in supervised learning and reinforcement learning applications and showcase its state-of-the-art performance against various adversarial attacks.
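
As a minimal illustrative sketch (not the authors' implementation), the inner worst-case loss over an $\ell_\infty$ perturbation ball can be smoothed with a KL (entropic) regularizer, one member of the f-divergence family, and estimated by Monte Carlo; the finite-sample log-sum-exp estimate is exactly the kind of biased gradient oracle mentioned above. The names model, eps, eta, and m below are hypothetical.

  import math
  import torch
  import torch.nn.functional as F

  def smoothed_adv_loss(model, x, y, eps=0.1, eta=0.5, m=16):
      """Entropic smoothing of the worst-case loss over an l_inf ball:
      eta * log E_delta[ exp(loss(x + delta, y) / eta) ],
      estimated with m uniform perturbation samples (a biased estimator)."""
      per_sample = []
      for _ in range(m):
          delta = (2.0 * torch.rand_like(x) - 1.0) * eps   # uniform in [-eps, eps]
          per_sample.append(F.cross_entropy(model(x + delta), y, reduction="none"))
      losses = torch.stack(per_sample, dim=0) / eta        # shape (m, batch)
      # log-mean-exp over the perturbation samples, then rescale by eta
      smoothed = eta * (torch.logsumexp(losses, dim=0) - math.log(m))
      return smoothed.mean()

  # One stochastic gradient step on the smoothed objective (model, data, optimizer assumed given):
  # loss = smoothed_adv_loss(model, x_batch, y_batch)
  # loss.backward(); optimizer.step(); optimizer.zero_grad()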

3. 09:30 — Distributionally Robust Affine Markov Decision Processes

Markov Decision Processes (MDPs) serve as a powerful model for sequential decision-making under uncertainty. Despite their prominence, a key challenge in employing MDPs for large-scale planning and control applications is the significant estimation uncertainty in the model parameters when they are estimated from limited data. To mitigate this impact, we consider a robust approach that seeks policies yielding the highest average reward achievable uniformly over every state-transition model in an uncertainty set. Our primary contribution is to exhibit distributionally robust MDP models for which (i) the robust value function is simply an affine function of the present state, and, as a result, (ii) optimal policies can be found by solving lower-dimensional recursions without suffering the curse of dimensionality. We achieve this for state-transition and reward models with an affine dependence on the present state-action pair, and for action spaces that are linearly constrained by the present state in a naturally decomposable manner. As a consequence, these robust affine MDP models form a novel and expressive addition to the limited collection of MDP instances known to be simultaneously immune to both estimation uncertainty and dimension-related challenges.
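
Read as an illustrative ansatz (symbols ours, not the paper's), claim (i) says the robust value function admits the form

$$ V_t^{\mathrm{rob}}(s) \;=\; \alpha_t + \beta_t^{\top} s, $$

so the robust backward recursion only needs to propagate the low-dimensional coefficient pair $(\alpha_t, \beta_t)$ rather than a function over the full state space, which is the source of claim (ii).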

4. 10:00 — Enhancing Data-Driven Distributionally Robust Optimization: Jointly Incorporating First-Order Moment and Wasserstein Distance

Distribution distance and moment information are two widely used ways of characterizing distributional properties. In this research, we study distributionally robust optimization problems whose ambiguity sets combine the Wasserstein distance with first-order moment information. Such ambiguity sets exploit sample data more effectively and thereby mitigate conservativeness. To harness the finite-sum and asymmetric structure inherent in these problems, we propose a novel variance-reduced asymmetric primal-dual algorithm for efficiently solving the associated dual problem. Additionally, for a specific class of problems, we exploit the regularization structure to obtain enhanced theoretical results. Through extensive numerical experiments spanning three problem domains (machine learning, portfolio selection, and the newsvendor problem), we compare the performance of the proposed model against that of an ambiguity set based solely on the Wasserstein distance. The results show a substantial improvement in out-of-sample performance from incorporating first-order information.
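
One natural way to write such a joint ambiguity set, shown here purely for illustration with hypothetical radii $\rho$ and $\delta$, centers it at the empirical distribution $\widehat{P}_n$ and constrains both the Wasserstein distance and the first moment:

$$ \mathcal{B}_{\rho,\delta}(\widehat{P}_n) \;=\; \Big\{ P \;:\; W_1\big(P, \widehat{P}_n\big) \le \rho, \;\; \big\| \mathbb{E}_{P}[\xi] - \widehat{\mu}_n \big\| \le \delta \Big\}, $$

where $\widehat{\mu}_n$ denotes the sample mean of the data $\xi$.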