1 — 08:30 — Outlier-Robust Wasserstein DRO
Distributionally robust optimization (DRO) is an effective approach for data-driven decision-making in the presence of uncertainty. Geometric uncertainty due to sampling or localized perturbations of data points is captured by Wasserstein DRO (WDRO), which seeks to learn a model that performs uniformly well over a Wasserstein ball centered around the observed data distribution. However, WDRO fails to account for non-geometric perturbations such as adversarial outliers, which can greatly distort the Wasserstein distance and degrade the learned model. We address this gap by proposing a novel outlier-robust WDRO framework for decision-making under both geometric (Wasserstein) perturbations and non-geometric (total variation (TV)) contamination that allows an ε-fraction of data to be arbitrarily corrupted. We design an uncertainty set using a certain robust Wasserstein ball that accounts for both perturbation types and derive minimax optimal excess risk bounds for this procedure that explicitly capture the Wasserstein and TV risks. We prove a strong duality result that enables tractable convex reformulations and efficient computation of our outlier-robust WDRO problem. When the loss function depends only on low-dimensional features of the data, we eliminate certain dimension dependencies from the risk bounds that are unavoidable in the general setting. Finally, we present experiments validating our theory on standard regression and classification tasks.
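For intuition, a small numerical surrogate (not the paper's exact formulation) that combines the two robustness mechanisms the abstract describes: the classical Lipschitz-regularization upper bound on the worst case over a Wasserstein-1 ball, paired with trimming of the largest per-sample losses as a stand-in for robustness to an ε-fraction of arbitrary corruptions. The absolute-loss linear model, the trimming rule, and the parameters rho and eps are illustrative assumptions.

    import numpy as np

    # Illustrative surrogate only: trimmed empirical absolute loss (drop an
    # eps-fraction of the largest losses, mimicking robustness to TV
    # contamination) plus the classical Wasserstein-1 Lipschitz regularizer,
    # an upper bound on the worst case over a W1 ball of perturbations in x.
    def trimmed_w1_objective(theta, X, y, rho, eps):
        residuals = np.abs(X @ theta - y)          # per-sample absolute losses
        k = int(np.ceil((1.0 - eps) * len(y)))     # number of samples kept
        trimmed_risk = np.sort(residuals)[:k].mean()
        lipschitz = np.linalg.norm(theta)          # Lipschitz constant in x of |<theta, x> - y|
        return trimmed_risk + rho * lipschitz

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 5))
    y = X @ np.ones(5) + 0.1 * rng.normal(size=200)
    y[:20] += 50.0                                 # plant a 10% fraction of gross outliers
    print(trimmed_w1_objective(np.ones(5), X, y, rho=0.05, eps=0.1))

With eps = 0.1 the planted outliers are exactly the trimmed samples, so the objective stays close to its value on clean data, whereas the untrimmed empirical risk would be dominated by the corruptions.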
2 — 09:00 — Coherent Wasserstein Distributionally Robust Optimization
Wasserstein distributionally robust optimization (DRO), as a key paradigm in data-driven optimization, aims to provide decisions with out-of-sample performance guarantees by optimizing against a set of distributions close to the empirical one in Wasserstein $p$-distance, where $p \in [1,\infty]$. The set, known as the Wasserstein ball, becomes less conservative as $p\rightarrow \infty$, containing distributions more heavily concentrated around the data, i.e., more data-driven. Despite this attractive feature of higher order $p$, the type-1 Wasserstein distance, i.e., $p=1$, remains most popular to date. The advantages of the type-1 distance are threefold: from a modelling perspective, it accommodates richer families of distributions; computationally, it leads to Wasserstein DRO problems with favourable complexity, often solvable via linear programs; and statistically, it ensures the best rate of convergence for finite-sample guarantees. We introduce a novel alternative, coherent Wasserstein metrics, achieving flexibility comparable to type-$p$ metrics for reducing conservatism, while inheriting the advantages of the type-1 distance. These metrics are derived from risk-averse formulations of the optimal transport problem with coherent risk measures. We show that distributionally robust optimization problems using coherent Wasserstein metrics, termed coherent Wasserstein distributionally robust optimization (CWDRO), can retain key properties of type-1 Wasserstein DRO, while potentially achieving better utilization of data.
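For context, the computational and statistical appeal of $p=1$ rests on a classical dual reformulation of type-1 Wasserstein DRO (a known result, not a claim specific to this talk): if the loss $\ell_\theta$ is $L$-Lipschitz with respect to the transport metric and the support is unrestricted, then
$$\sup_{Q:\, W_1(Q, \widehat{P}_n) \le \rho} \mathbb{E}_Q[\ell_\theta(\xi)] \;=\; \mathbb{E}_{\widehat{P}_n}[\ell_\theta(\xi)] + \rho L,$$
so for piecewise-affine losses the robust problem collapses to norm-regularized empirical risk minimization and can be solved as a linear program.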
3 — 09:30 — Universal Gradient Descent Ascent Method for Smooth Minimax Optimization
Smooth minimax optimization has garnered significant attention over the past decade, owing to its widespread applications in machine learning. Considerable research has focused on tailored algorithms for smooth minimax problems, addressing specific structural conditions like convexity or concavity of primal and dual functions, as well as Polyak-\L{}ojasiewicz (P\L{}) and Kurdyka-\L{}ojasiewicz (K\L{}) conditions. However, verifying these conditions is challenging in practice, motivating our pursuit of universal algorithms for smooth minimax problems. To this end, we propose a novel universally applicable single-loop algorithm, the doubly smoothed optimistic gradient descent ascent method (DS-OGDA), which can be applied to convex-concave, nonconvex-concave, convex-nonconcave, and nonconvex-nonconcave problems with one-sided K\L{} properties. With a single set of parameters, DS-OGDA achieves convergence with $\mathcal{O}(\epsilon^{-4})$ complexity for nonconvex minimax problems and $\mathcal{O}(\epsilon^{-2})$ complexity for convex-concave minimax problems. Sharper (even optimal) iteration complexity can be achieved with a known K\L{} exponent or in convex-concave scenarios. Specifically, under the one-sided K\L{} condition with exponent $\theta\in(0,1)$, DS-OGDA converges with $\mathcal{O}(\epsilon^{-2\max\{2\theta,1\}})$ complexity. Additionally, DS-OGDA attains $\mathcal{O}(\epsilon^{-1})$ complexity in convex-concave settings. These rates all match the best corresponding results in the literature. We further demonstrate the universality of DS-OGDA through numerical experiments on a class of distributionally robust optimization problems.
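For concreteness, a sketch of the optimistic gradient step that single-loop methods of this kind build on, run on a bilinear toy problem $f(x,y) = x^\top A y$ where plain gradient descent ascent diverges; the step size and the test problem are illustrative, and the doubly smoothed anchoring of DS-OGDA itself is not reproduced here, since the abstract does not spell out those updates.

    import numpy as np

    # Standard single-loop optimistic gradient descent ascent (OGDA) on the
    # bilinear problem f(x, y) = x^T A y, whose unique saddle point is (0, 0).
    # Plain simultaneous gradient descent ascent diverges here; the optimistic
    # correction term restores convergence.
    A = np.array([[1.0, 0.5],
                  [-0.5, 1.0]])
    grad_x = lambda x, y: A @ y      # gradient of f in x
    grad_y = lambda x, y: A.T @ x    # gradient of f in y

    eta = 0.1
    x, y = np.ones(2), np.ones(2)
    gx_prev, gy_prev = grad_x(x, y), grad_y(x, y)
    for _ in range(5000):
        gx, gy = grad_x(x, y), grad_y(x, y)
        x = x - eta * (2.0 * gx - gx_prev)   # descent step with optimistic correction
        y = y + eta * (2.0 * gy - gy_prev)   # ascent step with optimistic correction
        gx_prev, gy_prev = gx, gy
    print(np.linalg.norm(x), np.linalg.norm(y))  # both iterates contract toward (0, 0)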
4 — 10:00 — The out-of-sample prediction error of the square-root lasso and related estimators
We study the classical problem of predicting an outcome variable, $Y$, using a linear combination of a $d$-dimensional covariate vector, $X$. We are interested in linear predictors whose coefficients solve: $\inf_{\beta} \left(\mathbb{E}\left[(Y - \langle \beta, X \rangle)^r\right]\right)^{1/r} + \delta \|\beta\|$, where $r > 1$ and $\delta > 0$ is a regularization parameter. We provide conditions under which linear predictors based on these estimators minimize the worst-case prediction error over a ball of distributions determined by a type of max-sliced Wasserstein metric. A detailed analysis of the statistical properties of this metric yields a simple recommendation for the choice of regularization parameter. The suggested order of $\delta$, after a suitable normalization of the covariates, is typically $d/n$, up to logarithmic factors. Our recommendation is computationally straightforward to implement, pivotal, has provable out-of-sample performance guarantees, and does not rely on sparsity assumptions about the true data-generating process.
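A minimal sketch of the $r = 2$, $\ell_1$-penalized instance of this estimator (the square-root lasso), solved with cvxpy; the column normalization, the toy data, and the exact constant in $\delta$ are illustrative assumptions, with $\delta$ set to the $d/n$ order suggested above.

    import numpy as np
    import cvxpy as cp

    rng = np.random.default_rng(0)
    n, d = 200, 10
    X = rng.normal(size=(n, d))
    X = X / np.linalg.norm(X, axis=0) * np.sqrt(n)       # normalize columns (assumed convention)
    y = X @ rng.normal(size=d) + rng.normal(size=n)      # dense ground truth: no sparsity assumed

    beta = cp.Variable(d)
    delta = d / n                                        # suggested order of the regularization
    rms_residual = cp.norm(y - X @ beta, 2) / np.sqrt(n) # (E_n[(Y - <beta, X>)^2])^{1/2}
    cp.Problem(cp.Minimize(rms_residual + delta * cp.norm1(beta))).solve()
    print(np.round(beta.value, 3))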