### 1 — 14:00 — Learning to solve large-scale districting problems

Districting is a complex combinatorial problem that consists in partitioning a geographical area into small districts. In logistics, it is a major strategic decision determining operating costs for several years. Solving them using traditional methods is intractable even for small geographical areas and existing heuristics, while quick, often provide sub-optimal results. We present a structured learning approach to find high-quality solutions to real-world districting problems in a few minutes. It is based on integrating a combinatorial optimization layer, the capacitated minimum spanning tree problem, into a graph neural network architecture. To train this pipeline in a decision-aware fashion, we show how to construct target solutions embedded in a suitable space and learn from target solutions. Experiments show that our approach outperforms existing methods and can reduce costs by up to $15\%$ on real-world cities.

### 2 — 14:30 — Surrogate policies with generalization guarantees for combinatorial optimization problems

Consider a combinatorial optimization problem whose objective is given by a moderately expensive oracle that models complex phenomena. Operations Research practitioners generally ignore distributional information over the instances. To be practical, an algorithm must therefore scale well on any large instance of the problem. Practitioners usually solve a surrogate problem with an oversimplified objective that is amenable to classic combinatorial optimization techniques. We take instead a learning perspective. We assume that we have access to a training set of instances coming from an unknown distribution. And we restrict ourselves to algorithms that chain a neural network with a linear combinatorial optimization oracle. The latter is used as a tool to explore the combinatorially large set of feasible solutions efficiently. We use the training set to learn the parameters of the neural network that leads to the best performance. Minimizing the empirical risk on such a model class is difficult as the risk is piecewise constant. We therefore introduce a perturbation approach, that makes the empirical risk minimization problem tractable using exact algorithms. This allows us to derive an upper-bound on the non-optimality of the learned. This bound vanishes in the large training set regime. It is based on an ad-hoc statistical learning tools that exploit the geometry implied by the linear combinatorial optimization layer and the perturbation.

### 3 — 15:00 — Neur2BiLO: Neural Bilevel Optimization

Bilevel optimization deals with nested problems in which a leader takes the first decision to minimize their objective function while accounting for a follower best-response reaction. Constrained bilevel problems with integer variables are particularly notorious for their hardness. While exact solvers have been proposed for mixed-integer linear bilevel optimization, they tend to scale poorly with problem size and are hard to generalize to the non-linear case. On the other hand, problem-specific algorithms (exact and heuristic) are limited in scope. Under a data-driven setting in which similar instances of a bilevel problem are solved routinely, our proposed framework, Neur2BiLO, embeds a neural network approximation of the leader's or follower's value function, trained via supervised regression, into an easy-to-solve mixed-integer program. Neur2BiLO serves as a heuristic that produces high-quality solutions extremely fast for a variety of bilevel problems spanning linear/non-linear, integer/mixed-integer knapsack problems, and four application domains.

### 4 — 15:30 — From Large to Small Datasets: Size Generalization for Clustering Algorithm Selection

In clustering algorithm selection, we are given a massive dataset and must efficiently select which clustering algorithm to use. We study this problem in a semi-supervised setting, with an unknown ground-truth clustering that we can only access through expensive oracle queries. Ideally, the clustering algorithm's output will be structurally close to the ground truth. We approach this problem by introducing a notion of size generalization for clustering algorithm accuracy. We identify conditions under which we can (1) subsample the massive clustering instance, (2) evaluate a set of candidate algorithms on the smaller instance, and (3) guarantee that the algorithm with the best accuracy on the small instance will have the best accuracy on the original big instance. We verify these findings both theoretically and empirically.

This is joint work with Vaggos Chatziafratis and Ishani Karmarkar.