14:00 — First-order methods for bilevel optimization

Bilevel optimization, also known as two-level optimization, is an important branch of mathematical optimization. It has found applications across various domains, including economics, logistics, supply chain management, transportation, engineering design, and machine learning. In this talk, we will present first-order methods for solving a class of bilevel optimization problems via either a single minimax optimization problem or a sequence of them. We will also discuss the first-order operation complexity of these methods and present preliminary numerical results to illustrate their performance.
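
For orientation, a generic bilevel problem has the nested form below; this is an illustrative sketch in our own notation, and the specific problem class and minimax reformulation treated in the talk may differ:

\[
\min_{x \in X} \; F\big(x, y^*(x)\big)
\quad \text{s.t.} \quad
y^*(x) \in \operatorname*{argmin}_{y \in Y} \, f(x, y).
\]

When the lower-level problem is convex, replacing it by its optimality conditions or value function often yields a single minimax problem \(\min_{x} \max_{y} \mathcal{L}(x, y)\), to which standard first-order (gradient-based) methods can be applied.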

14:30 — Classification under strategic adversary manipulation using pessimistic bilevel optimisation

Adversarial machine learning concerns situations in which learners face attacks from active adversaries. In particular, the underlying distribution of the data is vulnerable to significant changes made by the adversary. We consider the case of test-time attacks, where such changes to the data occur in response to trained prediction models. Under these attacks, the prediction model can suffer a significant loss of accuracy at the time of implementation. Such scenarios arise in applications such as spam email filtering, malware detection and fake-image generation. We model these interactions between the learner and the adversary as a game. While many game-theoretic models exist, some of them assume that the players act simultaneously. In adversarial learning, however, a more appropriate assumption is often that players can observe their opponent's actions before making their own. For example, spam email senders might probe an email filter by sending test emails before deploying their final product. We therefore model the interaction between the learner and the adversary as a Stackelberg game, with the learner taking the role of the leader. The adversary, modelled as a data generator, takes the role of the follower, constructing its generator in response to the learner's prediction function. The game is formulated as a pessimistic bilevel model to realistically capture the antagonistic nature of adversarial learning scenarios. While existing pessimistic models rely on assumptions about the uniqueness of the lower-level solution, we will present a model and solution method that make no such assumption.
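
As a rough sketch in our own notation (the exact formulation in the talk may differ), the pessimistic bilevel model pairs the learner's prediction parameters \(w\) with the adversary's generator \(g\):

\[
\min_{w} \; \max_{g \in S(w)} \; \mathcal{R}(w, g),
\qquad
S(w) = \operatorname*{argmin}_{g} \, \ell_{\mathrm{adv}}(w, g),
\]

where \(\mathcal{R}\) denotes the learner's risk on the manipulated data and \(\ell_{\mathrm{adv}}\) the adversary's objective. The inner maximum over the follower's solution set \(S(w)\) is what makes the model pessimistic: when the adversary's best response is not unique, the learner guards against the worst one, which is precisely where uniqueness assumptions are usually invoked.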

15:00 — A Banach-space view of neural network training

In this talk, we consider the problem of training neural networks with common regularization schemes such as weight decay (which corresponds to standard Tikhonov regularization). In the case of shallow neural networks, it turns out that this non-convex optimization problem can be lifted to a convex optimization problem posed over a particular Banach space. Our main result is a theorem stating that every solution to the non-convex neural network training problem is also a solution to the optimization problem posed over this Banach space. In particular, the solution set of the Banach-space problem is nonempty, convex, and weak* compact, and its extreme points are finite-width neural networks whose widths are bounded by the number of data points in the original training objective. This Banach space does not coincide with the classical function spaces studied in analysis, as it is defined via a measure of smoothness in the domain of the Radon transform of the underlying functions. We will discuss several implications of this correspondence between neural network training problems and Banach-space optimization problems for understanding the impressive performance of neural networks. These results could also motivate the design of new optimization algorithms for training neural networks. If time permits, we will discuss extensions of these results to other architectures.
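
As a concrete instance (a sketch in our own notation, not necessarily the exact setup of the talk), training a shallow network of width \(K\) on data \((x_n, y_n)_{n=1}^{N}\) with weight decay reads

\[
\min_{\{(a_k, w_k, b_k)\}_{k=1}^{K}} \; \sum_{n=1}^{N} \Big( y_n - \sum_{k=1}^{K} a_k \, \sigma(w_k^\top x_n - b_k) \Big)^{2}
\;+\; \frac{\lambda}{2} \sum_{k=1}^{K} \big( a_k^{2} + \|w_k\|_2^{2} \big).
\]

For a positively homogeneous activation \(\sigma\) such as the ReLU, this penalty is known to be equivalent to \(\lambda \sum_{k} |a_k| \, \|w_k\|_2\); the Banach-space lift replaces the finite sum over neurons with a measure over neuron parameters, and the extreme-point result then guarantees solutions using at most \(N\) neurons.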