1. 16:20 — Deep neural network initialisation: Nonlinear activations' impact on the Gaussian process

Randomly initialised deep neural networks are known to give rise to a Gaussian process in their pre-activation intermediate layers. We will review this line of research, with extensions to deep networks having structured random entries such as block-sparse or low-rank weight matrices. We will then discuss how the choice of nonlinear activation impacts the evolution of this Gaussian process. Specifically, we will discuss why sparsifying nonlinear activations such as soft thresholding are unstable, show conditions under which these issues can be overcome, and show how non-sparsifying activations can be made more stable when acting on a data manifold.

This work is joint with Michael Murray (UCLA), Vinayak Abrol (IIIT Delhi), Ilan Price (DeepMind), and from Oxford: Alireza Naderi, Thiziri Nait Saada, Nicholas Daultry Ball, Adam C. Jones, and Samuel C.H. Lam.
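As an illustration of the variance (length-map) perspective behind these stability questions, here is a minimal NumPy sketch of ours, not the speakers' code: under standard width scaling the pre-activation variance evolves as q_{l+1} = sigma_w^2 * E[phi(z)^2] + sigma_b^2 with z ~ N(0, q_l), and the snippet estimates this map by Monte Carlo for ReLU versus a soft-thresholding activation.

import numpy as np

rng = np.random.default_rng(0)

def soft_threshold(z, lam=0.5):
    # Sparsifying activation: inputs with |z| <= lam are set exactly to zero.
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def relu(z):
    return np.maximum(z, 0.0)

def variance_map(phi, q, sigma_w2, sigma_b2=0.0, n_mc=200_000):
    # One step of the length map: q -> sigma_w^2 * E[phi(z)^2] + sigma_b^2, z ~ N(0, q).
    z = rng.normal(0.0, np.sqrt(q), size=n_mc)
    return sigma_w2 * np.mean(phi(z) ** 2) + sigma_b2

def propagate(phi, sigma_w2, depth=30, q0=1.0):
    qs = [q0]
    for _ in range(depth):
        qs.append(variance_map(phi, qs[-1], sigma_w2))
    return qs

# He-type scaling (sigma_w^2 = 2) keeps ReLU's variance roughly constant, since
# E[relu(z)^2] = q/2, but the same scaling lets soft thresholding contract the
# variance towards zero with depth -- one way the instability can show up.
print("ReLU :", [round(q, 3) for q in propagate(relu, 2.0)][::10])
print("SoftT:", [round(q, 3) for q in propagate(soft_threshold, 2.0)][::10])

The fixed-point condition of this map is one natural place where rescaling or redesigning the activation can restore stability, which is the kind of remedy the abstract alludes to.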

2. 16:50 — Functionally-Identical Pruning of Tree Ensembles

We study how to prune a tree ensemble into a reduced one that is ``functionally identical'' to the original model. That is, we ensure that its prediction function remains unchanged for any input in a domain of interest. Consequently, this pruning algorithm is also lossless for any aggregated metric. At its core, our algorithm (i) efficiently solves a mixed integer linear program for ensemble compression, (ii) iteratively detects samples for which prediction differs using a separation oracle, and (iii) dynamically adds constraints to guarantee functional identity. We evaluate our method for different domains of interest: discrete spaces containing the training set, or a subset thereof, and a continuous space characterizing plausible inputs. In an extensive computational campaign, we show that this approach gives a ``free lunch'', as it can reduce ensemble size without changing model functionality. Our approach is general and can be extended to other ensemble learners for which an adversarial MILP can be formulated.
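The constraint-generation loop described above can be sketched as a toy example, assuming a scikit-learn random-forest regressor, the training set as the discrete domain of interest, and SciPy's MILP solver; the function name prune_functionally_identical and the simplified selection model are ours, not the authors' implementation.

import numpy as np
from scipy.optimize import Bounds, LinearConstraint, milp
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

def prune_functionally_identical(forest, X_domain, tol=1e-4, max_iter=300):
    # Per-tree predictions on the domain of interest; the original ensemble
    # output is their plain average.
    preds = np.stack([t.predict(X_domain) for t in forest.estimators_])  # (T, n)
    T = preds.shape[0]
    target = preds.mean(axis=0)
    # Variables: z_t in {0,1} (keep tree t) and w_t >= 0 (its new weight).
    c = np.concatenate([np.ones(T), np.zeros(T)])            # minimise kept trees
    integrality = np.concatenate([np.ones(T), np.zeros(T)])
    bounds = Bounds(np.zeros(2 * T), np.concatenate([np.ones(T), np.full(T, float(T))]))
    # Big-M linking w_t <= T * z_t: a weight can be nonzero only if its tree is kept.
    link = np.hstack([-float(T) * np.eye(T), np.eye(T)])
    constraints = [LinearConstraint(link, -np.inf, 0.0)]
    for _ in range(max_iter):
        res = milp(c, constraints=constraints, integrality=integrality, bounds=bounds)
        w = res.x[T:]
        # Separation oracle: the domain point with the largest prediction mismatch.
        gap = np.abs(w @ preds - target)
        worst = int(np.argmax(gap))
        if gap[worst] <= tol:
            return res.x[:T] > 0.5, w      # predictions now agree on all of X_domain
        # Add an equality cut forcing agreement on the violating sample, then resolve.
        row = np.concatenate([np.zeros(T), preds[:, worst]]).reshape(1, -1)
        constraints.append(LinearConstraint(row, target[worst], target[worst]))
    raise RuntimeError("no agreement certificate within max_iter rounds")

X, y = make_regression(n_samples=200, n_features=10, random_state=0)
rf = RandomForestRegressor(n_estimators=50, max_depth=4, random_state=0).fit(X, y)
keep, weights = prune_functionally_identical(rf, X)
print(f"kept {keep.sum()} of {rf.n_estimators} trees")

Because the domain here is finite, the loop must terminate: each cut forces exact agreement on one more point, and the oracle only stops once no mismatch above the tolerance remains.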

3. 17:20 — Discrete Optimization Methods for Pruning and Compressing Large Neural Networks

Foundation models have achieved remarkable performance across various domains, but their large model sizes lead to high computational costs (storage, inference latency, memory, etc.). Neural network (NN) pruning, roughly categorized as unstructured (removing individual weights) or structured (removing entire channels, neurons, or heads in attention layers), aims to reduce these costs by removing less important parameters while retaining model utility as much as possible. In contrast to unstructured pruning, which requires specialized hardware and software, structured pruning is a practical way to improve inference latency on standard hardware (e.g., TPUs). We propose several novel algorithmic ideas to improve the efficiency of large vision and language models (on standard hardware) via structured pruning.
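To make the structured/unstructured distinction concrete, here is a minimal NumPy sketch of ours, not the speakers' method: it prunes whole hidden neurons of a toy MLP by a weight-norm importance score, so the remaining matrices stay dense and simply become smaller, which is why standard hardware benefits without sparse kernels.

import numpy as np

rng = np.random.default_rng(0)
d_in, d_hidden, d_out = 64, 256, 10
W1, b1 = rng.normal(size=(d_hidden, d_in)), rng.normal(size=d_hidden)
W2 = rng.normal(size=(d_out, d_hidden))

def forward(x, W1, b1, W2):
    return W2 @ np.maximum(W1 @ x + b1, 0.0)           # one ReLU hidden layer

def prune_neurons(W1, b1, W2, keep_ratio=0.5):
    # Importance of hidden neuron i: norm of its incoming row in W1 times the
    # norm of its outgoing column in W2; the least important neurons are dropped.
    importance = np.linalg.norm(W1, axis=1) * np.linalg.norm(W2, axis=0)
    k = int(keep_ratio * len(importance))
    keep = np.sort(np.argsort(importance)[-k:])
    return W1[keep], b1[keep], W2[:, keep]              # matrices stay dense, just smaller

W1p, b1p, W2p = prune_neurons(W1, b1, W2, keep_ratio=0.5)
x = rng.normal(size=d_in)
print("hidden width:", W1.shape[0], "->", W1p.shape[0])
print("output drift:", np.linalg.norm(forward(x, W1, b1, W2) - forward(x, W1p, b1p, W2p)))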