1. 08:30 — From Stability to Chaos: Analyzing Gradient Descent Dynamics in Quadratic Regression

Iterative algorithms such as gradient descent and its stochastic variants are widely used to train deep neural networks. Classical optimization theory operates under small-order step-sizes, which guarantee convergence. In stark contrast, recent empirical studies in deep learning have revealed that training deep neural networks with large-order step-sizes yields superior generalization performance. Motivated by this line of work, we conduct a comprehensive investigation into the dynamics of gradient descent with large-order constant step-sizes in the context of quadratic regression models. Within this framework, we show that the dynamics can be encapsulated by a specific cubic map, naturally parameterized by the step-size. Through a fine-grained bifurcation analysis with respect to the step-size parameter, we delineate five distinct training phases: (1) monotonic, (2) catapult, (3) periodic, (4) chaotic, and (5) divergent, and we precisely demarcate the boundaries of each phase. As illustrations, we provide examples involving phase retrieval and two-layer neural networks with quadratic activation functions and constant outer layers, trained on orthogonal data. Our simulations indicate that these five phases also manifest with generic non-orthogonal data. We also empirically investigate generalization performance when training in the various non-monotonic (and non-divergent) phases. In particular, we observe that ergodic trajectory averaging stabilizes the test error in these phases. This is joint work with Krishnakumar Balasubramanian, Promit Ghosal, and Bhavya Agrawalla.
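
As a toy illustration of this cubic-map viewpoint (a minimal one-dimensional sketch, not the authors' model), consider the loss f(x) = (x^2 - 1)^2 / 4, for which the gradient-descent update x <- x - eta * x * (x^2 - 1) is itself a cubic map parameterized by the step-size eta. The step sizes, starting point, and the crude monotonicity check below are illustrative assumptions only.

```python
import numpy as np

def loss(x):
    # Toy 1-D quadratic-regression-style loss: f(x) = (x^2 - 1)^2 / 4
    return 0.25 * (x**2 - 1.0) ** 2

def gd_trajectory(x0, eta, steps=200):
    # The gradient step x <- x - eta * x * (x^2 - 1) is a cubic map in x,
    # naturally parameterized by the step-size eta.
    x, traj = x0, [loss(x0)]
    for _ in range(steps):
        x = x - eta * x * (x**2 - 1.0)
        traj.append(loss(x))
        if not np.isfinite(traj[-1]) or traj[-1] > 1e12:  # treat blow-up as divergence
            break
    return np.array(traj)

if __name__ == "__main__":
    for eta in [0.1, 1.5, 2.2, 2.6, 3.5]:  # illustrative step sizes only
        t = gd_trajectory(x0=0.3, eta=eta)
        monotone = bool(np.all(np.diff(t) <= 1e-12))
        print(f"eta={eta:4.2f}  steps={len(t) - 1:3d}  final loss={t[-1]:.3e}  monotone={monotone}")
```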

2. 09:00 — Optimal Low-Rank Matrix Completion: Semidefinite Relaxations and Eigenvector Disjunctions

Low-rank matrix completion consists of computing a matrix of minimal complexity that recovers a given set of observations as accurately as possible, and has numerous applications such as product recommendation. Unfortunately, existing methods for low-rank matrix completion are heuristics that, while highly scalable and often able to identify high-quality solutions, do not possess any optimality guarantees. We reexamine matrix completion with an optimality-oriented eye by reformulating low-rank problems as convex problems over the non-convex set of projection matrices and implementing a disjunctive branch-and-bound scheme that solves them to certifiable optimality. Further, we derive a novel and often tight class of convex relaxations by decomposing a low-rank matrix as a sum of rank-one matrices and incentivizing, via a Shor relaxation, that each two-by-two minor in each rank-one matrix has determinant zero. In numerical experiments, our new convex relaxations decrease the optimality gap by two orders of magnitude compared to existing attempts. Moreover, we showcase the performance of our disjunctive branch-and-bound scheme and demonstrate that it solves matrix completion problems over 150x150 matrices to certifiable optimality in hours, an order-of-magnitude improvement on the state of the art for certifiably optimal methods.
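
For orientation, the minimal sketch below solves the standard nuclear-norm relaxation of matrix completion with CVXPY. This is the usual convex baseline, not the projection-matrix reformulation, eigenvector disjunctions, or Shor-based two-by-two-minor relaxation described in the talk; the matrix, observed entries, and rank tolerance are made-up illustrative values.

```python
import cvxpy as cp
import numpy as np

# Hypothetical data: a 5x5 rank-one matrix with a handful of revealed entries.
rng = np.random.default_rng(0)
M = np.outer(rng.standard_normal(5), rng.standard_normal(5))
observed = [(0, 0), (0, 3), (1, 1), (2, 4), (3, 0), (3, 3), (4, 2), (4, 4)]

X = cp.Variable((5, 5))
# Nuclear norm is the classical convex surrogate for rank; the talk's
# relaxations (projection matrices + Shor on 2x2 minors) are reported to be tighter.
problem = cp.Problem(
    cp.Minimize(cp.normNuc(X)),
    [X[i, j] == M[i, j] for (i, j) in observed],
)
problem.solve()
print("nuclear norm:", round(problem.value, 4))
print("estimated rank:", np.linalg.matrix_rank(X.value, tol=1e-6))
```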

3. 09:30 — On the Partial Convexification of the Low-Rank Constrained Optimization

Low-rank constrained optimization arises in various machine learning and operations research problems. It minimizes a linear objective function subject to one or more two-sided linear matrix inequalities and a low-rank domain set. Although low-rank constrained optimization is, in general, NP-hard, a viable approach is to convexify the domain set (i.e., replace the domain set with its convex hull), known as "partial convexification." Partial convexification often leads to a tractable convex relaxation; however, its solution quality lacks theoretical guarantees. To fill this gap, we first establish a necessary and sufficient condition under which the partial convexification matches the original low-rank constrained optimization. Next, we derive an upper bound on the minimum rank among all optimal solutions of the partial convexification and prove that this bound is tight, i.e., the best one could expect. To solve the partial convexification efficiently, we develop a column generation algorithm combined with a rank-reduction algorithm; this combination ensures that the output solution satisfies the theoretical rank upper bound. Finally, numerical experiments validate the strength of the partial convexification and the effectiveness of the proposed algorithms.
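
As a small illustration of partial convexification (an assumed toy instance, not one from the talk): for the domain set {X : rank(X) <= k, ||X||_2 <= 1}, a standard description of its convex hull is {X : ||X||_2 <= 1, ||X||_* <= k}, so the convexified problem can be handed directly to an off-the-shelf conic solver via CVXPY. The objective matrix, the single two-sided constraint, and the dimensions below are arbitrary assumptions.

```python
import cvxpy as cp
import numpy as np

n, k = 6, 2
rng = np.random.default_rng(1)
C = rng.standard_normal((n, n))   # linear objective <C, X> = trace(C^T X)
A = rng.standard_normal((n, n))   # data for one two-sided linear constraint

X = cp.Variable((n, n))
constraints = [
    cp.trace(A.T @ X) >= -1.0,    # two-sided linear constraint (scalar form)
    cp.trace(A.T @ X) <= 1.0,
    cp.sigma_max(X) <= 1.0,       # spectral-norm bound from the domain set
    cp.normNuc(X) <= k,           # convex-hull surrogate for rank(X) <= k
]
relaxation = cp.Problem(cp.Minimize(cp.trace(C.T @ X)), constraints)
relaxation.solve()
print("partial-convexification lower bound:", round(relaxation.value, 4))
print("rank of relaxed solution:", np.linalg.matrix_rank(X.value, tol=1e-6))
```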

4. 10:00 — A Parametric Approach for Solving Convex Quadratic Optimization with Indicators Over Trees

In this talk, we discuss convex quadratic optimization problems involving indicator variables, each associated with a continuous variable, particularly focusing on scenarios where the matrix defining the quadratic term is positive definite and its sparsity pattern corresponds to the adjacency matrix of a tree graph. We introduce a graph-based dynamic programming algorithm that solves this problem in quadratic time and memory. Central to our algorithm is a precise parametric characterization of the cost function across various nodes of the graph corresponding to distinct variables. Our computational experiments conducted on both synthetic and real-world datasets demonstrate the superior performance of our proposed algorithm compared to existing algorithms and state-of-the-art mixed-integer optimization solvers. An important application of our algorithm is in the real-time inference of Gaussian hidden Markov models from data affected by outlier noise. Using a real on-body accelerometer dataset, we solve instances of this problem with over 30,000 variables in under a minute, and its online variant within milliseconds on a standard computer.
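
To make the problem concrete, the brute-force sketch below enumerates indicator patterns for a tiny chain-structured (hence tree-structured) instance and solves the reduced convex quadratic on each support. It is only a reference baseline for intuition; the quadratic-time parametric dynamic program from the talk avoids this exponential enumeration. The matrix, linear term, and regularization weight are arbitrary assumptions.

```python
import itertools
import numpy as np

# Toy instance (assumed): minimize 0.5 x'Qx - b'x + lam * sum(z), with x_i = 0
# whenever z_i = 0, and Q positive definite with a chain (path-graph) sparsity pattern.
n, lam = 6, 0.3
rng = np.random.default_rng(2)
b = rng.standard_normal(n)
Q = 2.0 * np.eye(n)
for i in range(n - 1):                 # off-diagonals only on tree (here: chain) edges
    Q[i, i + 1] = Q[i + 1, i] = -0.5   # diagonally dominant, hence positive definite

best_val, best_x = np.inf, np.zeros(n)
for z in itertools.product([0, 1], repeat=n):          # brute force over all supports
    support = [i for i in range(n) if z[i]]
    x = np.zeros(n)
    if support:
        # Unconstrained minimizer of the quadratic restricted to the chosen support
        x[support] = np.linalg.solve(Q[np.ix_(support, support)], b[support])
    val = 0.5 * x @ Q @ x - b @ x + lam * len(support)
    if val < best_val:
        best_val, best_x = val, x
print("optimal value:", round(best_val, 4))
print("optimal support:", [i for i in range(n) if abs(best_x[i]) > 1e-9])
```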