1 — 14:00 — AdaBB: A Parameter-Free Gradient Method for Convex Optimization
We propose AdaBB, an adaptive gradient method based on the Barzilai-Borwein stepsize. The algorithm is line-search-free and parameter-free, and essentially provides a convergent variant of the Barzilai-Borwein method for general unconstrained convex optimization. We analyze the ergodic convergence of the objective function value and the convergence of the iterates. Compared with existing works along this line of research, our algorithm gives the best lower bounds on the stepsize and on the average of the stepsizes. Moreover, we present an extension of the proposed algorithm to composite optimization, where the objective function is the sum of a smooth function and a nonsmooth function. Our numerical results on representative examples also demonstrate the promising performance of the proposed algorithms.
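For intuition, the sketch below shows plain gradient descent with the classical (long) Barzilai-Borwein stepsize. It is not the exact AdaBB update, which additionally safeguards the stepsize to guarantee convergence; the function name, initial stepsize, and quadratic test problem are illustrative choices only.

```python
# A minimal sketch of gradient descent with the classical (long) Barzilai-Borwein
# stepsize, for intuition only; AdaBB's stepsize rule and safeguards differ.
import numpy as np

def bb_gradient_descent(grad, x0, alpha0=1e-3, max_iter=500, tol=1e-8):
    x = np.asarray(x0, dtype=float)
    g = grad(x)
    alpha = alpha0                          # initial stepsize (a tuning choice)
    for _ in range(max_iter):
        x_new = x - alpha * g               # gradient step
        g_new = grad(x_new)
        if np.linalg.norm(g_new) < tol:
            return x_new
        s = x_new - x                       # iterate difference
        y = g_new - g                       # gradient difference
        sy = s @ y
        if sy > 0:                          # BB1 stepsize <s,s>/<s,y>; otherwise keep alpha
            alpha = (s @ s) / sy
        x, g = x_new, g_new
    return x

# Usage: minimize the convex quadratic f(x) = 0.5 x'Ax - b'x.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
x_star = bb_gradient_descent(lambda x: A @ x - b, x0=np.zeros(2))
```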
2 — 14:30 — Kernel Learning in Ridge Regression “Automatically” Yields Exact Low Rank Solution
This talk considers a variant of kernel ridge regression which simultaneously optimizes the prediction function and the reproducing kernel Hilbert space (RKHS) parameter. We identify a previously unnoticed phenomenon for this kernel learning objective: under certain conditions, the global minimizer is exactly low rank with high probability. This phenomenon is interesting because the low-rankness property is achieved without any explicit regularization. Our theory establishes a correspondence between the observed phenomenon and the notion of low rank set identifiability from the optimization literature. The low-rankness property of the finite-sample solutions arises because the population kernel learning objective grows "sharply" when moving away from its minimizers in any direction perpendicular to the central mean subspace.
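As a rough illustration of what jointly optimizing the predictor and the kernel can look like, the sketch below parameterizes a Gaussian kernel by a matrix M and evaluates the kernel ridge objective at its optimal coefficients for each M. The kernel family, regularization, and outer optimizer are assumptions made here for illustration, not the exact objective analyzed in the talk.

```python
# A toy illustration of jointly fitting a kernel parameter and the ridge predictor.
# The kernel family (Gaussian kernel reparameterized by a matrix M), the regularization,
# and the outer optimizer are illustrative assumptions, not the talk's objective.
import numpy as np
from scipy.optimize import minimize

def gaussian_kernel(X, M):
    Z = X @ M.T                                            # linearly transformed inputs
    d2 = np.sum(Z**2, 1)[:, None] + np.sum(Z**2, 1)[None, :] - 2 * Z @ Z.T
    return np.exp(-np.maximum(d2, 0.0))

def kernel_learning_objective(X, y, M, lam):
    K = gaussian_kernel(X, M)
    n = len(y)
    alpha = np.linalg.solve(K + lam * n * np.eye(n), y)    # optimal ridge coefficients for this K
    resid = y - K @ alpha
    return resid @ resid / n + lam * alpha @ K @ alpha     # kernel ridge objective at optimal alpha

# Outer optimization over M; the phenomenon of interest is that the minimizing M tends
# to be exactly low rank even though nothing below penalizes its rank.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 5))
y = np.sin(X[:, 0])                                        # response depends on one direction only
res = minimize(lambda m: kernel_learning_objective(X, y, m.reshape(5, 5), 1e-2),
               x0=np.eye(5).ravel(), method="Nelder-Mead", options={"maxiter": 2000})
M_hat = res.x.reshape(5, 5)
```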
3 — 15:00 — Overparameterized Tensor Regression via Riemannian Optimization
We study tensor regression, where the goal is to connect scalar or tensor responses to tensor covariates with a low Tucker rank parameter tensor/matrix, without prior knowledge of its intrinsic rank. We propose Riemannian gradient descent (RGD) and Riemannian Gauss-Newton (RGN) methods and cope with the challenge of unknown rank by studying the effect of rank over-parameterization. We provide the first convergence guarantee for general tensor regression by showing that RGD and RGN converge linearly and quadratically, respectively, to a statistically optimal estimate in both the rank correctly-parameterized and over-parameterized settings. Our theory reveals an intriguing phenomenon: Riemannian optimization methods naturally adapt to over-parameterization without modifications to their implementation. We also prove the statistical-computational gap in tensor regression via a low-degree polynomial argument. Our theory demonstrates a "blessing of statistical-computational gap" phenomenon in a wide range of scenarios for tensors of order three or higher, showing that moderate rank over-parameterization is essentially "cost-free" in terms of sample size. Finally, we conduct simulation studies to show the advantages of our proposed methods and to corroborate our theoretical findings.
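As a simplified analogue (a matrix rather than a Tucker-rank tensor, with a truncated-SVD retraction standing in for the manifold machinery of the talk), the sketch below runs a Riemannian-style gradient iteration for low-rank regression with an over-parameterized rank budget. All names, stepsizes, and problem sizes are illustrative assumptions.

```python
# A simplified matrix-case sketch of Riemannian-style gradient descent for low-rank
# regression y_i = <A_i, X*>. The "retraction" here is a truncated SVD back to rank r,
# and r may exceed the true rank to mimic the over-parameterized setting.
import numpy as np

def truncate_rank(X, r):
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r]

def low_rank_rgd(A, y, r, eta=None, iters=300):
    m, n1, n2 = A.shape
    eta = eta if eta is not None else 1.0 / m                   # stepsize (assumed choice)
    X = np.zeros((n1, n2))
    for _ in range(iters):
        resid = np.tensordot(A, X, axes=([1, 2], [0, 1])) - y   # <A_i, X> - y_i
        G = np.tensordot(resid, A, axes=([0], [0]))             # Euclidean gradient
        X = truncate_rank(X - eta * G, r)                       # gradient step + rank-r retraction
    return X

# Usage: recover a rank-2 ground truth with an over-parameterized rank budget r = 4.
rng = np.random.default_rng(0)
n, m, true_r = 20, 400, 2
X_star = rng.standard_normal((n, true_r)) @ rng.standard_normal((true_r, n))
A = rng.standard_normal((m, n, n))
y = np.tensordot(A, X_star, axes=([1, 2], [0, 1]))
X_hat = low_rank_rgd(A, y, r=4)
```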
4 — 15:30 — The Burer-Monteiro SDP method can fail even above the Barvinok-Pataki bound
The most widely used technique for solving large-scale semidefinite programs (SDPs) in practice is the non-convex Burer-Monteiro method, which explicitly maintains a low-rank SDP solution for memory efficiency. When the maximum allowed rank p of the SDP solution is above the Barvinok-Pataki bound (p >= sqrt(2n), where a globally optimal solution of rank at most p is guaranteed to exist), a recent line of work established convergence to a global optimum for generic or smoothed instances of the problem. However, it was open whether there exists any instance in this regime where this method fails. We provide a family of instances on n vertices that have spurious local minima even when the rank p = n/2. This justifies the use of beyond worst-case paradigms like smoothed analysis to obtain guarantees for the Burer-Monteiro method. Based on joint work with Liam O'Carroll and Aravindan Vijayaraghavan.
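For reference, the sketch below applies the Burer-Monteiro factorization to a MaxCut-type SDP, max <C, X> subject to diag(X) = 1 and X PSD, by writing X = YY^T with Y of size n-by-p and running projected gradient ascent with row normalization. It only illustrates the method itself (the hard instances from the talk are not reproduced here), and all parameter choices are illustrative.

```python
# A minimal sketch of the Burer-Monteiro approach for a MaxCut-type SDP:
#   max <C, X>  s.t.  diag(X) = 1, X PSD,
# with X replaced by Y Y^T for Y of shape (n, p).
import numpy as np

def burer_monteiro_maxcut(C, p, eta=1e-2, iters=2000, seed=0):
    n = C.shape[0]
    rng = np.random.default_rng(seed)
    Y = rng.standard_normal((n, p))
    Y /= np.linalg.norm(Y, axis=1, keepdims=True)       # unit rows <=> diag(YY^T) = 1
    for _ in range(iters):
        G = 2 * C @ Y                                    # gradient of <C, YY^T> in Y
        Y = Y + eta * G                                  # ascent step
        Y /= np.linalg.norm(Y, axis=1, keepdims=True)    # project back onto the constraint
    return Y @ Y.T                                       # candidate low-rank SDP solution

# Usage on a random symmetric cost matrix, with p just above sqrt(2n).
n = 30
A = np.random.default_rng(1).standard_normal((n, n))
C = (A + A.T) / 2
X = burer_monteiro_maxcut(C, p=int(np.ceil(np.sqrt(2 * n))) + 1)
```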