1 — 08:30 — Black-Box Optimization Algorithms for Problems with Convex Regularizers
In this talk, we present a derivative-free trust-region framework for problems that are the sum of a smooth nonconvex black-box function and a non-smooth convex regularizer. We assume the regularizer is known and has an easy-to-evaluate proximal operator. Our method, which comes with global convergence and worst-case complexity guarantees, is more practical than existing methods for these problems because it allows inexact calculation of stationarity measures (which do not have a closed-form expression). When specialized to regularized nonlinear least-squares problems, we provide implementable procedures for computing inexact stationarity measures and for achieving the required trust-region subproblem accuracy. We show numerical results for nonlinear least-squares problems with $\ell_1$-norm regularization.
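For concreteness, here is a minimal sketch (illustrative names, not the authors' code) of the kind of easy-to-evaluate proximal operator assumed above: the soft-thresholding prox of the $\ell_1$ regularizer that appears in the reported nonlinear least-squares experiments.

```python
import numpy as np

def prox_l1(x, step, lam):
    """Proximal operator of lam * ||.||_1 with step size `step`:
    argmin_z  (1 / (2 * step)) * ||z - x||^2 + lam * ||z||_1,
    which is componentwise soft-thresholding."""
    return np.sign(x) * np.maximum(np.abs(x) - step * lam, 0.0)

# Example: small entries are shrunk to zero, large entries are shifted toward zero
x = np.array([1.5, -0.2, 0.05, -3.0])
print(prox_l1(x, step=1.0, lam=0.5))
```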
2 — 09:00 — Zeroth-order low-rank Hessian estimation via matrix recovery
A zeroth-order Hessian estimator aims to recover the Hessian matrix of an objective function at a given point using as few finite-difference computations as possible. We study zeroth-order Hessian estimation for low-rank Hessians from a matrix recovery perspective. The challenge lies in the fact that traditional matrix recovery techniques are not directly suitable for our scenario: they either demand incoherence assumptions (or variants thereof), or require an impractical number of finite-difference computations in our setting. To overcome these hurdles, we employ zeroth-order Hessian estimators aligned with proper matrix measurements, and prove new recovery guarantees for these estimators. More specifically, we prove that for a Hessian matrix $H \in \mathbb{R}^{n \times n}$ of rank $r$, $\mathcal{O}(nr^2 \log^2 n)$ properly chosen zeroth-order finite-difference computations suffice to recover $H$ exactly with high probability. Compared to existing methods, our method can greatly reduce the number of finite-difference computations and does not require any incoherence assumptions.
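As a minimal illustration of the type of measurement involved (a sketch under my own naming, not the authors' exact construction), each zeroth-order finite-difference computation below approximates one bilinear measurement $u^\top H v$ of the Hessian using four function evaluations; matrix recovery then reconstructs the low-rank $H$ from a collection of such measurements.

```python
import numpy as np

def hessian_measurement(f, x, u, v, h=1e-4):
    """Approximate the bilinear form u^T H(x) v via the second-order
    finite difference (f(x+hu+hv) - f(x+hu) - f(x+hv) + f(x)) / h^2."""
    return (f(x + h * (u + v)) - f(x + h * u) - f(x + h * v) + f(x)) / h**2

# Example on a quadratic f(x) = 0.5 x^T A x, whose Hessian is the rank-2 matrix A
n = 5
A = np.diag([3.0, 1.0, 0.0, 0.0, 0.0])
f = lambda x: 0.5 * x @ A @ x
x0 = np.random.randn(n)
u, v = np.random.randn(n), np.random.randn(n)
print(hessian_measurement(f, x0, u, v), u @ A @ v)  # approximately equal
```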
3 — 09:30 — ** MOVED TO FRIDAY AM REMOTE SESSION ** Derivative-free implementation of the regularized Newton method with lazy approximated Hessians
In this work, we develop a zeroth-order (derivative-free) implementation of the cubically regularized Newton method for solving general non-convex optimization problems. Specifically, we consider the class of unconstrained problems whose objective functions are twice continuously differentiable with Lipschitz continuous Hessians. The proposed method employs finite-difference approximations of gradient vectors and Hessian matrices. We use a special adaptive search procedure in our algorithm, which simultaneously fits both the regularization parameters of the models and the stepsizes of the finite-difference approximations. Thanks to this procedure, our scheme does not require knowledge of the actual Lipschitz constants. Additionally, we equip our algorithm with a lazy Hessian update, meaning that the method reuses a previously computed Hessian approximation for several iterations. In terms of worst-case complexity, we prove that the proposed method finds an $\epsilon$-approximate stationary point using at most $\mathcal{O}( n^{3/2} \epsilon^{-3/2} )$ function evaluations, where $n$ is the dimension of the problem. This bound improves on existing bounds for derivative-free methods applied to the minimization of functions with Lipschitz continuous Hessians in terms of the joint dependence on $n$ and $\epsilon$. Illustrative numerical results confirm our theoretical findings.
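To fix ideas, here is a schematic sketch (hypothetical names; a simplified stand-in, with the cubic subproblem replaced by a plain quadratically regularized step and without the adaptive search) of finite-difference gradient and Hessian approximations combined with a lazy Hessian update that is refreshed only every few iterations.

```python
import numpy as np

def fd_gradient(f, x, h):
    """Central finite-difference approximation of the gradient (2n evaluations)."""
    n = x.size
    g = np.zeros(n)
    for i in range(n):
        e = np.zeros(n)
        e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2 * h)
    return g

def fd_hessian(f, x, h):
    """Symmetric finite-difference approximation of the Hessian (O(n^2) evaluations)."""
    n = x.size
    H = np.zeros((n, n))
    fx = f(x)
    for i in range(n):
        ei = np.zeros(n)
        ei[i] = h
        for j in range(i, n):
            ej = np.zeros(n)
            ej[j] = h
            H[i, j] = (f(x + ei + ej) - f(x + ei) - f(x + ej) + fx) / h**2
            H[j, i] = H[i, j]
    return H

def lazy_regularized_newton(f, x0, h=1e-5, m=5, reg=1.0, iters=50):
    """Illustrative loop: the finite-difference gradient is refreshed every
    iteration, but the finite-difference Hessian only every m iterations
    ("lazy" update).  The cubic subproblem of the actual method is replaced
    here by a simple quadratically regularized Newton step for brevity."""
    x = x0.copy()
    for k in range(iters):
        if k % m == 0:  # lazy Hessian refresh
            H = fd_hessian(f, x, h)
        g = fd_gradient(f, x, h)
        x = x - np.linalg.solve(H + reg * np.eye(x.size), g)
    return x

# Example: minimize a smooth strongly convex test function without derivatives
f = lambda x: np.log(1 + np.exp(x[0] + 2 * x[1])) + 0.5 * (x[0]**2 + x[1]**2)
print(lazy_regularized_newton(f, np.zeros(2)))  # approximate minimizer
```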
4 — 10:00 — On the connection between affine-invariance, convergence and learning second order information
Invariance is a general concept that is fundamental in many domains. In statistics and machine learning, decisions produced by an algorithm operating on data should not be affected by simple transformations of the input data, such as reordering or translation (as long as the changes to the data are reflected in the outcome of the decision). In optimization, the performance of an algorithm on a function should not be greatly affected if the function is translated, scaled, or even composed with an affine transformation of the search space. Invariance is particularly important because it guarantees that results observed on a dataset, in an experiment, or in an optimization run generalize to the wider class of problems induced by the invariance property. In numerical optimization, affine invariance in the search space is the ultimate invariance, as it implies all other classical invariances.
In this presentation, after motivating invariance, I will discuss the link between affine invariance, convergence, and the learning of second-order information using two classes of algorithms: on the one hand, quasi-Newton algorithms, and on the other, more recent derivative-free stochastic algorithms such as variants of CMA-ES. For the latter, the algorithm learns the inverse Hessian on strictly convex-quadratic functions in a comparison-based setting, i.e. using neither derivatives nor function values, only comparisons between candidate solutions.
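To make the comparison-based setting concrete, the toy check below (illustrative only; not part of the talk's material) verifies that the ranking of candidate solutions, which is all a comparison-based method such as CMA-ES observes, is unchanged under any strictly increasing transformation of the objective. Learning second-order information then amounts to adapting the sampling covariance matrix so that, on strictly convex-quadratic functions, it approaches a matrix proportional to the inverse Hessian.

```python
import numpy as np

# A comparison-based algorithm only uses the ranking of candidate solutions,
# never function values or derivatives, so its behaviour is invariant under
# strictly increasing transformations of the objective.
rng = np.random.default_rng(0)

f = lambda x: x @ np.diag([100.0, 1.0]) @ x   # ill-conditioned convex quadratic
g = lambda y: y**3 + 5.0                      # strictly increasing transform

candidates = [rng.standard_normal(2) for _ in range(6)]
ranking_f = np.argsort([f(x) for x in candidates])
ranking_g_of_f = np.argsort([g(f(x)) for x in candidates])
print(np.array_equal(ranking_f, ranking_g_of_f))  # True: identical rankings
```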