1 — 14:00 — Reliable Optimization Algorithms when Objective Function Evaluations are Expensive
Gradient methods are experiencing a growth in methodological and theoretical developments owing to the challenges of optimization problems arising in data science. Focusing on data science applications with expensive objective function evaluations yet inexpensive gradient function evaluations (e.g., identification of conservative estimating equation models), gradient methods that never make objective function evaluations are either being rejuvenated or actively developed. However, as we show, such gradient methods are all susceptible to catastrophic divergence under realistic conditions for data science applications. In light of this, gradient methods which make use of objective function evaluations become more appealing, yet, as we show, can result in an exponential increase in objective evaluations between accepted iterates. As a result, existing gradient methods are poorly suited to the needs of optimization problems arising from data science. In this talk, we present a novel, generic methodology that economically uses objective function evaluations in a problem-driven manner to prevent catastrophic divergence and avoid an explosion in objective evaluations between accepted iterates. Our methodology allows for specific procedures that can make use of specific step size selection methodologies or search direction strategies, and we develop a novel step size selection methodology that is well-suited to data science applications. We show that a procedure resulting from our methodology is highly competitive with standard optimization methods on CUTEst test problems. We then show a procedure resulting from our methodology is highly favorable relative to standard optimization methods on optimization problems arising in our target data science applications. Thus, we provide a novel gradient methodology that is better suited to optimization problems arising in data science.
2 — 14:30 — Yet another fast variant of Newton's method for nonconvex optimization
A second-order algorithm is proposed for minimizing any smooth nonconvex function that suitably alternates between regularized Newton step and negative curvature steps. At most iterations, the Hessian matrix is regularized with the square root of the norm of the current gradient and an additional term taking moderate negative curvature into account, while a negative curvature step is taken only exceptionally otherwise. As a consequence, the proposed method has the feature that it only requires the solution of a single linear system at nearly all iterations., making is ideally suited to applications where of-the-shelf direct or preconditioned iterative linear solvers are available. We establish that , under standard assumptions, at most $O(\log(epsilon)|epsilon^{-3/2})$ evaluations of the problem's objective function , gradient and Hessian are needed for this algorithm to obtain an epsilon-approximate first-order critical point, and at most $O(|\log(epsilon)| epsilon^{-3})$ to obtain an epsilon-approximate second-order one. Initial numerical experiments with two variants of the new method are finally presented. The first variant uses direct linear solvers and the second uses subspace-based ones, like truncated conjugate-gradients. The performance of these variants on a set of standard test problems is shown to be competitive.
3 — 15:00 — Approximate Derivatives for Tensor Methods
It is well known that second-order methods converge much faster than first-order methods. Based on the theory one should expect another speed-up (in terms of function evaluations) when incorporating third-order tensor information. Third-order oracles, however, are almost never available. We therefore propose a generalization of the quasi-Newton approach to higher-order methods. Just like quasi-Newton methods extract second-order information from repeated gradient evaluations, this method extracts third-order information from repeated Hessian evaluations. In each step the new approximation is one that satisfies a higher-order analogue of the secant equation and simultaneously is closest possible to the previous approximation in a weighted Frobenius norm. It can be shown that such approximations recover the true derivative in the limit under certain assumptions and that, in this case, tensor methods using these approximations achieve super-quadratic local convergence. We present small-scale numerical experiments that show the advantages and disadvantages of using tensor methods over second-order models in practice. For this we focus on the ARp method which uses a regularized Taylor expansion as a local model and is able to incorporate both second- and third-order methods in the same framework.
4 — 15:30 — Deterministic and Stochastic Frank Wolfe Recursion on Probability Spaces
Motivated by emergency response applications, we consider smooth optimization problems over the space
of non-negative Borel measures supported on a convex compact set $X \subset Rd$. Through direct application of
variational calculus, we devise a deterministic Frank-Wolfe (DFW) first-order recursion using the influence
function, to generate iterates exhibiting $O(\epsilon-1)$ convergence in function value. Importantly, the recursion
we present is made possible by a simple result that expresses the solution to the infinite-dimensional Frank-
Wolfe sub-problem as the solution to a finite-dimensional optimization problem in Rd. Such a result allows
writing the Frank-Wolfe recursion as a convex combination of the incumbent iterate and a Dirac measure
concentrating on the minimum of the influence function at the incumbent iterate. To account for com-
mon application contexts where only Monte Carlo observations of the objective and influence function are
available, we construct the stochastic Frank-Wolfe (SFW) variation that generates a sequence of random
measures constructed using minima of increasingly accurate estimates of the influence function. We demon-
strate that the SFW sequence exhibits $O(1/k)$ complexity almost surely and in expectation. Furthermore, we
show that an easy-to-implement fixed-step, fixed-sample version of SFW exhibits exponential convergence
to ε-optimality. We end with a central limit theorem on the observed objective values at the sequence of
generated random measures. We include a number of illustrative example problem classes along with exact
calculations of the influence function.