1. 08:30 — Riemannian trust-region methods for strict saddle functions with complexity guarantees

The difficulty of minimizing a nonconvex function is in part explained by the presence of saddle points, which slow down optimization algorithms and directly degrade worst-case complexity guarantees. However, a number of nonconvex problems of interest possess a favorable structure for optimization, in the sense that saddle points can be escaped efficiently by an appropriate algorithm. This so-called strict saddle property has been used extensively in data science to derive favorable guarantees for first-order algorithms, such as convergence to second-order critical points. However, the analysis and design of second-order algorithms in the strict saddle setting have received significantly less attention.

We consider second-order trust-region methods for a class of strict saddle functions
defined on Riemannian manifolds. These functions exhibit (geodesic) strong convexity around minimizers
and negative curvature at saddle points. We show that the standard trust-region method with exact
subproblem minimization finds an approximate local minimizer in a number of iterations that depends
logarithmically on the accuracy parameter, which significantly improves known results for general
nonconvex optimization. We also propose an inexact variant of the algorithm that explicitly leverages the
strict saddle property to compute the most appropriate step at every iteration. Our bounds for the inexact
variant also improve over the general nonconvex case, and illustrate the benefit of using strict saddle
properties within optimization algorithms.

Keywords: Riemannian optimization, strict saddle function, second-order method, complexity guarantees.
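
As a rough illustration of the step-selection idea behind the inexact variant, the Python sketch below switches between a negative-curvature step and a Newton-like step according to the smallest Hessian eigenvalue. It is a Euclidean simplification of our own (the method in the talk works with the Riemannian gradient and Hessian and a retraction); the function name and tolerances are illustrative, not those of the paper.

    import numpy as np

    def strict_saddle_tr_step(grad, hess, radius, curvature_tol=1e-3):
        """Illustrative trust-region step exploiting strict saddle structure:
        follow negative curvature near saddle points, take a Newton-like step
        where the model is (strongly) convex. Euclidean sketch only."""
        eigvals, eigvecs = np.linalg.eigh(hess)
        lam_min, v_min = eigvals[0], eigvecs[:, 0]
        if lam_min < -curvature_tol:
            # Negative-curvature regime: move along the eigenvector of the
            # most negative eigenvalue, signed as a descent direction and
            # scaled to the trust-region boundary.
            if grad @ v_min > 0:
                v_min = -v_min
            return radius * v_min
        # Locally convex regime: Newton-like step, truncated to the trust region.
        step = -np.linalg.solve(hess + curvature_tol * np.eye(len(grad)), grad)
        norm = np.linalg.norm(step)
        return step if norm <= radius else (radius / norm) * step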

2. 09:00 — A stochastic Hessian-free trust-region method for statistical learning

We investigate an adaptive Hessian-free trust-region method for statistical learning. A sampling strategy based on dependent random variables is designed to reduce the sample size at each iteration as much as possible, while preserving in this stochastic setting the trust-region convergence properties to first-order, and potentially second-order, critical points that are well established in the deterministic case.

Taking advantage of the outer product Hessian approximation with truncated conjugate gradient, this technique is applicable to maximum likelihood estimation as well as to nonlinear least squares regression. We illustrate the potential gains in computation time and memory consumption through various numerical experiments.
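
To give a concrete picture of the outer product Hessian approximation combined with truncated conjugate gradient, the following Python sketch assumes a residual Jacobian J is available (as in nonlinear least squares) and solves the trust-region subproblem matrix-free with a Steihaug-Toint iteration. The names and tolerances, and the omission of the adaptive sampling strategy, are our own simplifications, not the talk's implementation.

    import numpy as np

    def gauss_newton_hvp(J, v):
        """Matrix-free product with the outer product (Gauss-Newton) Hessian
        approximation H ~ J^T J; H is never formed explicitly."""
        return J.T @ (J @ v)

    def _boundary_step(p, d, radius):
        """Positive root tau of ||p + tau*d|| = radius."""
        a, b, c = d @ d, 2 * (p @ d), p @ p - radius**2
        return (-b + np.sqrt(b * b - 4 * a * c)) / (2 * a)

    def truncated_cg(J, grad, radius, max_iter=50, tol=1e-6):
        """Steihaug-Toint truncated CG sketch for the trust-region subproblem
        min_p  g^T p + 0.5 p^T (J^T J) p   subject to  ||p|| <= radius."""
        p = np.zeros_like(grad)
        r, d = grad.copy(), -grad.copy()
        for _ in range(max_iter):
            Hd = gauss_newton_hvp(J, d)
            curvature = d @ Hd
            if curvature <= 0:
                # Nonpositive curvature along d: step to the boundary.
                return p + _boundary_step(p, d, radius) * d
            alpha = (r @ r) / curvature
            if np.linalg.norm(p + alpha * d) >= radius:
                # The step would leave the trust region: stop on the boundary.
                return p + _boundary_step(p, d, radius) * d
            p = p + alpha * d
            r_new = r + alpha * Hd
            if np.linalg.norm(r_new) < tol:
                break
            beta = (r_new @ r_new) / (r @ r)
            d = -r_new + beta * d
            r = r_new
        return p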

3. 09:30 — Implementation and experiments of an augmented Lagrangian method with first-order information

Algencan is a safeguarded augmented Lagrangian method for minimization with general nonlinear constraints. At each iteration, a bound-constrained subproblem is solved approximately. The approach used for the minimization with bound constraints is an active set strategy: a projected spectral gradient step is used to leave the faces, and, to minimize inside the faces, essentially any unconstrained minimization method can be used. Currently, taking advantage of the modular structure of augmented Lagrangian methods, we are revisiting first-order matrix-free methods for the minimization inside the faces. The current implementation of Algencan, when second-order information is not available, uses truncated Newton directions inside the faces, with the Newton system solved by conjugate gradients. Inspired by recent work, we analyzed the possibility of using MINRES instead. Driven by a relatively recent comparison of solvers for bound-constrained minimization, we also considered the possibility of using nonlinear conjugate gradients. This work is in progress, and in this talk we present the numerical results obtained so far.
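
For illustration, a minimal Python sketch of a projected spectral gradient step on a box (the kind of step used to leave a face) is given below; the safeguards and names are assumptions made for exposition and do not reproduce the actual Algencan code.

    import numpy as np

    def projected_spectral_gradient_step(x, grad_x, x_prev, grad_prev,
                                         lower, upper,
                                         lam_min=1e-10, lam_max=1e10):
        """Illustrative projected spectral gradient step on the box
        [lower, upper]; the safeguards are not the actual Algencan ones."""
        s, y = x - x_prev, grad_x - grad_prev
        sy = s @ y
        # Spectral (Barzilai-Borwein) step length, safeguarded.
        lam = np.clip((s @ s) / sy, lam_min, lam_max) if sy > 0 else lam_max
        # Gradient step followed by projection onto the bounds.
        return np.clip(x - lam * grad_x, lower, upper)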

4. 10:00 — Spectral Stochastic Gradient Method with Additional Sampling

In this talk, we propose a new stochastic gradient method for the numerical minimization of finite sums. We also propose a modified version of this method applicable to more general problems, referred to as infinite sum problems, in which the objective function takes the form of a mathematical expectation. The method is based on a strategy that exploits the effectiveness of the well-known Barzilai-Borwein (BB) rules [1], or variants thereof (BB-like rules), for updating the step length in the standard gradient method. Very effective improvements over the standard BB rules have been obtained using the Adaptive Barzilai-Borwein (ABB) strategy [2] and its modification ABBmin [3].

Starting from these observations and following the approaches developed in the algorithms LSOS [4] and SLiSeS [5], the proposed method adapts the aforementioned strategy to the stochastic framework by exploiting the same SAA estimator of the objective function for several iterations. Furthermore, the sample size is controlled by an additional sampling, which also plays a role in accepting the proposed iterate. Moreover, the number of "inner" iterations with the same sample is controlled by an adaptive rule that prevents the method from getting stuck with the same estimator for too long. The method is described and analyzed first for the minimization of finite sums, giving rise to the scheme named LSNM-BB; it is then generalized to the infinite sum setting, leading to the version named LSNM-BB-G. Convergence results are discussed for the finite and infinite sum versions, for general and strongly convex objective functions.

Numerical experiments on well-known datasets for binary classification show very promising performance of the method, without the need to provide special values for the hyperparameters on which it depends. We tested the finite sum version on the datasets w8a, IJCNN, RCV1 and MNIST, and we simulated the infinite sum problem with the dataset CIFAR10.

[1] IMA J. Numer. Anal. 8, 141–148 (1988).
[2] Comput. Optim. Appl. 35(1), 69–86 (2006).
[3] J. Ind. Manag. Optim. 4(2), 299–312 (2008).
[4] LSOS: Line-search Second-Order Stochastic optimization methods for nonconvex finite sums (2022).
[5] SLiSeS: Subsampled Line Search Spectral Gradient Method for Finite Sums (2023).
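
For reference, the BB step lengths and the ABB switching rule on which the method builds can be sketched in Python as follows (ABBmin additionally keeps a minimum over recent BB2 values, not shown); the threshold and fallback value are illustrative assumptions, and the additional sampling and inner-iteration logic of LSNM-BB is not reproduced.

    import numpy as np

    def abb_step_length(s, y, tau=0.8):
        """Adaptive Barzilai-Borwein (ABB) step length from s = x_k - x_{k-1}
        and y = g_k - g_{k-1}.  BB1 = s^T s / s^T y (long step),
        BB2 = s^T y / y^T y (short step); the ABB rule takes the short step
        when BB2/BB1 is small and the long step otherwise.  The threshold
        tau and the fallback value are illustrative choices."""
        sy = s @ y
        if sy <= 0:
            # Nonpositive curvature estimate: fall back to a fixed safeguard.
            return 1.0
        bb1 = (s @ s) / sy
        bb2 = sy / (y @ y)
        return bb2 if bb2 / bb1 < tau else bb1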