1. 14:00 — Convergence analysis of CMA-ES

The Covariance Matrix Adaptation Evolution Strategy (CMA-ES) is a black-box, derivative-free stochastic optimization method that has been widely applied across various domains.
Despite empirical evidence of its linear convergence on many optimization problems, including the minimization of nonsmooth, nonconvex, multimodal, ill-conditioned functions, a mathematical proof has been lacking.
I will present recent advances that aim to fill this gap. By normalizing the states of CMA-ES while it minimizes a spherical function (a monotonically transformed Euclidean norm), we define a time-homogeneous Markov chain that, when the hyperparameters of CMA-ES are set appropriately, converges at a geometric rate towards a stationary probability distribution. A law of large numbers concludes our convergence proof and provides a convergence rate for spherical objective functions. Owing to the invariance properties of the algorithm, this result applies equally to ellipsoidal (monotonically transformed convex quadratic) objective functions, for which the covariance matrix learns second-order information by approximating the inverse Hessian.
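The talk concerns a convergence proof, but the algorithmic structure being analyzed is easy to see in code. Below is a minimal sketch of a CMA-ES-style (mu/mu_w, lambda) strategy on the sphere function: sampling from N(mean, sigma^2 C), weighted recombination, and a rank-mu covariance update. It omits evolution paths and cumulative step-size adaptation and uses a crude fixed step-size decay, so it illustrates the sampled state structure rather than Hansen's full CMA-ES; all hyperparameter values here are assumptions.

```python
# Toy (mu/mu_w, lambda) evolution strategy with a rank-mu covariance update,
# run on the sphere function f(x) = ||x||^2. Illustrative only: no evolution
# paths, no CSA step-size adaptation; hyperparameters are assumptions.
import numpy as np

def sphere(x):
    return float(np.dot(x, x))

def simplified_cma_es(f, x0, sigma=0.5, iters=300, seed=0):
    rng = np.random.default_rng(seed)
    n = len(x0)
    lam = 4 + int(3 * np.log(n))               # population size (common heuristic)
    mu = lam // 2                              # number of selected parents
    w = np.log(mu + 0.5) - np.log(np.arange(1, mu + 1))
    w /= w.sum()                               # positive recombination weights
    c_mu = 0.3                                 # covariance learning rate (assumption)
    mean, C = np.array(x0, dtype=float), np.eye(n)
    for _ in range(iters):
        A = np.linalg.cholesky(C)              # C = A A^T
        Z = rng.standard_normal((lam, n))
        X = mean + sigma * Z @ A.T             # offspring ~ N(mean, sigma^2 C)
        order = np.argsort([f(x) for x in X])
        Y = (X[order[:mu]] - mean) / sigma     # selected steps, in sigma units
        mean = mean + sigma * (w @ Y)          # weighted recombination
        C = (1 - c_mu) * C + c_mu * (Y.T * w) @ Y   # rank-mu update
        sigma *= 0.98                          # crude geometric decay (assumption)
    return mean, f(mean)

x_final, f_final = simplified_cma_es(sphere, x0=np.ones(10))
print(f_final)   # log(f) decreases roughly linearly over the iterations
```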

2. 14:30 — Q-fully quadratic modeling and its application in a random subspace derivative-free method

Model-based methods are an important class of derivative-free optimization (DFO) methods, but they are known to struggle on high-dimensional problems. Recent research has shown that incorporating random subspaces into model-based DFO methods can improve their performance on such problems. However, most current theoretical and practical results are based on linear approximation models, owing to the complexity of quadratic approximation models. In this talk, we propose a random subspace derivative-free trust-region algorithm based on quadratic approximations. Unlike most of its precursors, this algorithm does not require any special form of the objective function. We study the geometry of sample sets, the error bounds of the approximations, and the quality of the subspaces. In particular, we provide a technique to construct Q-fully quadratic models that is easy to analyze and implement. We present an almost-sure global convergence result for our algorithm and give an upper bound on the expected number of iterations required to find a sufficiently small gradient. We also present numerical experiments comparing the performance of our algorithm with linear and quadratic approximation models. The numerical results demonstrate the strengths and weaknesses of using quadratic approximations.
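To make the ingredients concrete, here is a hedged sketch of the basic construction the talk builds on: draw a random p-dimensional subspace, place (p+1)(p+2)/2 interpolation points in it, and solve the interpolation conditions for a fully determined quadratic model in subspace coordinates. The point placement and scaling below are illustrative assumptions; they do not reproduce the Q-fully quadratic construction or its error bounds.

```python
# Hedged sketch: a fully determined quadratic interpolation model of f
# restricted to a random p-dimensional subspace. Random points are poised
# with probability one; a careful method would manage poisedness explicitly.
import numpy as np

def quadratic_model_in_random_subspace(f, x0, p=2, radius=0.1, seed=0):
    rng = np.random.default_rng(seed)
    n = len(x0)
    Q, _ = np.linalg.qr(rng.standard_normal((n, p)))   # orthonormal subspace basis
    q = (p + 1) * (p + 2) // 2                         # points for a determined quadratic
    U = radius * rng.standard_normal((q, p))           # sample set in subspace coords
    U[0] = 0.0                                         # keep the current iterate
    iu = np.triu_indices(p)

    def features(u):
        # features of m(u) = c + g.u + 0.5 u^T H u with H symmetric:
        # diagonal terms carry the 0.5, off-diagonal terms appear once
        quad = np.outer(u, u)[iu]
        scale = np.where(iu[0] == iu[1], 0.5, 1.0)
        return np.concatenate(([1.0], u, scale * quad))

    Phi = np.array([features(u) for u in U])           # q x q design matrix
    fvals = np.array([f(x0 + Q @ u) for u in U])
    coef = np.linalg.solve(Phi, fvals)                 # interpolation conditions
    return Q, coef   # coef packs (c, g, upper-triangular entries of H)
```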

3. 15:00 — A new two-dimensional model-based subspace method for large-scale unconstrained derivative-free optimization: 2D-MoSub

  • Pengcheng Xie, Chinese Academy of Sciences
  • Ya-Xiang Yuan, Chinese Academy of Sciences
  • Yi Zhang, Institute of Computational Mathematics and Scientific/Engineering Computing

Derivative-free optimization methods have gained attention due to their applicability in scenarios where derivative information is unavailable or expensive to compute. For existing model-based derivative-free optimization techniques, large-scale problems remain a bottleneck. One important approach is the use of subspace methods, which minimize the objective function over a low-dimensional subspace at each iteration to obtain the next iterate. This talk will introduce 2D-MoSub (2-dimensional model-based subspace method), a novel derivative-free method for general unconstrained optimization that is especially aimed at large-scale problems. 2D-MoSub combines 2-dimensional quadratic interpolation models with trust-region techniques to iteratively update the iterates and explore the 2-dimensional subspace. Its framework comprises initialization, construction of the interpolation set, building of the quadratic interpolation model, trust-region trial steps, and updates of the trust-region radius and the subspace. This talk will present the framework and computational details of 2D-MoSub and discuss the poisedness and quality of the interpolation set on the corresponding 2-dimensional subspace. It will also analyze some properties of 2D-MoSub, including the model's approximation error under the projection property and the algorithm's convergence. Numerical results demonstrate the effectiveness and efficiency of 2D-MoSub for solving a variety of unconstrained optimization problems.
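One appeal of working in a 2-dimensional subspace is that the trust-region subproblem becomes essentially trivial. The sketch below (an illustrative solver, not the one used in 2D-MoSub) compares the interior Newton point, when it exists and is feasible, with a dense angular scan of the boundary circle; in two dimensions this brute-force scan is already cheap and accurate.

```python
# Hedged sketch: solve the 2-D trust-region subproblem
#   min  g.u + 0.5 u^T H u   s.t. ||u|| <= delta,   u in R^2,
# by checking the interior Newton point and scanning the boundary circle.
import numpy as np

def tr_subproblem_2d(g, H, delta, n_angles=720):
    def m(u):
        return g @ u + 0.5 * u @ H @ u
    candidates = []
    if np.all(np.linalg.eigvalsh(H) > 0):              # interior candidate exists
        u_newton = np.linalg.solve(H, -g)              # only if H is positive definite
        if np.linalg.norm(u_newton) <= delta:
            candidates.append(u_newton)
    theta = np.linspace(0.0, 2 * np.pi, n_angles, endpoint=False)
    boundary = delta * np.stack([np.cos(theta), np.sin(theta)], axis=1)
    candidates.extend(boundary)                        # scan the circle
    return min(candidates, key=m)

# example: an indefinite model Hessian forces the step onto the boundary
g = np.array([1.0, -2.0])
H = np.array([[2.0, 0.0], [0.0, -1.0]])
print(tr_subproblem_2d(g, H, delta=0.5))
```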

4. 15:30 — Noise-aware derivative-free optimization

In this work, we develop and analyze a noise-aware model-based trust-region (MBTR) method for derivative-free optimization. By a noise-aware algorithm, we mean any algorithm that takes a (possibly dynamic) estimate of the noise present in blackbox function values. While we intend for the definition of noise to be fairly encompassing, the theory on which we build especially considers cases where the noise can be bounded in absolute value, or where the noise is generated from some distribution exhibiting a subexponential tail. Our algorithm builds on standard MBTR methods. However, guided by theoretical convergence results, our method explicitly decouples the trust-region radius parameter from the model-building process, and we pay far more attention to the geometry of the interpolation sets used in model-building than do standard implementations of MBTR methods. Our convergence results provide high-probability rates to neighborhoods of local optima, and an implementation of our method compares quite well with derivative-free methods that are not explicitly noise-aware. This work was particularly motivated by variational quantum algorithms (VQAs), a paradigm of hybrid quantum-classical computing in which measurements acquired on a quantum device are effectively treated as a blackbox function. In the VQA setting, the noise present in these blackbox function values aggregates both stochastic noise arising from probabilistic realizations of state vectors and non-negligible hardware noise from near-term quantum devices that is generally independent of the optimization process. We demonstrate the utility of our method on optimization problems that arise in VQAs.
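The most visible mechanical change a noise estimate induces is in the ratio test that accepts or rejects a trial step. Here is a minimal sketch, assuming bounded noise |fhat - f| <= eps_f: the model-predicted decrease must dominate the noise before the ratio is trusted, and the trust region is never shrunk below a noise-determined floor. The specific thresholds (theta, the sqrt(eps_f) floor) are assumptions for illustration, not the talk's exact rules.

```python
# Hedged sketch: a noise-aware accept/reject decision for a model-based
# trust-region method with noisy evaluations fhat = f + e, |e| <= eps_f.
import numpy as np

def noise_aware_step(fhat, x, step, pred, delta, eps_f,
                     eta=0.1, gamma=0.5, theta=4.0):
    """One accept/reject decision; returns (new_x, new_delta).

    fhat  : noisy objective, |fhat(x) - f(x)| <= eps_f
    pred  : model-predicted decrease m(x) - m(x + step) > 0
    theta : how strongly pred must dominate the noise (assumption)
    """
    if pred < theta * eps_f:
        # the predicted decrease is drowned out by noise: the ratio test is
        # unreliable, so keep x and never shrink delta below a floor tied
        # to the noise level (the sqrt(eps_f) floor is an assumption)
        return x, max(gamma * delta, np.sqrt(eps_f))
    rho = (fhat(x) - fhat(x + step)) / pred
    if rho >= eta:
        return x + step, delta           # successful step: accept
    return x, gamma * delta              # unsuccessful step: contract
```

The key design point is that contraction of delta is throttled once function-value differences fall near the noise level, since further shrinking would only fit the model to noise.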