1 — 14:00 — An abstract convergence framework with application to proximal Heavy-ball line-search algorithms
In this talk, we analyze the convergence of an abstract descent algorithm to a stationary point of a proper and lower semicontinuous function. The proposed algorithm is defined through a specific set of properties that are common to several first-order methods for nonsmooth optimization. These properties guarantee the convergence of the whole sequence of iterates to a stationary point whenever the objective function satisfies the Kurdyka-Łojasiewicz property. Unlike similar prior works, we relax two of these properties, the so-called sufficient decrease condition and the relative error condition, allowing for the design of non-monotone algorithms with an inexact iteration update step that is implementable in practice. We propose three novel proximal Heavy-ball-type algorithms that are special instances of our abstract framework. The first algorithm, “i2Piano”, adjusts the steplength by backtracking until a local approximation of the Lipschitz constant is found. The other two algorithms, “iPila” and “Phila”, enforce a descent condition on a merit function rather than on the objective function itself by performing a line search along the descent direction. We apply the proposed algorithms to a series of image deblurring problems, assessing their efficiency and their advantages over competing methods from the literature, especially when the inertial parameter is selected by mimicking the Conjugate Gradient updating rules.
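Schematically, algorithms of this class can be viewed as variants of the inertial proximal gradient (Heavy-ball) update for minimizing a sum $f+g$ with $f$ smooth and $g$ proper and lower semicontinuous; the form below is a generic sketch rather than the exact update of i2Piano, iPila, or Phila, whose stepsize $\alpha_k$ and inertial parameter $\beta_k$ are chosen by the backtracking and line-search rules described above:
$$ x^{k+1} = \mathrm{prox}_{\alpha_k g}\!\left( x^k - \alpha_k \nabla f(x^k) + \beta_k \left( x^k - x^{k-1} \right) \right), \qquad \mathrm{prox}_{\alpha g}(y) = \operatorname*{arg\,min}_{x} \left\{ g(x) + \tfrac{1}{2\alpha} \|x - y\|^2 \right\}. $$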
2 — 14:30 — On the limit form of the Su-Boyd-Candès dynamic version of Nesterov's accelerated gradient method when the viscous parameter becomes large
In a Hilbert setting, our study concerns the dynamical system introduced by Su-Boyd-Candès as a low-resolution ODE of Nesterov's accelerated gradient method and of the Ravine method.
This inertial system ${\rm (AVD)}_{\alpha}$ is driven by the gradient of the function $f$ to be minimized, and is damped by a viscous friction whose coefficient is equal to $\alpha/t$, and therefore tends to zero when $t$ tends to infinity.
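Explicitly, the system reads
$$ {\rm (AVD)}_{\alpha}: \qquad \ddot{x}(t) + \frac{\alpha}{t}\, \dot{x}(t) + \nabla f(x(t)) = 0, \qquad t \geq t_0 > 0. $$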
Taking $\alpha$ sufficiently large plays a crucial role in the asymptotic convergence properties of the trajectories generated by this system.
In particular, for a general convex function $f$, the condition $\alpha >3$ guarantees the asymptotic convergence rate of the values $o \left( 1/t^2 \right)$, as well as the convergence of the trajectories towards optimal solutions.
On the basis of temporal scaling techniques, we obtain, for $\alpha$ large, equivalent simple continuous dynamics which are autonomous and of first order in time.
Precisely, we show that a judicious time rescaling of ${\rm (AVD)}_{\alpha}$ results in trajectories close to those of the continuous steepest descent associated with $f$. This explains the improvement of the convergence rate of the values from $1/t$ for the steepest descent method to $1/t^2$ for the Nesterov accelerated gradient method (as well as for the Ravine accelerated gradient method). At the same time, new results are obtained concerning the complexity analysis of the Nesterov method and of the Ravine method. Some numerical experiments are provided to verify the theoretical results.
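For comparison, the continuous steepest descent associated with $f$ is the first-order autonomous system
$$ \dot{z}(s) + \nabla f(z(s)) = 0, $$
along which, for convex $f$, the values decay at the rate $O(1/s)$; a time rescaling of the order of $s \sim t^2$ (the scaling suggested by matching $1/s$ with $1/t^2$) is what converts this rate into the $1/t^2$ rate of ${\rm (AVD)}_{\alpha}$.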
3 — 15:00 — Inertial Accelerated Stochastic Mirror Descent for Large-Scale Generalized Tensor CP Decomposition
The majority of classic tensor CP decomposition models are designed for the squared loss, employing the Euclidean distance as a local proximal term. However, the Euclidean distance is unsuitable for the generalized loss functions arising from various types of real-world data, such as integer and binary data. Consequently, algorithms developed under the squared loss are not easily adaptable to these generalized losses, partly due to the lack of Lipschitz continuity of the gradient. This paper considers the generalized tensor CP decomposition. We use the Bregman distance as the proximal term and propose an inertial accelerated block randomized stochastic mirror descent algorithm (iTableSMD). Within a broader multi-block variance reduction and inertial acceleration framework, we establish a sublinear convergence rate for the subsequences produced by the iTableSMD algorithm. We further show that iTableSMD requires at most $O(\varepsilon^{-2})$ iterations in expectation to attain an $\varepsilon$-stationary point and establish the global convergence of the sequence. Numerical experiments on real datasets demonstrate that our proposed algorithm is efficient and achieves better performance than existing state-of-the-art methods.
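For context, the basic stochastic mirror descent step with a Bregman proximal term generated by a convex kernel $h$ replaces the Euclidean distance by $D_h$; this is a generic sketch, on top of which iTableSMD adds block randomization, variance reduction, and inertial extrapolation:
$$ x^{k+1} \in \operatorname*{arg\,min}_{x} \left\{ \langle g^k, x \rangle + \frac{1}{\eta_k} D_h(x, x^k) \right\}, \qquad D_h(x,y) = h(x) - h(y) - \langle \nabla h(y), x - y \rangle, $$
where $g^k$ is a stochastic gradient estimate and $\eta_k > 0$ is a stepsize.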
4 — 15:30 — ** CANCELLED ** Convergence of the momentum method for semialgebraic functions with locally Lipschitz gradients
We propose a new length formula that governs the iterates of the momentum method when minimizing differentiable semialgebraic functions with locally Lipschitz gradients. It enables us to establish local convergence, global convergence, and convergence to local minimizers without assuming global Lipschitz continuity of the gradient, coercivity, or a global growth condition, as is done in the literature. As a result, we provide the first convergence guarantee for the momentum method starting from arbitrary initial points when applied to matrix factorization, matrix sensing, and linear neural networks.
Our contributions are as follows. We consider objective functions that are semialgebraic and differentiable with locally Lipschitz gradients. We show that the length of the iterates generated by the momentum method is upper bounded by an expression depending on the objective function variation, the step size, and a desingularizing function. This length formula enables us to show that global Lipschitz continuity of the gradient and the global growth condition are superfluous when establishing local convergence. It also enables us to establish global convergence under the assumption that the continuous-time gradient trajectories of the objective function are bounded. As a result, we bypass the need for coercivity and globally Lipschitz gradients. Finally, the length formula enables us to guarantee convergence to local minimizers almost surely, under second-order differentiability and the strict saddle property.
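For reference, the classical momentum (Heavy-ball) iteration, with stepsize $\alpha > 0$ and momentum parameter $\beta \in [0,1)$, reads
$$ x_{k+1} = x_k - \alpha \nabla f(x_k) + \beta \left( x_k - x_{k-1} \right), $$
and the length referred to above is, in this notation, the total path length $\sum_{k} \|x_{k+1} - x_k\|$ of these iterates (the parameter schedules of the actual method may vary).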