108:30 — Nonconvex Stochastic Bregman Proximal Gradient Method with Application to Deep Learning

The widely used stochastic gradient methods for minimizing nonconvex composite objective functions require the Lipschitz smoothness of the differentiable part. But the requirement does not hold true for problem classes including quadratic inverse problems and training neural networks. To address this issue, we investigate a family of stochastic Bregman proximal gradient (SBPG) methods, which only require smooth adaptivity of the differentiable part. SBPG replaces the upper quadratic approximation used in SGD with the Bregman proximity measure, resulting in a better approximation model that captures the non-Lipschitz gradients of the nonconvex objective. We formulate the vanilla SBPG and establish its convergence properties under nonconvex setting without finite-sum structure. Experimental results on quadratic inverse problems testify the robustness of SBPG. Moreover, we propose a momentum-based version of SBPG (MSBPG) and prove it has improved convergence properties. We apply MSBPG to the training of deep neural networks with a polynomial kernel function, which ensures the smooth adaptivity of the loss function. Experimental results on representative benchmarks demonstrate the effectiveness and robustness of MSBPG in training neural networks. Since the additional computation cost of MSBPG compared with SGD is negligible in large-scale optimization, MSBPG can potentially be employed as an universal open-source optimizer in the future.

209:00 — Approximate Bregman Proximal Gradient Algorithm for Relatively Smooth Nonconvex Optimization

In applications to machine learning and signal processing, regularization is useful to avoid overfitting or to impose the model structure. An optimization problem with a regularization term can be formulated as a composite nonconvex optimization problem. To address the composite nonconvex optimization problem, we propose the approximate Bregman proximal gradient algorithm (ABPG). ABPG employs a new distance that approximates the Bregman distance, making the subproblem of ABPG simpler to solve compared to that of existing Bregman-type algorithms. The subproblem of ABPG often has a closed-form solution due to the approximate Bregman distance. Similarly to existing Bregman-type algorithms, ABPG does not require the global Lipschitz continuity for the gradient of the smooth part. Instead, assuming the smooth adaptable property, we establish the global subsequential convergence under standard assumptions. Additionally, assuming that the objective function is a Kurdyka–Lojasiewicz function, we prove the global convergence of ABPG for a special case. In numerical experiments, we have applied ABPG to the convex lp regularized least squares problem, the lp loss problem, and the nonnegative linear system. Our numerical results show that ABPG outperforms existing algorithms especially when the gradient of the smooth part is not globally Lipschitz or even local Lipschitz continuous.

309:30 — Bregman Proximal Methods for Quantum Information Theoretic Problems

Convex optimization problems arise naturally when studying the fundamental limits of quantum communication systems, such as computing quantities known as quantum channel capacities. These problems typically involve minimizing or maximizing functions composed of quantum entropies, which are complex nonlinear functions of matrices, over the set of unit-trace Hermitian matrices. Standard methods such as projected gradient descent do not work well for these problems as the gradients of quantum entropies are not well-defined on the boundary of the set of unit-trace Hermitian matrices (i.e., rank-deficient matrices). Instead, in this work, we show how non-Euclidean generalizations of gradient descent algorithms, known as Bregman proximal gradient methods, can be applied to efficiently solve many problems from quantum information theory. We show how mirror descent and primal-dual hybrid gradient are appropriate algorithms for computing many quantum information theoretic quantities. Additionally, we prove for these problems that mirror descent and primal-dual hybrid gradient are guaranteed to converge sublinearly or linearly by using convergence analysis based on relative smoothness and strong convexity properties of the objective functions. We present numerical experiments showing how these algorithms perform for computing quantum channel capacities, quantum rate-distortion functions, and relative entropies of entanglement.