1 — 14:00 — EM Algorithms for Optimization Problems with Polynomial Objectives
The EM (Expectation-Maximization) algorithm is widely used for maximum likelihood estimation of statistical models, especially those involving latent random variables. Focusing on its MM (Majorization-Minimization) optimization features, this paper presents a new algorithmic scheme based on the EM algorithm for solving general (not necessarily statistical) optimization problems by introducing certain probability distributions. In particular, we demonstrate that several optimization problems with polynomial objectives, including linear programs (LP) and quadratic programs (QP), can be embedded in the EM scheme with an adequate choice of a probability distribution from the exponential family. Specifically, by employing a normal distribution, the EM algorithm for the unconstrained minimization of a convex quadratic objective turns out to be a natural gradient descent without line search. For minimizing a polynomial objective over a rectangle or a simplex, we show that the EM algorithm with a binomial or multinomial distribution, respectively, becomes an interior point algorithm that also reduces to a natural gradient descent without line search. As further extensions to cases where a natural gradient descent without line search is no longer available, new interior point algorithms are presented for polynomial optimization over a polytope, LP, and QP, on the basis of EM and generalized EM algorithms. The presented algorithms are somewhat conceptual in that their subprocedures require solving a nonlinear convex optimization problem; some practical strategies are discussed.
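To give a feel for the kind of update the abstract describes, here is a minimal sketch, under our own assumptions and not the paper's derivation: for a convex quadratic objective, a natural-gradient-style step that uses the Hessian as the metric requires no line search and reaches the minimizer in a single step.

```python
import numpy as np

# Hedged illustration: minimize f(x) = 0.5 x^T A x - b^T x with A symmetric
# positive definite.  Using A as the metric, the natural-gradient step
#     x_{k+1} = x_k - A^{-1} (A x_k - b)
# needs no line search and, for a quadratic, lands on the minimizer A^{-1} b
# in one step.

A = np.array([[3.0, 1.0], [1.0, 2.0]])  # symmetric positive definite
b = np.array([1.0, 0.0])

x = np.zeros(2)
grad = A @ x - b                        # ordinary gradient at x
x = x - np.linalg.solve(A, grad)        # natural-gradient step, no line search

assert np.allclose(A @ x, b)            # x is the exact minimizer
```

For general polynomial objectives the metric and the resulting step differ, which is where the distributional embeddings described in the abstract come in.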
2 — 14:30 — Level constrained first-order methods for nonconvex function constrained optimization
We present a new feasible proximal gradient method for constrained optimization in which both the objective and constraint functions are given by the sum of a smooth, possibly nonconvex function and a simple convex function. The algorithm converts the original problem into a sequence of convex subproblems, each of which requires evaluating at most one gradient of the original objective and constraint functions. Either exact or approximate subproblem solutions can be computed efficiently in many cases. An important feature of the algorithm is the constraint level parameter: by carefully increasing this level for each subproblem, we provide a simple way to overcome the challenge of bounding the Lagrange multipliers, and we show that the algorithm follows a strictly feasible solution path until convergence to a stationary point. We develop a simple, proximal-gradient-descent-type analysis showing that the complexity bound of this new algorithm is comparable to that of gradient descent in the unconstrained setting, which is new in the literature. Exploiting this new design and analysis technique, we extend our algorithms to more challenging constrained optimization problems in which 1) the objective is a stochastic or finite-sum function, and 2) structured nonsmooth functions replace the smooth components of both the objective and constraint functions. Complexity results for these problems also appear to be new in the literature. Finally, our method can also be applied to convex function-constrained problems, for which we show complexities similar to those of the proximal gradient method.
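As background for the composite structure the abstract assumes, here is a generic proximal gradient sketch (standard material, not the talk's level-constrained method): for min f(x) + h(x) with f smooth and h convex and "simple", each step is x+ = prox_{eta*h}(x - eta*grad f(x)), and when h is the l1 norm the prox is soft-thresholding.

```python
import numpy as np

# Generic proximal gradient step for min f(x) + lam*||x||_1 (background
# sketch only; the constraint level parameter of the talk is not modeled).

def soft_threshold(v, t):
    # prox of t*||.||_1: shrink each entry toward zero by t
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def prox_grad_step(x, grad_f, eta, lam):
    return soft_threshold(x - eta * grad_f(x), eta * lam)

# Example: f(x) = 0.5*||x - c||^2, so grad f(x) = x - c.
c = np.array([2.0, 0.3, -1.5])
x = np.zeros(3)
for _ in range(200):
    x = prox_grad_step(x, lambda z: z - c, eta=0.5, lam=0.5)

# The iterates converge to the soft-thresholded point soft_threshold(c, lam).
```

The talk's method keeps this cheap per-iteration structure while additionally maintaining strict feasibility with respect to the nonconvex constraints via the level parameter.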
3 — 15:00 — ** CANCELLED ** Methods of Difference-of-Convex Programming for Statistical Learning with Structured Sparsity
We consider a statistical learning problem in which the model aims to enforce pre-determined hierarchical relationships among the variables as an additional criterion in the training process. The required logical conditions are formulated exactly as affine sparsity constraints (ASCs) and then approximated by a difference-of-convex (DC) constrained program using existing sparsity-promoting functions. This presentation introduces numerical methods designed to solve variants of the latter DC optimization problem. We present the convergence analysis of the methods and numerically demonstrate their effectiveness on selected statistical learning problems with logical conditions, including regression with interaction effects and tree-structured hierarchical selection. Our numerical results show that the ASC approximations can enforce the required logical conditions in the model while achieving desirable computational efficiency and model accuracy compared to existing methods.
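For readers unfamiliar with the DC machinery the abstract relies on, here is a generic DCA (difference-of-convex algorithm) sketch under our own assumptions, not the talk's specific ASC formulation: a capped-l1 penalty, a typical sparsity-promoting function, decomposes as a difference of convex functions, and each DCA step linearizes the concave part and solves the remaining convex subproblem in closed form.

```python
import numpy as np

# Hedged DCA sketch: minimize 0.5*(x - c)^2 + lam * min(|x|, delta), using the
# DC decomposition  min(|x|, delta) = |x| - max(|x| - delta, 0).
# Each step linearizes h(x) = lam*max(|x| - delta, 0) at the current iterate
# and solves the convex subproblem by soft-thresholding.

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def dca_capped_l1(c, lam, delta, iters=50):
    x = np.array(c, dtype=float)
    for _ in range(iters):
        # subgradient of the concave part's negation h(x)
        s = lam * np.sign(x) * (np.abs(x) > delta)
        # convex subproblem: min 0.5*(x - c)^2 + lam*|x| - s*x
        x = soft_threshold(c + s, lam)
    return x

# Large entries keep their magnitude (capped penalty); small entries vanish.
x = dca_capped_l1(c=np.array([3.0, 0.4]), lam=1.0, delta=1.0)
```

The ASC approach of the talk uses DC *constraints* rather than a DC penalty, but the per-iteration pattern of convexifying the concave part is the same.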
4 — 15:30 — GPU Implementation of Algorithm NCL
For constrained optimization problems, LANCELOT solves a short sequence of subproblems that minimize an augmented Lagrangian subject to bounds (and are immune to LICQ difficulties). Algorithm NCL solves equivalent subproblems that are suited to nonlinear interior methods. We focus on solving the associated KKT systems. The NCL transformation permits reduction to smaller systems that can be solved by Cholesky-type factorizations. Our NCL implementation is based on MadNLP.jl (a nonlinear optimization solver running on GPUs) and CUDSS.jl (a Julia interface to the NVIDIA library cuDSS). We describe the reduction to smaller systems and present numerical experiments.
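To illustrate the flavor of such a reduction, here is a generic Schur-complement condensation sketch (a standard construction under our own assumptions, not the exact NCL reduction): a regularized saddle-point KKT system can be condensed to a smaller symmetric positive definite system amenable to Cholesky-type factorization.

```python
import numpy as np

# Hedged illustration: a regularized KKT system
#     [ H   J^T ] [dx]   [r1]
#     [ J  -d*I ] [dy] = [r2]
# condenses, by eliminating dy = (J dx - r2)/d, to the smaller SPD system
#     (H + (1/d) J^T J) dx = r1 + (1/d) J^T r2.

rng = np.random.default_rng(0)
n, m, d = 5, 2, 0.1
M = rng.standard_normal((n, n))
H = M @ M.T + n * np.eye(n)          # SPD Hessian block
J = rng.standard_normal((m, n))      # constraint Jacobian
r1, r2 = rng.standard_normal(n), rng.standard_normal(m)

# Condensed solve via Cholesky (K = L L^T, then two triangular solves)
K = H + (1.0 / d) * J.T @ J
L = np.linalg.cholesky(K)
dx = np.linalg.solve(L.T, np.linalg.solve(L, r1 + (1.0 / d) * J.T @ r2))
dy = (J @ dx - r2) / d

# Agrees with solving the full (n+m) x (n+m) saddle-point system directly
full = np.block([[H, J.T], [J, -d * np.eye(m)]])
sol = np.linalg.solve(full, np.concatenate([r1, r2]))
assert np.allclose(np.concatenate([dx, dy]), sol)
```

On GPUs this kind of condensation is attractive because the smaller SPD system can be handed to a sparse Cholesky-type solver such as the one cuDSS provides.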