1 — 16:20 — Connections between Gradient Descent and the Power Method
Applying Gradient Descent (GD) with a fixed step size to minimize a quadratic function is equivalent to running the Power Method (PM) on the gradients. This establishes the connection between GD with a fixed step size and the PM, both with and without fixed momentum, and shows that valuable eigen-information about the Hessian is available from the GD iterations. By considering the dynamics of the gradients and iterates, new step-size strategies are proposed that improve the practical performance of GD.
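A minimal sketch of the stated equivalence (NumPy, with illustrative problem data and a hypothetical step size, not the speakers' code): for f(x) = 0.5 xᵀAx − bᵀx with fixed step size α, the gradients satisfy g_{k+1} = (I − αA) g_k, so GD performs a power iteration on I − αA and the gradient direction exposes eigen-information about A.

```python
import numpy as np

# Fixed-step gradient descent on a quadratic f(x) = 0.5*x'Ax - b'x.
# The gradients obey g_{k+1} = (I - alpha*A) g_k, i.e. a power iteration
# on I - alpha*A, so the gradient direction reveals spectral information of A.

rng = np.random.default_rng(0)
n = 50
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
eigs = np.linspace(1.0, 100.0, n)
A = Q @ np.diag(eigs) @ Q.T          # SPD Hessian with known spectrum
b = rng.standard_normal(n)

alpha = 0.9 / eigs.max()             # fixed step size, alpha < 1/lambda_max
x = rng.standard_normal(n)
for _ in range(500):
    x = x - alpha * (A @ x - b)      # plain gradient descent step

g = A @ x - b                        # final gradient
u = g / np.linalg.norm(g)            # direction produced by the implicit power iteration
print("Rayleigh quotient of gradient:", u @ A @ u)   # ~ lambda_min(A) = 1.0
```

Because the eigenvalues of I − αA are 1 − αλ_i, the dominant one for α < 1/λ_max is 1 − αλ_min, so the normalized gradient aligns with the eigenvector of the smallest eigenvalue of A and its Rayleigh quotient estimates that eigenvalue.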
2 — 16:50 — A Discretization Approach for Low-Dimensional Non-Convex Bilevel Optimization
Bilevel optimization has become an invaluable tool in areas like hyper-parameter optimization, meta-learning, and reinforcement learning. However, most modern bilevel algorithms are developed for problems with strongly convex and/or unconstrained lower-level tasks. In contrast, in our work we deal with a problem whose lower-level task is non-convex and constrained. To solve such a challenging problem, we propose a novel value-function-based approach in which the value function and the respective problem reformulation are discretized. Under this approach, the value function’s optimization problem is transformed into an easier convex optimization task. We tackle the discretized problem using a penalty approach and establish the relation between the KKT points and the local and global minima of the two problems. For the solution of the penalty-reformulated problem, we develop a gradient descent-based method and derive its convergence rate. Our experimental results confirm the effectiveness of the proposed method.
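The abstract does not give the exact formulation, so the following is only a hedged, generic sketch of a discretized value-function penalty reformulation for a low-dimensional, non-convex lower level; the functions F and g, the grid, the penalty weight rho, and the step size are all invented for illustration and are not the paper's actual method.

```python
import numpy as np

# Generic sketch: value-function reformulation of a bilevel problem,
#   min_{x,y} F(x, y)  s.t.  g(x, y) <= v(x) := min_{y' in Y} g(x, y'),
# where v(x) is approximated on a grid over the (low-dimensional) lower-level
# variable and the constraint is enforced with a penalty term.

def F(x, y):                         # upper-level objective (illustrative)
    return (x - 1.0) ** 2 + (y - 1.0) ** 2

def g(x, y):                         # non-convex lower-level objective (illustrative)
    return np.cos(3.0 * y) + 0.5 * (y - x) ** 2

grid = np.linspace(-2.0, 2.0, 401)   # discretization of the lower-level feasible set Y

def value_function(x):
    return np.min(g(x, grid))        # discretized value function v(x)

def penalized(x, y, rho=10.0):
    # penalty reformulation: F(x, y) + rho * max(0, g(x, y) - v(x))
    return F(x, y) + rho * max(0.0, g(x, y) - value_function(x))

def num_grad(f, z, eps=1e-6):
    # numerical gradient, used only to keep the sketch short
    grad = np.zeros_like(z)
    for i in range(z.size):
        e = np.zeros_like(z)
        e[i] = eps
        grad[i] = (f(z + e) - f(z - e)) / (2 * eps)
    return grad

z = np.array([0.0, 0.0])             # z = (x, y)
for _ in range(2000):                # gradient-descent-type loop on the penalty problem
    z = z - 0.01 * num_grad(lambda w: penalized(w[0], w[1]), z)

print("final iterate (x, y):", z)
```

This only illustrates the overall pattern (discretized value function plus penalty plus a descent loop); the talk's contribution concerns the relation between the KKT points and minima of the two problems and the convergence rate of its specific method.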
3 — 17:20 — ** CANCELLED ** D-stationary points for optimization with least constraint violation and the convergence of an SQP method
We consider the nonlinear optimization problem with least