01:30 PM - 01:55 PM
Sufficient optimality conditions hold for almost all nonlinear semidefinite programs
We derive a new genericity result for nonlinear semidefnite programming (NLSDP). Namely, almost all linear perturbations of a given NLSDP are shown to be nondegenerate. Here, nondegeneracy for NLSDP refers to the transversality constraint qualification, strict complementarity and second-order sufficient condition. Due to the presence of the second-order suffcient condition, our result is a nontrivial extension of the corresponding results for linear semidefinite programs (SDP) from Alizadeh et al. (Math Program 77(2, Ser. B):111-128, 1997). The proof of the genericity result makes use of Forsgrens derivation of optimality conditions for NLSDP in Forsgren (Math Program 88(1, Ser. A):105-128, 2000). Due to the latter approach, the positive semidefiniteness of a symmetric matrix is locally equivalent to the fact that its certain Schur complement is positive semidefinite. This yields a reduced NLSDP by considering the new semidefinite constraint. While deriving optimality conditions for the reduced NLSDP, the well-known and often mentioned H-term in the second-order sufficient condition vanishes. This allows us to access the proof of the genericity result for NLSDP.
01:55 PM - 02:20 PM
A variational perspective for accelerated methods in optimization
Accelerated gradient methods play a central role in optimization, achieving optimal rates in many settings. While many generalizations and extensions of Nesterov's original acceleration method have been proposed, it is not yet clear what is the natural scope of the acceleration concept. In this work, we study accelerated methods from a continuous-time perspective. We show that there is a Lagrangian functional that we call the “Bregman Lagrangian” which generates a large class of accelerated methods in continuous time, including (but not limited to) accelerated gradient descent, its non-Euclidean extension, and accelerated higher-order gradient methods. We show that the continuous-time limit of all of these methods correspond to traveling the same curve in spacetime at different speeds. From this perspective, Nesterov's technique and many of its generalizations can be viewed as a systematic way to go from the continuous-time curves generated by the Bregman Lagrangian to a family of discrete-time accelerated algorithms.
02:20 PM - 02:45 PM
Smooth quasi-Newton methods for nonsmooth optimization
Sporadic informal observations over several decades (and most recently in Lewis-Overton, 2013) suggest that quasi-Newton methods for smooth optimization can also work surprisingly well on nonsmooth functions. This talk explores this phenomenon from several perspectives. First, we compare experimentally the two most popular quasi-Newton updates, BFGS and SR1, in the nonsmooth setting. Secondly, we study how repeated BFGS updating at a single fixed point can serve as a separation oracle (for the subdifferential). Lastly, we show how Powell's original 1976 BFGS convergence proof for smooth convex functions in fact extends to some nonsmooth settings.
02:45 PM - 03:10 PM
A SMART stochastic algorithm for nonconvex optimization with applications to robust Machine Learning
We show how to transform any optimization problem that arises from fitting a machine learning model into one that (1) detects and removes contaminated data from the training set and (2) simultaneously fits the trimmed model on the remaining uncontaminated data. To solve the resulting nonconvex optimization problem, we introduce a fast stochastic proximal-gradient algorithm that incorporates prior knowledge through nonsmooth regularization.
This is joint work with Damek Davis.