15th EUROPT Workshop on Advances in Continuous Optimization

Montréal, Canada, 12 — 14 juillet 2017

15th EUROPT Workshop on Advances in Continuous Optimization

Montréal, Canada, 12 — 14 juillet 2017

Horaire Auteurs Mon horaire
Cal add eabad1550a3cf3ed9646c36511a21a854fcb401e3247c61aefa77286b00fe402

Optimization in Machine Learning

12 juil. 2017 13h30 – 15h10

Salle: Saine Marketing

Présidée par Simon Lacoste-Julien

4 présentations

  • Cal add eabad1550a3cf3ed9646c36511a21a854fcb401e3247c61aefa77286b00fe402
    13h30 - 13h55

    Linear Convergence of Gradient and Proximal-Gradient Methods Under the Polyak-\L{}ojasiewicz Condition

    • Mark Schmidt, prés., UBC

    In 1963, Polyak proposed a simple condition that is sufficient to show a global linear convergence rate for gradient descent. This condition is a special case of the \L{}ojasiewicz inequality proposed in the same year, and it does not require strong convexity (or even convexity). In this work, we show that this much-older Polyak-\L{}ojasiewicz (PL) inequality is actually weaker than the main conditions that have been explored to show linear convergence rates without strong convexity over the last 25 years. We also use the PL inequality to give new analyses of randomized and greedy coordinate descent methods, sign-based gradient descent methods, and stochastic gradient methods in the classic setting (with decreasing or constant step-sizes) as well as the variance-reduced setting. We further propose a generalization that applies to proximal-gradient methods for non-smooth optimization, leading to simple proofs of linear convergence of these methods. Along the way, we give simple convergence results for a wide variety of problems in machine learning: least squares, logistic regression, boosting, resilient backpropagation, L1-regularization, support vector machines, stochastic dual coordinate ascent, and stochastic variance-reduced gradient methods.

  • Cal add eabad1550a3cf3ed9646c36511a21a854fcb401e3247c61aefa77286b00fe402
    13h55 - 14h20

    Stochastic Methods for Composite Optimization Problems

    • John Duchi, prés., Stanford University
    • Feng Ruan, Stanford University

    We consider minimization of stochastic functionals that are compositions of a (potentially) non-smooth convex function h and smooth function c. We develop two stochastic methods--a stochastic prox-linear algorithm and a stochastic (generalized) sub-gradient procedure--and prove that, under mild technical conditions, each converges to first-order stationary points of the stochastic objective. We provide experiments further investigating our methods on non-smooth phase retrieval problems; the experiments indicate the practical effectiveness of the procedures.

  • Cal add eabad1550a3cf3ed9646c36511a21a854fcb401e3247c61aefa77286b00fe402
    14h20 - 14h45

    A New Framework of Analysis for Asynchronous Parallel Stochastic Gradient Methods

    • Rémi Leblond, prés., INRIA

    Although asynchronous parallel stochastic gradient methods have received more and more attention lately, they are notoriously hard to analyse correctly. We introduce a new framework to enable easy and correct proofs. We illustrate its applicability on a novel algorithm, PROXASAGA, a sparse asynchronous parallel variant of the SAGA algorithm. SAGA is an incremental gradient method with variance reduction that gives fast linear convergence rates for composite (non-smooth) objectives. We prove that PROXASAGA can obtain a theoretical linear speedup on multi-core systems under suitable assumptions (even without sparsity in the special case of a smooth objective).

  • Cal add eabad1550a3cf3ed9646c36511a21a854fcb401e3247c61aefa77286b00fe402
    14h45 - 15h10

    Frank-Wolfe Algorithms for Saddle Point Problems

    • Gauthier Gidel, prés., UdeM DIRO/MILA
    • Tony Jebara, Columbia U. & Netflix Inc.
    • Simon Lacoste-Julien, Université de Montréal

    We extend the Frank-Wolfe (FW) optimization algorithm to solve constrained smooth convex-concave saddle point problems. Remarkably, the method only requires access to linear minimization oracles. Leveraging recent advances in FW optimization, we provide the first proof of convergence of a FW-type saddle point solver over polytopes, thereby partially answering a 30 year-old conjecture. We verify our convergence rates empirically and observe that by using a heuristic step-size, we can get empirical convergence under more general conditions, paving the way for future theoretical work.