Journées de l'optimisation 2018

HEC Montréal, Québec, Canada, 7 — 9 mai 2018

Horaire Auteurs Mon horaire
Cal add eabad1550a3cf3ed9646c36511a21a854fcb401e3247c61aefa77286b00fe402

TB9 On the integration of machine learning and mathematical optimization I

8 mai 2018 15h30 – 17h10

Salle: Quebecor (80)

Présidée par Emma Frejinger

4 présentations

  • Cal add eabad1550a3cf3ed9646c36511a21a854fcb401e3247c61aefa77286b00fe402
    15h30 - 15h55

    A comparison of inverse optimization and machine learning for predicting behaviour of optimizers

    • Elaheh Iraj, prés.,
    • Daria Terekhov, Concordia University

    We consider the problem of imputing a customer's utility function. Both machine learning (ML) and inverse optimization (IO) methods can be used to impute utility functions. We experimentally compare the performance of these methods and identify their respective strengths and weaknesses when data is generated by an optimization process.

  • Cal add eabad1550a3cf3ed9646c36511a21a854fcb401e3247c61aefa77286b00fe402
    15h55 - 16h20

    A trust-region method for minimizing regularized non-convex loss functions

    • Dimitri Papadimitriou, prés., Bell Labs

    The training of deep neural networks is typically conducted via nonconvex optimization. Indeed, for nonlinear models, the nonlinear nature of the activation functions yields empirical loss functions that are nonconvex in the weight parameters. Even for linear models, i.e., when all activation functions are linear with respect to inputs and the output of the entire deep neural network is a chained product of weight matrices with the input vector, the (squared error) loss functions remain nonconvex. On the other hand, to circumvent the limits resulting from finding sharp minima (corresponding to weight parameters specified with high precision) of the empirical loss function, Hochreiter suggested in 1995 to find a large region in the weight parameter space with the property that each weight from that region can be given with low precision and lead to similar small error. In this paper, we propose to minimize the empirical loss (training error) together with weights precision (regularization error) by means of a Trust Region (TR)-based algorithm. When extended to nonconvex regularized objectives, this method contrasts to current techniques which either arbitrarily -sometimes strongly- convexify the empirical loss minimization problem or involve slowly converging Stochastic Gradient algorithms without guaranteeing the production of good predictors. TR methods instead provide i) better convergence guarantees compared to other second order methods by means of rich set of methods for step computation, e.g., dogleg, Steighaug; ii) advantageous computational complexity compared to Stochastic Gradient (SG) for nonconvex loss functions; and iii) fast escape from saddle points, e.g., by model reparametrization. In addition, they are combinable with techniques, e.g., tunneling, smoothing, etc., to avoid getting trapped into local minima and with randomized approximation (sub-sampling) that is effective in reducing computational cost associated to Hessian evaluation. The latter provides an essential property in solving high-dimensional instances. Performance bounds of the TR-based algorithm are characterized against gradient descent together with numerical experiments for evaluation and comparison purposes.

  • Cal add eabad1550a3cf3ed9646c36511a21a854fcb401e3247c61aefa77286b00fe402
    16h20 - 16h45

    A machine learning approximation algorithm for fast prediction of solutions to discrete optimization problems

    • Eric Larsen, DIRO, Université de Montréal
    • Sébastien Lachapelle, prés., DIRO, Université de Montréal
    • Yoshua Bengio, MILA, Université de Montréal
    • Emma Frejinger, DIRO and CIRRELT
    • Simon Lacoste-Julien, MILA, Université de Montréal
    • Andrea Lodi, Polytechnique Montréal

    We propose to predict descriptions of solutions to discrete stochastic optimization problems in very short computing time using machine learning. The labeled training dataset consists of a large number of deterministic problems that have been solved independently. Uncertainty regarding the inputs is addressed through sampling and aggregation methods.

  • Cal add eabad1550a3cf3ed9646c36511a21a854fcb401e3247c61aefa77286b00fe402
    16h45 - 17h10

    Deciding whether to linearize MIQPs: A learning approach

    • Zarpellon Giulia, prés., Polytechnique Montreal
    • Lodi Andrea, Polytechnique Montreal
    • Pierre Bonami, CPLEX Optimization, IBM Spain

    Within state-of-the-art solvers such as IBM-CPLEX the ability to solve both convex and nonconvex Mixed-Integer Quadratic Programming (MIQP) problems to proven optimality goes back few years, yet presents unclear aspects. We are interested in understanding whether for solving an MIQP it is favorable to linearize its quadratic part or not. Our approach exploits Machine Learning techniques to learn a classifier that predicts, for a given instance, the most suitable resolution method within CPLEX's framework. We aim as well at gaining methodological insights about the instances' features leading this discrimination. Together with a new generated dataset, we examine part of CPLEX internal testbed and discuss different scenarios to integrate learning and optimization processes. By defining novel measures, we interpret learning results and evaluate the quality of the tested classifiers from the optimization point of view.