2018 Optimization Days
HEC Montréal, Québec, Canada, 7 — 9 May 2018
TB9 On the integration of machine learning and mathematical optimization I
May 8, 2018 03:30 PM – 05:10 PM
Location: Quebecor (80)
Chaired by Emma Frejinger
4 Presentations
-
03:30 PM - 03:55 PM
A comparison of inverse optimization and machine learning for predicting behaviour of optimizers
We consider the problem of imputing a customer's utility function. Both machine learning (ML) and inverse optimization (IO) methods can be used to impute utility functions. We experimentally compare the performance of these methods and identify their respective strengths and weaknesses when data is generated by an optimization process.
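As a rough illustration of the inverse optimization side of this comparison, the sketch below recovers the utility parameters under which a set of observed choices would all be optimal. Everything in it is an assumption made for illustration and is not taken from the talk: the linear utility theta*x1 + (1 - theta)*x2, the three menus, the ground-truth value 0.7, and the grid search, which stands in for the linear or convex program usually used in inverse optimization.

```python
# Hypothetical setup: from each menu of bundles (x1, x2), a customer picks
# the bundle maximizing the linear utility theta*x1 + (1 - theta)*x2.

def utility(theta, bundle):
    x1, x2 = bundle
    return theta * x1 + (1.0 - theta) * x2

def choose(theta, menu):
    """Forward problem: index of the utility-maximizing bundle in the menu."""
    return max(range(len(menu)), key=lambda i: utility(theta, menu[i]))

def consistent_thetas(observations, grid_size=100, tol=1e-9):
    """Inverse problem: grid values of theta under which every observed
    choice is optimal for its menu."""
    result = []
    for k in range(grid_size + 1):
        theta = k / grid_size
        if all(
            utility(theta, menu[chosen]) >= utility(theta, alt) - tol
            for menu, chosen in observations
            for alt in menu
        ):
            result.append(theta)
    return result

TRUE_THETA = 0.7  # assumed ground truth, used only to simulate the choices
menus = [
    [(4, 0), (0, 4), (2, 2)],
    [(3, 0), (1, 5)],
    [(2, 0), (0, 3)],
]
observations = [(menu, choose(TRUE_THETA, menu)) for menu in menus]
thetas = consistent_thetas(observations)
print(f"theta values consistent with all choices: {thetas[0]} to {thetas[-1]}")
```

The output is a range of parameters rather than a single value, because a finite set of observed choices usually identifies the utility function only up to a set. A machine learning baseline would instead fit a predictor of the chosen bundle directly from menu features.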
-
03:55 PM - 04:20 PM
A trust-region method for minimizing regularized non-convex loss functions
The training of deep neural networks is typically conducted via nonconvex optimization. Indeed, for nonlinear models, the nonlinear nature of the activation functions yields empirical loss functions that are nonconvex in the weight parameters. Even for linear models, i.e., when all activation functions are linear with respect to their inputs and the output of the entire deep neural network is a chained product of weight matrices with the input vector, the (squared error) loss functions remain nonconvex. On the other hand, to circumvent the limits resulting from finding sharp minima (corresponding to weight parameters specified with high precision) of the empirical loss function, Hochreiter suggested in 1995 to find a large region in the weight parameter space with the property that each weight from that region can be given with low precision and still lead to a similarly small error. In this paper, we propose to minimize the empirical loss (training error) together with the weight precision (regularization error) by means of a Trust Region (TR)-based algorithm. When extended to nonconvex regularized objectives, this method contrasts with current techniques, which either arbitrarily (sometimes strongly) convexify the empirical loss minimization problem or involve slowly converging Stochastic Gradient (SG) algorithms without guaranteeing the production of good predictors. TR methods instead provide i) better convergence guarantees compared to other second-order methods by means of a rich set of methods for step computation, e.g., dogleg, Steihaug; ii) advantageous computational complexity compared to SG for nonconvex loss functions; and iii) fast escape from saddle points, e.g., by model reparametrization. In addition, they can be combined with techniques such as tunneling and smoothing to avoid getting trapped in local minima, and with randomized approximation (sub-sampling), which is effective in reducing the computational cost associated with Hessian evaluation. The latter is an essential property for solving high-dimensional instances. Performance bounds of the TR-based algorithm are characterized against gradient descent, together with numerical experiments for evaluation and comparison purposes.
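To make the dogleg step computation mentioned above concrete, here is a minimal textbook trust-region loop in pure Python. It is a sketch under stated assumptions, not the authors' algorithm. The toy objective (w1*w2 - 1)^2 + LAM*(w1^2 + w2^2) is a two-layer scalar "linear network" with an L2 penalty, chosen only because it is nonconvex and has a saddle point at the origin. The radius update constants, the starting point, and the fallback to a steepest-descent step when the Hessian is not positive definite are likewise illustrative choices.

```python
import math

LAM = 0.1  # regularization weight (illustrative value)

def loss(w):
    a, b = w
    return (a * b - 1.0) ** 2 + LAM * (a * a + b * b)

def grad(w):
    a, b = w
    r = a * b - 1.0
    return (2.0 * r * b + 2.0 * LAM * a, 2.0 * r * a + 2.0 * LAM * b)

def hess(w):
    """Symmetric 2x2 Hessian returned as (h11, h12, h22)."""
    a, b = w
    return (2.0 * b * b + 2.0 * LAM, 4.0 * a * b - 2.0, 2.0 * a * a + 2.0 * LAM)

def quad_form(H, p):
    h11, h12, h22 = H
    return h11 * p[0] ** 2 + 2.0 * h12 * p[0] * p[1] + h22 * p[1] ** 2

def dogleg_step(g, H, radius):
    h11, h12, h22 = H
    gnorm = math.hypot(g[0], g[1])
    gHg = quad_form(H, g)
    det = h11 * h22 - h12 * h12
    if not (h11 > 0.0 and det > 0.0):
        # Hessian not positive definite: use a Cauchy-type step along -g.
        tau = radius / gnorm if gHg <= 0.0 else min(gnorm ** 2 / gHg, radius / gnorm)
        return (-tau * g[0], -tau * g[1])
    # Full Newton step -H^{-1} g, accepted if it lies inside the region.
    pn = (-(h22 * g[0] - h12 * g[1]) / det, -(h11 * g[1] - h12 * g[0]) / det)
    if math.hypot(pn[0], pn[1]) <= radius:
        return pn
    # Unconstrained minimizer of the model along the steepest-descent direction.
    t = gnorm ** 2 / gHg
    pu = (-t * g[0], -t * g[1])
    if math.hypot(pu[0], pu[1]) >= radius:
        s = radius / gnorm
        return (-s * g[0], -s * g[1])
    # Follow the segment from pu toward pn until it hits the boundary.
    d = (pn[0] - pu[0], pn[1] - pu[1])
    qa = d[0] ** 2 + d[1] ** 2
    qb = 2.0 * (pu[0] * d[0] + pu[1] * d[1])
    qc = pu[0] ** 2 + pu[1] ** 2 - radius ** 2
    s = (-qb + math.sqrt(qb * qb - 4.0 * qa * qc)) / (2.0 * qa)
    return (pu[0] + s * d[0], pu[1] + s * d[1])

def trust_region(w, radius=1.0, max_radius=2.0, eta=0.1, tol=1e-8, max_iter=200):
    for _ in range(max_iter):
        g = grad(w)
        if math.hypot(g[0], g[1]) < tol:
            break
        H = hess(w)
        p = dogleg_step(g, H, radius)
        # Decrease predicted by the quadratic model vs. decrease actually obtained.
        predicted = -(g[0] * p[0] + g[1] * p[1] + 0.5 * quad_form(H, p))
        trial = (w[0] + p[0], w[1] + p[1])
        actual = loss(w) - loss(trial)
        rho = actual / predicted if predicted > 0.0 else 0.0
        if rho < 0.25:
            radius *= 0.25
        elif rho > 0.75 and math.hypot(p[0], p[1]) >= 0.99 * radius:
            radius = min(2.0 * radius, max_radius)
        if rho > eta:
            w = trial
    return w

w_star = trust_region((2.0, 0.3))
print(w_star, loss(w_star))
```

For this toy objective the stationary points can be worked out by hand: a saddle at the origin and two minima at w1 = w2 = ±sqrt(1 - LAM), with loss 2*LAM - LAM^2. That gives a simple check on the result. A Steihaug-type conjugate-gradient step or a sub-sampled Hessian would replace `dogleg_step` and `hess` respectively in a larger-scale setting.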
-
04:20 PM - 04:45 PM
A machine learning approximation algorithm for fast prediction of solutions to discrete optimization problems
We propose to predict descriptions of solutions to discrete stochastic optimization problems in very short computing time using machine learning. The labeled training dataset consists of a large number of deterministic problems that have been solved independently. Uncertainty regarding the inputs is addressed through sampling and aggregation methods.
-
04:45 PM - 05:10 PM
Deciding whether to linearize MIQPs: A learning approach
Within state-of-the-art solvers such as IBM-CPLEX, the ability to solve both convex and nonconvex Mixed-Integer Quadratic Programming (MIQP) problems to proven optimality goes back a few years, yet some aspects remain unclear. We are interested in understanding whether, when solving an MIQP, it is favorable to linearize its quadratic part or not. Our approach exploits Machine Learning techniques to learn a classifier that predicts, for a given instance, the most suitable resolution method within CPLEX's framework. We also aim to gain methodological insights about the instance features that drive this discrimination. Together with a newly generated dataset, we examine part of CPLEX's internal testbed and discuss different scenarios for integrating learning and optimization processes. By defining novel measures, we interpret learning results and evaluate the quality of the tested classifiers from the optimization point of view.
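For readers unfamiliar with what linearizing the quadratic part can mean, the sketch below checks the standard textbook linearization of a product of two binary variables, a special case of the McCormick envelope. The abstract does not say which linearization CPLEX applies internally, so this is only an assumed example. The matrix `Q` and all function names are hypothetical.

```python
from itertools import product

def satisfies_linearization(x, y, z):
    """Linear inequalities standing in for the bilinear equation z = x * y."""
    return z <= x and z <= y and z >= x + y - 1 and z >= 0

def feasible_z(x, y):
    """Values of z in {0, 1} allowed by the inequalities for fixed binary x, y."""
    return [z for z in (0, 1) if satisfies_linearization(x, y, z)]

def quadratic_value(Q, x):
    n = len(x)
    return sum(Q[i][j] * x[i] * x[j] for i in range(n) for j in range(n))

def linearized_value(Q, x):
    """Same objective with every product x_i * x_j replaced by a linear term."""
    n = len(x)
    total = 0
    for i in range(n):
        total += Q[i][i] * x[i]  # x_i * x_i == x_i when x_i is binary
        for j in range(n):
            if i != j:
                (z,) = feasible_z(x[i], x[j])  # exactly one feasible value
                total += Q[i][j] * z
    return total

Q = [[2, -1, 3], [-1, 0, 4], [3, 4, -5]]  # hypothetical symmetric matrix
for x in product((0, 1), repeat=3):
    print(x, quadratic_value(Q, x), linearized_value(Q, x))
```

The linearized model is a mixed-integer linear program with one extra variable and several extra constraints per product term. That is the trade-off a classifier like the one described above would be weighing against solving the quadratic form directly.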