18th INFORMS Computing Society (ICS) Conference
Toronto, Canada, March 14–16, 2025

Optimization for Machine Learning
March 16, 2025, 16:45 – 18:15
Room: East Common
Chaired by Calvin Tsay
3 presentations
-
16:45 – 17:07
Fast Methods for Large-scale Kernel Ridge Regression
Kernel ridge regression (KRR) is a fundamental computational tool, appearing in problems ranging from computational chemistry to health analytics, and of particular interest due to its starring role in Gaussian process regression. However, it is challenging to scale KRR solvers to large datasets: the memory and per-iteration computational complexities of direct methods (e.g., Cholesky decomposition) and iterative methods (e.g., preconditioned conjugate gradient, PCG) scale quadratically and/or cubically with the number of training samples.
To alleviate these scalability issues, we propose an approximate sketch-and-project method, ASkotch, to solve full KRR, and a variance-reduced stochastic gradient method, Mimosa, to solve inducing points KRR. Our algorithms (i) have lower memory and per-iteration computational complexities than existing KRR solvers, (ii) are easy to tune, (iii) obtain linear convergence to the optimum of the KRR training problem, and (iv) outperform state-of-the-art KRR solvers on a variety of large-scale regression and classification tasks.
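
To make the scaling argument concrete, the minimal sketch below solves the full KRR system with a randomized block coordinate, sketch-and-project-style iteration. This is not the authors' ASkotch or Mimosa code; the RBF kernel, block size, step count, and lack of acceleration or preconditioning are illustrative assumptions.

import numpy as np

def rbf_kernel(X, Z, lengthscale=1.0):
    # Squared-exponential kernel matrix between rows of X and Z.
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * lengthscale ** 2))

def krr_block_solve(X, y, lam=1e-3, block_size=256, iters=200, seed=0):
    # Solves (K + n*lam*I) alpha = y by repeatedly projecting onto a
    # random coordinate block (a sketch-and-project-style iteration),
    # forming only an n x |B| slice of the kernel matrix per step.
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    alpha = np.zeros(n)
    r = y.astype(float).copy()            # residual y - (K + n*lam*I) alpha
    for _ in range(iters):
        B = rng.choice(n, size=min(block_size, n), replace=False)
        K_B = rbf_kernel(X, X[B])         # n x |B| kernel slice
        A_BB = K_B[B] + n * lam * np.eye(len(B))
        delta = np.linalg.solve(A_BB, r[B])   # exact solve on the small block
        alpha[B] += delta
        r -= K_B @ delta                  # kernel part of the residual update
        r[B] -= n * lam * delta           # regularization part
    return alpha

# Example usage on synthetic data.
rng = np.random.default_rng(1)
X = rng.normal(size=(2000, 5))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=2000)
alpha = krr_block_solve(X, y)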
-
17:07 – 17:29
Preconditioned Variance-Reduced Stochastic Gradient Algorithms for Faster Empirical Risk Minimization
Ill-conditioned problems are ubiquitous in large-scale machine learning: as a data set grows to include more and more features correlated with the labels, the condition number increases. Yet traditional stochastic gradient methods converge slowly on these ill-conditioned problems, even with careful hyperparameter tuning.
In this talk, we introduce PROMISE, a suite of sketching-based preconditioned stochastic gradient algorithms that deliver fast convergence on ill-conditioned large-scale convex optimization problems arising in machine learning. PROMISE includes preconditioned versions of SVRG, SAGA, and Katyusha; each algorithm comes with a strong theoretical analysis and effective default hyperparameter values. Empirically, we verify the superiority of the proposed algorithms by showing that, using default hyperparameter values, they outperform or match popular tuned stochastic gradient optimizers.
Overall, PROMISE methods offer a compelling alternative to popular stochastic first-order optimization algorithms for large-scale machine learning.
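
As a rough illustration of the preconditioned variance-reduction idea (not the PROMISE implementation), the sketch below runs SVRG on ridge-regularized least squares with a simple diagonal preconditioner standing in for the sketching-based preconditioners described in the talk; the step size and epoch length are also assumptions.

import numpy as np

def preconditioned_svrg(X, y, lam=1e-4, step=1.0, epochs=20, seed=0):
    # Minimizes (1/2n)||Xw - y||^2 + (lam/2)||w||^2 with SVRG,
    # applying a preconditioner to each variance-reduced update.
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)

    def full_grad(w):
        return X.T @ (X @ w - y) / n + lam * w

    def stoch_grad(w, i):
        return X[i] * (X[i] @ w - y[i]) + lam * w

    # Diagonal (Jacobi) preconditioner of the Hessian X^T X / n + lam*I,
    # a cheap stand-in for a randomized/sketching-based preconditioner.
    P_inv = 1.0 / ((X ** 2).mean(axis=0) + lam)

    for _ in range(epochs):
        w_snap = w.copy()
        g_snap = full_grad(w_snap)        # full gradient at the snapshot
        for _ in range(n):
            i = rng.integers(n)
            # Variance-reduced stochastic gradient estimate.
            g = stoch_grad(w, i) - stoch_grad(w_snap, i) + g_snap
            # Preconditioned step.
            w -= step * P_inv * g
    return w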
-
17:29 – 17:51
An MIQP Formulation for Gaussian Processes Using a Piecewise-Linear Kernel Approximation
Optimization over a Gaussian process model is a key step in many Bayesian optimization (BO) algorithms. However, this step is itself a challenging, nonlinear, and non-convex optimization problem. Despite the importance of this step, most BO algorithms employ sampling- or gradient-based methods, which do not provably converge to global optima. This presentation investigates mixed-integer programming (MIP) as a paradigm for global optimization over Gaussian processes. Specifically, we present PK-MIQP (Piecewise-linear Kernel Mixed Integer Quadratic Programming), a formulation based on a piecewise-linear approximation of Gaussian process kernels that admits a corresponding MIQP representation. The proposed method pre-processes and approximates nonlinear kernel functions to enable tractable global optimization. The final MIQP model can optionally be warm-started by solving a simpler sub-model. We analyze the theoretical regret bounds of the proposed approximation and empirically demonstrate the framework in a full BO procedure on synthetic functions, constrained benchmarks, and hyperparameter tuning.
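
The piecewise-linear idea can be illustrated with a small sketch (not the PK-MIQP formulation itself): the smooth kernel profile is replaced by a linear interpolant over breakpoints, which standard MIP machinery (e.g., an SOS2 or lambda formulation) can then encode exactly. The breakpoint grid and kernel choice below are assumed for illustration.

import numpy as np

def se_kernel_profile(r, lengthscale=1.0):
    # Squared-exponential kernel as a function of distance r >= 0.
    return np.exp(-r ** 2 / (2.0 * lengthscale ** 2))

# Breakpoints for the piecewise-linear surrogate of the kernel profile.
breakpoints = np.linspace(0.0, 4.0, 9)
values = se_kernel_profile(breakpoints)

def pwl_kernel(r):
    # Piecewise-linear interpolation of the kernel profile; inside a
    # MIQP this interpolation is modeled with SOS2 / binary variables.
    return np.interp(r, breakpoints, values)

# Worst-case approximation error on a fine grid, useful for deciding
# how many breakpoints the surrogate needs.
grid = np.linspace(0.0, 4.0, 1001)
max_err = np.max(np.abs(se_kernel_profile(grid) - pwl_kernel(grid)))
print(f"max PWL error with {len(breakpoints)} breakpoints: {max_err:.4f}")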