18th INFORMS Computing Society (ICS) Conference
Toronto, Canada, March 14–16, 2025

Generative AI II
March 16, 2025, 14:45–16:15
Room: Debates
Chaired by Haotian Zhai
3 presentations
-
14:45–15:07
A Natural Language Interface to Formulate Resource Allocation Problems in Networks
Advancements in LLMs allow us to address complex analytical tasks, such as the formulation of resource allocation problems, that go beyond simply translating natural language into LP, ILP, or MILP models. We define a task for formulating optimization problems in network resource allocation and introduce a framework to evaluate the performance of contemporary LLMs. We present NL4RA, a dataset of network resource allocation problems and their formulations drawn from 50 peer-reviewed articles. We evaluate the performance of Llama-3 and Phi-3 with varying parameter counts. To improve on this baseline, we propose the LM4RA framework, which ranks multiple candidate responses. During evaluation, we observed discrepancies between human judgments and metrics such as ROUGE, BLEU, and BERTScore, but human evaluation is time-consuming. To quantify the difference between LLM-generated responses and the ground truth, we introduce LAME, an automated evaluation metric. While baseline LLMs show promise, they fall short of human expertise; our method surpasses these baselines on LAME and other metrics.
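As a concrete illustration of multi-candidate response ranking, here is a minimal Python sketch in the spirit of the LM4RA framework described above. The llm and scorer interfaces, the sample count, and the temperature are assumptions; the abstract does not specify the actual ranking criterion.

from dataclasses import dataclass

@dataclass
class Candidate:
    formulation: str  # LP/MILP formulation text produced by the LLM
    score: float      # ranking score assigned by the (assumed) scorer

def rank_candidates(problem_text, llm, scorer, n=8):
    """Sample n candidate formulations and return the highest-scoring one."""
    candidates = []
    for _ in range(n):
        text = llm.generate(problem_text, temperature=0.8)  # diverse sampling
        candidates.append(Candidate(text, scorer(problem_text, text)))
    return max(candidates, key=lambda c: c.score)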
-
15:07–15:29
LLMs for Cold-Start Cutting Plane Separator Configuration
Mixed integer linear programming (MILP) solvers ship with a staggering number of parameters that are challenging to select a priori for all but expert optimization users, but can have an outsized impact on the performance of the MILP solver. Existing machine learning (ML) approaches to configure solvers require training ML models by solving thousands of related MILP instances, generalize poorly to new problem sizes, and often require implementing complex ML pipelines and custom solver interfaces that can be difficult to integrate into existing optimization workflows. In this paper, we introduce a new LLM-based framework to configure which cutting plane separators to use for a given MILP problem with little to no training data based on characteristics of the instance, such as a natural language description of the problem and the associated LaTeX formulation. We augment these LLMs with descriptions of cutting plane separators available in a given solver, grounded by summarizing the existing research literature on separators. While individual solver configurations have a large variance in performance, we present a novel ensembling strategy that clusters and aggregates configurations to create a small portfolio of high-performing configurations. Our LLM-based methodology requires no custom solver interface, can find a high-performing configuration by solving only a small number of MILPs, and can generate the configuration with simple API calls that run in under a second. Numerical results show our approach is competitive with existing configuration approaches on a suite of classic combinatorial optimization problems and real-world datasets with only a fraction of the training data and computation time.
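To make the ensembling idea concrete, the following minimal Python sketch clusters candidate separator configurations and keeps the best performer from each cluster to form a small portfolio. The choice of k-means and of mean runtime as the performance measure are assumptions; the abstract does not name the clustering method or metric.

import numpy as np
from sklearn.cluster import KMeans

def build_portfolio(configs, runtimes, k=5):
    """Cluster candidate configurations; keep the fastest one per cluster.

    configs:  (n, d) 0/1 matrix, one row per candidate configuration,
              one column per cutting plane separator (1 = enabled).
    runtimes: (n,) mean solve time of each configuration on a small probe set.
    """
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(configs)
    portfolio = []
    for c in range(k):
        members = np.where(labels == c)[0]  # configurations in cluster c
        portfolio.append(configs[members[np.argmin(runtimes[members])]])
    return np.array(portfolio)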
-
15:29–15:51
LLMs for Evaluating MIP Formulation Equivalence
Optimization copilots translate natural language into formal optimization models, enabling non-experts to solve complex problems. Existing benchmarks primarily focus on execution accuracy, which is sensitive to minor model variations, such as variable scaling or added constraints. We propose a robust benchmark that pairs natural language descriptions with reference optimization models and introduces a novel evaluation methodology using Large Language Models (LLMs). Our approach maps decision variables between generated and reference models to assess their equivalence, enabling a more nuanced assessment than execution accuracy alone. Unlike traditional methods, our framework tests for the fundamental equivalence of solutions within a unified variable space, providing a more reliable and insightful evaluation. Experimental results demonstrate the effectiveness of our benchmark, offering a flexible and robust standard for assessing optimization copilots.
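Below is a minimal Python sketch of what an equivalence check in a unified variable space might look like. The variable mapping (as would be proposed by an LLM), the solve interface, the is_feasible check, and the tolerance are all hypothetical, since the abstract does not prescribe a solver or a comparison procedure.

import math

def models_equivalent(generated, reference, var_map, solve, tol=1e-6):
    """var_map: {generated-model variable -> reference-model variable},
    as proposed by an LLM. solve(model) -> (objective_value, {var: value})."""
    gen_obj, gen_sol = solve(generated)
    ref_obj, _ = solve(reference)
    # Optimal objective values must agree (assuming identical units/scaling).
    if not math.isclose(gen_obj, ref_obj, rel_tol=tol, abs_tol=tol):
        return False
    # The mapped optimal point must also be feasible in the reference model.
    mapped = {var_map[v]: x for v, x in gen_sol.items() if v in var_map}
    return reference.is_feasible(mapped, tol=tol)  # hypothetical feasibility check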