Journées de l'optimisation 2024

HEC Montréal, Québec, Canada, 6 — 8 mai 2024

Horaire Auteurs Mon horaire

WA2 - VRP and ML

8 mai 2024 10h30 – 12h10

Salle: Procter & Gamble (vert)

Présidée par Gaël Reynal

4 présentations

  • 10h30 - 10h55

    A Deep Reinforcement Learning Algorithm for the Vehicle Routing Problem with Stochastic Demands and Outsourcing

    • Mohsen Dastpak, prés., Air Canada
    • Fausto Errico, École de technologie supérieure
    • Ola Jabali, Politecnico di Milano

    This study addresses a stochastic variant of the vehicle routing problem with uncertain customer locations and demands. The customer set varies daily, with actual demand revealed upon visit. Vehicles operate within a time limit, incurring overtime costs if exceeded. To optimize costs, outsourcing a subset of customers to a common carrier is an option. This problem entails two key decisions: outsourcing a subset of customers at the beginning of the day and dynamically routing the retained ones to minimize travel, overtime, and outsourcing costs. We introduce a two-level decision structure. The first level determines the committed and outsourced customers. Given the set of committed customers, the second-level problem routes the private fleet and computes the expected routing costs. We propose a hybrid method that adopts an Iterated Local Search algorithm for the first-level problem and a Deep Q-network algorithm, based on a Graph Attention Network, for the second-level problem. Extensive experiments illustrate the effectiveness of our method over benchmarks in terms of overall costs. Notably, our method demonstrates efficient computation, generating the outsourcing solution in less than a minute, while in some benchmarks, the algorithm did not terminate within a two-hour time limit.

  • 10h55 - 11h20

    A Reinforcement Learning Approach for Dynamic Rebalancing in Bike-Sharing Systems

    • Jiaqi Liang, prés., Polytechnique Montréal
    • Sanjay Dominik Jena, Université du Québec à Montréal
    • Defeng Liu, Polytechnique Montreal
    • Andrea Lodi, CERC, Polytechnique Montréal, Montréal, Canada and Jacobs Technion-Cornell Institute, Cornell Tech and Technion - IIT, New York, USA

    Bike-Sharing Systems provide eco-friendly urban mobility, contributing to the alleviation of traffic congestion and to healthier lifestyles. Efficiently operating such systems and maintaining high customer satisfaction is challenging due to the stochastic nature of trip demand, leading to full or empty stations. Devising effective rebalancing strategies using vehicles to redistribute bikes among stations is therefore of uttermost importance for operators. As a promising alternative to classical mathematical optimization, reinforcement learning is gaining ground to solve sequential decision-making problems. This paper introduces a spatio-temporal reinforcement learning algorithm for the dynamic rebalancing problem with multiple vehicles. We first formulate the problem as a Multi-agent Markov Decision Process in a continuous time framework. This allows for independent and cooperative vehicle rebalancing, eliminating the impractical restriction of time-discretized models where vehicle departures are synchronized. A comprehensive simulator under the first-arrive-first-serve rule is then developed to facilitate the learning process by computing immediate rewards under diverse demand scenarios. To estimate the value function and learn the rebalancing policy, various Deep Q-Network configurations are tested, minimizing the lost demand. Experiments are carried out on various datasets generated from historical data, affected by both temporal and weather factors. The proposed algorithms outperform benchmarks, including a multi-period Mixed-Integer Programming model, in terms of lost demand. Once trained, it yields immediate decisions, making it suitable for real-time applications. Our work offers practical insights for operators and enriches the integration of reinforcement learning into dynamic rebalancing problems, paving the way for more intelligent and robust urban mobility solutions.

  • 11h20 - 11h45

    Learning to solve large-scale districting problems

    • Cheikh Ahmed, prés., Polytechnique Montréal
    • Alexandre Forel, Polytechnique Montreal
    • Axel Parmentier, CERMICS, École des Ponts
    • Vidal Thibaut, Massachusetts Institute of Technology, USA

    Districting is a complex combinatorial problem that consists in partitioning a geographical area into small districts. In logistics, it is a major strategic decision determining operating costs for several years. Solving them using traditional methods is intractable even for small geographical areas and existing heuristics, while quick, often provide sub-optimal results. We present a structured learning approach to find high-quality solutions to real-world districting problems in a few minutes. It is based on integrating a combinatorial optimization layer, the capacitated minimum spanning tree problem, into a graph neural network architecture. To train this pipeline in a decision-aware fashion, we show how to construct target solutions embedded in a suitable space and learn from target solutions. Experiments show that our approach outperforms existing methods and can reduce costs by up to 15\% on real-world cities.

  • 11h45 - 12h10

    Learning to solve the pricing problem of the vehicle routing problem with reinforcement learning

    • Gaël Reynal, prés., Polytechnique Montreal
    • Quentin Cappart, Polytechnique Montréal
    • Guy Desaulniers, GERAD - Polytechnique Montréal
    • Louis-Martin Rousseau, Polytechnique Montreal & CIRRELT

    This work proposes a new matheuristic algorithm to solve the Vehicle Routing Problem with Stochastic Demands (VRPSD) through column generation, inspired by the approach of Gauvin et al. (2014). We address the pricing problem using a reinforcement learning algorithm with graph neural networks to learn how to identify negative reduced-cost routes to add to the Restricted Master Problem (RMP). We combine this method with a tabu search to improve diversification along with a greedy search for acceleration purposes. After a given search time used for generating columns, we solve the Mixed Integer Program (MIP) corresponding to the final RMP to obtain an integer solution. We compare this method to the dynamic programming algorithm developed by Gauvin et al. (2014), applying the same strategy of solving the resulting MIP after a limited time spent to generate columns. Despite the time-intensive nature of our method, we often manage to find better solutions than the algorithm they developed, particularly when considering challenging instances featuring long feasible routes.

Retour