Journées de l'optimisation 2024
HEC Montréal, Québec, Canada, May 6-8, 2024
MB10 - Reinforcement learning
May 6, 2024, 15h30 – 17h10
Room: PWC (green)
Chaired by Ashkan Amirnia
3 presentations
-
15h30 - 15h55
PPSTOW: An End-to-End Deep Reinforcement Learning Model for Master Stowage Planning on Container Vessels
Container shipping is the backbone of the global economy and holds promise for advancing environmental sustainability objectives. To realize this promise, liner shipping companies aim to improve operational efficiency through stowage planning. Because it combines many combinatorial aspects, some of which are NP-hard, stowage planning is challenging in its representative form. Consequently, the problem is often decomposed into master planning and slot planning, each of which remains difficult in its own right. Scalable algorithms for stowage planning are therefore needed.
Our approach, Proximal Policy Optimization for master STOWage planning (PPSTOW), uses end-to-end deep reinforcement learning to construct solutions to master planning, with a focus on global problem objectives and constraints. Our experimental results demonstrate that PPSTOW efficiently finds near-optimal solutions for simulated instances with realistic vessel sizes and practical planning horizons. Future work will seek to make the model more representative by integrating revenue management strategies and addressing local problem objectives and constraints.
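For readers unfamiliar with proximal policy optimization, the following is a minimal sketch of the standard clipped surrogate objective that gives the method its name. It is not the PPSTOW implementation: the function name, the plain-list inputs, and the default clipping range of 0.2 are assumptions chosen for illustration, and the abstract does not describe the state, action, or reward design.

```python
import math


def ppo_clipped_objective(new_logps, old_logps, advantages, clip_eps=0.2):
    """Mean clipped surrogate objective (a quantity to be maximized).

    new_logps / old_logps: log-probabilities of the taken actions under the
    current and the data-collecting policy (hypothetical inputs).
    advantages: advantage estimates for the same actions.
    """
    total = 0.0
    for new_lp, old_lp, adv in zip(new_logps, old_logps, advantages):
        # Probability ratio between the current and the old policy.
        ratio = math.exp(new_lp - old_lp)
        # Restrict the ratio to [1 - clip_eps, 1 + clip_eps].
        clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Pessimistic choice between the clipped and unclipped terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

Taking the minimum removes the incentive to move the policy far from the one that generated the data in a single update, which is the usual motivation for the clipping.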
-
15h55 - 16h20
Relative Monte Carlo for Reinforcement Learning
We introduce relative Monte Carlo (rMC), a new general-purpose policy gradient algorithm for reinforcement learning with discrete action spaces. The policy is improved in real time using relative returns between a root sample path and counterfactual simulated paths, each instantiated by taking a different action from the root. The method is compatible with any differentiable policy, including the leading choice of neural network parametrization. It is guaranteed to converge for episodic as well as average-reward tasks. Unlike in traditional Monte Carlo methods, rMC policy gradient steps are performed throughout the rollout using a memory-efficient update decomposition inspired by eligibility traces. Strong couplings between root and counterfactual paths further contribute to low data generation and memory requirements. We test rMC with a policy network in a large-scale fulfillment problem. Numerical results show that it performs well compared to related algorithms.
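To illustrate how relative returns can enter a policy gradient, here is a minimal single-state sketch. It is not the rMC algorithm, whose update decomposition and path coupling the abstract does not specify. It assumes a softmax policy over logits and one return per action (the root action's return plus one counterfactual return for each other action), and it relies only on the fact that the policy's probability gradients sum to zero, so subtracting the root return from every return leaves the gradient unchanged. All names are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def relative_return_gradient(logits, returns, root_action):
    """Gradient of expected return w.r.t. softmax logits in one state.

    returns[a] is the return observed after taking action a; the entry for
    root_action comes from the root path, the others from counterfactual
    paths (an assumption of this sketch).
    """
    probs = softmax(logits)
    # Returns measured relative to the root path.
    relative = [g - returns[root_action] for g in returns]
    grad = [0.0] * len(logits)
    for a, (p_a, rel_a) in enumerate(zip(probs, relative)):
        for k in range(len(logits)):
            indicator = 1.0 if a == k else 0.0
            # d pi(a) / d logit_k = pi(a) * (1[a == k] - pi(k))
            grad[k] += p_a * (indicator - probs[k]) * rel_a
    return grad
```

In this sketch the root action contributes a relative return of zero, so only the counterfactual differences drive the update.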
-
16h20 - 16h45
Real-time sustainable cobotic disassembly planning based on a reinforcement learning model
The application of collaborative robots (cobots) in disassembly processes is growing. Cobots efficiently perform sensitive and challenging tasks, such as accurate cutting, and can work with hazardous substances that would endanger human operators. Moreover, cobots precisely perform simple and repetitive tasks that humans might not complete accurately due to fatigue or distraction. Human operators, in contrast, can effectively complete complex and demanding tasks that require a high degree of flexibility and skill, which cobots lack. Hence, human-robot collaboration (HRC) benefits simultaneously from the cobots' precision and human skills, promising considerable productivity and efficiency gains in manufacturing. This research proposes a reinforcement learning model for sustainable HRC disassembly planning. Instead of generating a fixed task sequence, it makes real-time decisions based on online conditions, coping with uncertainties that divert processes from their expected flows. In addition to economic objectives, such as minimizing operation time, the model integrates social and environmental parameters, including circularity, safety, and energy consumption, into the planning process. The proposed method therefore offers a sustainable planning framework that enables manufacturers to adjust process workflows to their specific requirements, according to the relative importance of cost, social, and environmental variables.
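One common way to combine economic, social, and environmental criteria in a single reinforcement learning reward is a weighted sum, sketched below. The abstract does not state how the model combines its objectives, so the metric names, the weights, and the linear form are all assumptions made for illustration.

```python
# Hypothetical metric names: costs are penalized, benefits are rewarded.
COST_TERMS = ("operation_time", "energy", "safety_risk")
BENEFIT_TERMS = ("circularity",)


def scalarized_reward(metrics, weights):
    """Weighted-sum reward for one disassembly decision (illustrative only)."""
    cost = sum(weights[k] * metrics[k] for k in COST_TERMS)
    benefit = sum(weights[k] * metrics[k] for k in BENEFIT_TERMS)
    return benefit - cost


def best_option(options, weights):
    """Return the name of the candidate assignment with the highest reward."""
    return max(options, key=lambda name: scalarized_reward(options[name], weights))


# Two made-up candidate assignments for the same task.
options = {
    "human": {"operation_time": 1.0, "energy": 0.1, "safety_risk": 0.6, "circularity": 0.9},
    "cobot": {"operation_time": 1.5, "energy": 0.5, "safety_risk": 0.1, "circularity": 0.9},
}
time_focused = {"operation_time": 1.0, "energy": 0.1, "safety_risk": 0.1, "circularity": 1.0}
safety_focused = {"operation_time": 0.2, "energy": 0.1, "safety_risk": 2.0, "circularity": 1.0}
```

With these made-up numbers, `best_option(options, time_focused)` selects the human operator and `best_option(options, safety_focused)` selects the cobot, which shows how changing the weights shifts the preferred workflow.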