15h30 - 15h55
Analysis of the accumulation points of incremental algorithms in the floating-point framework
In Machine Learning, Artificial Neural Networks (ANNs) are a very powerful tool, broadly used in many applications. The selected (deep) architectures often include many layers, and therefore a large number of parameters, which makes training, storage and inference expensive. This has motivated a stream of research on compressing the original networks without excessively sacrificing performance.
The data representation commonly used for the parameters of ANNs, and for all the operations involved in training and inference, is 32-bit floating point. This high-precision format can be expensive, and it has been shown that it is not really needed to achieve good performance. Consequently, training ANNs with lower-bit data representations has become an important field of research in Machine Learning.
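To make the loss of precision concrete, the sketch below rounds Python floats (IEEE 754 binary64) to binary32 and binary16 using the standard library's struct module (format codes 'f' and 'e'). The helper name round_to is ours, and binary16 is used only as one example of a lower-bit format.

```python
import struct


def round_to(x: float, fmt: str) -> float:
    """Round a Python float (IEEE 754 binary64) to a narrower format.

    fmt is a struct format code: 'f' for binary32, 'e' for binary16.
    Packing rounds to the nearest representable value; unpacking turns
    it back into a Python float so it can be inspected.
    """
    return struct.unpack(fmt, struct.pack(fmt, x))[0]


for value in (0.1, 1.0 + 2**-11, 1.0 + 2**-10):
    print(f"{value!r:>22}  binary32={round_to(value, 'f')!r:>22}  "
          f"binary16={round_to(value, 'e')!r}")
```

With only 10 stored fraction bits, binary16 cannot distinguish 1 from 1 + 2^-11 (the tie rounds to 1.0), while binary32, with 23 fraction bits, still can.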
Using low-bit floating points introduces significant rounding errors in every operation performed during training, so the classical convergence results no longer hold and a different theoretical analysis is needed. Some existing results from the optimization literature on perturbed gradient-type algorithms can be applied to the floating-point framework. However, even though some good results already exist, many of the hypotheses assumed in the literature do not strictly hold in the floating-point scenario, which opens interesting lines of research towards fully understanding the convergence of training algorithms under lower-bit data representations.
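A toy illustration of a perturbed gradient method, under assumptions of our own: gradient descent x_(k+1) = x_k - α·f'(x_k) on the hypothetical objective f(x) = x²/2, with every operation rounded to binary16 through struct as a stand-in for low-bit hardware. Products and differences of binary16 values are exact in binary64, so rounding them once reproduces round-to-nearest binary16 arithmetic for these operations. With a small step size, α·x falls below half the spacing of binary16 numbers near x, and the iterates freeze at a point that is not a minimizer, while exact gradient descent keeps making progress.

```python
import struct


def fl16(x: float) -> float:
    """Round x to the nearest IEEE 754 binary16 value (ties to even)."""
    return struct.unpack('e', struct.pack('e', x))[0]


def exact(x: float) -> float:
    """No extra rounding: plain binary64 arithmetic as a reference."""
    return x


def gradient_descent(x0: float, lr: float, steps: int, rnd=exact) -> float:
    """Gradient descent on f(x) = x**2 / 2, whose gradient is f'(x) = x.

    Every intermediate result is passed through `rnd`, so rnd=fl16
    models an iteration in which each operation carries a rounding error.
    """
    x, lr = rnd(x0), rnd(lr)
    for _ in range(steps):
        grad = rnd(x)              # f'(x) = x
        step = rnd(lr * grad)      # rounded product
        x = rnd(x - step)          # rounded update
    return x


print("binary64:", gradient_descent(1.0, 1e-4, 1000))
print("binary16:", gradient_descent(1.0, 1e-4, 1000, fl16))
```

Writing the rounded iteration as x_(k+1) = x_k - α·f'(x_k) + e_k, here e_k = α·x_k exactly cancels the step, so x = 1 is an accumulation point of the perturbed iteration without being stationary.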
15h55 - 16h20
Optimization algorithms with low-precision floating- and fixed-point arithmetic
Digital hardware is designed to work with only a specific finite set of real values, known as floating-point numbers. Floating-point numbers are encoded as strings of digits, and arithmetic on them is carried out according to hardware-specific rules and standards. Because of this, two different hardware platforms will not necessarily yield exactly the same result when solving mathematical problems (e.g., optimization problems). The difference between the two outputs is negligible under the (traditionally reasonable) assumption that both platforms have high enough precision (i.e., a small enough distance between consecutive floating-point numbers). However, with the rise of neural networks, where training deep networks is known to be a tremendous challenge in terms of speed, memory and power consumption, lower precision is often preferred in exchange for faster and more economical arithmetic. Hence, lower-precision floating-point systems come into play when training and using neural networks, and the computational errors that used to be ignored, as well as the cost of basic operations such as multiplication, are now a major part of the analysis of optimization algorithms. In this presentation, we first discuss a new formulation of first-order algorithms (e.g., GD and SGD) that takes hardware rounding errors into account. Next, we discuss a novel integer SGD algorithm with stochastic rounding.
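Stochastic rounding maps x to ⌊x⌋ + 1 with probability x - ⌊x⌋ and to ⌊x⌋ otherwise, so it is unbiased: E[SR(x)] = x. The sketch below is a hypothetical toy, not the algorithm presented in the talk: the weight is stored as an integer w_int representing w = w_int·Δ, the gradient is computed in ordinary floating point, and the update η·g/Δ is mapped to the integer grid either by round-to-nearest or by stochastic rounding. The objective f(w) = (w - 0.7)²/2, Δ = 1/16 and η = 0.01 are arbitrary choices.

```python
import math
import random


def stochastic_round(x: float, rng: random.Random) -> int:
    """Round x up to floor(x) + 1 with probability x - floor(x), else down.

    The result is unbiased: its expected value is exactly x.
    """
    lo = math.floor(x)
    return lo + (1 if rng.random() < x - lo else 0)


def integer_sgd(steps: int, lr: float, delta: float, target: float,
                rounder) -> int:
    """Minimize f(w) = (w - target)**2 / 2 with an integer weight.

    The weight is stored as an integer w_int representing w_int * delta;
    each real-valued update lr * grad / delta is mapped to an integer by
    `rounder` before it is applied.
    """
    w_int = 0
    for _ in range(steps):
        grad = w_int * delta - target          # f'(w) at w = w_int * delta
        w_int -= rounder(lr * grad / delta)    # integer-only weight update
    return w_int


rng = random.Random(0)
nearest = integer_sgd(5000, 0.01, 1 / 16, 0.7, round)
stochastic = integer_sgd(5000, 0.01, 1 / 16, 0.7,
                         lambda x: stochastic_round(x, rng))
print("round-to-nearest:   ", nearest * (1 / 16))
print("stochastic rounding:", stochastic * (1 / 16))
```

With round-to-nearest, every update (at most 0.112 in magnitude) rounds to zero and the weight never moves; stochastic rounding preserves the update in expectation and drives the weight to the grid points on either side of 0.7.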
16h20 - 16h45
Solving Multi-Echelon Inventory Problems with Heuristic-Guided Deep Reinforcement Learning and Centralized Control
Multi-echelon inventory models aim to minimize the system-wide total cost of a multi-stage supply chain by applying a proper ordering policy at each stage. In a practical inventory system where backlog costs can be incurred at multiple stages, this problem cannot be solved analytically and is intractable for traditional optimization methods. To alleviate the curse of dimensionality, we apply and compare three deep reinforcement learning (DRL) algorithms, namely Deep Q-Network, Advantage Actor-Critic and Twin Delayed Deep Deterministic Policy Gradient, to determine the inventory policy efficiently. We consider a serial supply chain as in the beer game, a classic multi-echelon inventory problem, and extend the application of DRL to the centralized decision-making setting, which is more complex due to its significantly larger state and action spaces. We also propose a heuristic-guided exploration mechanism that improves training efficiency by incorporating known heuristics into the exploration process of the DRL algorithms. The experiments show that, in both the decentralized and centralized settings, the DRL agents learned policies with significant cost savings compared to benchmark heuristics.
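The abstract does not detail the heuristic-guided exploration mechanism. The sketch below shows one plausible form, which is our assumption and not the authors' method: an ε-greedy rule for a discrete-action agent such as DQN, in which some exploratory actions come from a heuristic instead of being drawn uniformly at random. The base-stock heuristic (order up to a level S), the simplified state (the inventory position alone), and all function names and parameters are hypothetical.

```python
import random
from typing import Callable, Sequence


def base_stock_order(inventory_position: int, base_stock_level: int,
                     max_order: int) -> int:
    """Hypothetical heuristic: order up to the base-stock level S."""
    return min(max_order, max(0, base_stock_level - inventory_position))


def heuristic_guided_action(q_values: Sequence[float], state: int,
                            heuristic: Callable[[int], int],
                            epsilon: float, heuristic_prob: float,
                            rng: random.Random) -> int:
    """Epsilon-greedy action selection whose exploration is partly heuristic.

    With probability 1 - epsilon, exploit: return the argmax of q_values.
    Otherwise explore: with probability heuristic_prob follow the
    heuristic, else pick a uniformly random action index.
    """
    if rng.random() >= epsilon:
        return max(range(len(q_values)), key=q_values.__getitem__)
    if rng.random() < heuristic_prob:
        return heuristic(state)
    return rng.randrange(len(q_values))


rng = random.Random(0)
q = [0.0, 1.5, 0.3, -0.2, 0.9]          # Q-values for order quantities 0..4
policy = lambda inv: base_stock_order(inv, base_stock_level=6, max_order=4)
print(heuristic_guided_action(q, state=3, heuristic=policy,
                              epsilon=1.0, heuristic_prob=1.0, rng=rng))
```

In this form, heuristic_prob could be decayed during training so that early exploration stays close to a sensible policy while later decisions rely on the learned Q-values.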
16h45 - 17h10
How to compress deep neural networks?
While deep learning is considered one of the breakthroughs of the century, its massive computational requirements limit its use in real products. This problem has two sources: resource-hungry training and energy-hungry inference.
Since computers are designed for general-purpose computation, deep learning researchers still explore whether to i) accelerate neural networks using existing hardware or ii) design new hardware specific to neural networks. We explore the fundamental computations of deep learning and review important recent contributions to the field.
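As an illustration of one common compression technique (the abstract does not say which methods the talk covers), weights can be quantized to 8-bit integers. Each float is stored as an integer q with a shared scale s so that w ≈ q·s, and a dot product then becomes integer multiply-accumulate operations followed by a single rescaling. The sketch uses symmetric per-tensor quantization; the function names and the example vectors are ours.

```python
from typing import List, Sequence, Tuple


def quantize_symmetric(weights: Sequence[float], bits: int = 8
                       ) -> Tuple[List[int], float]:
    """Map floats to signed integers in [-qmax, qmax] with one shared scale."""
    qmax = 2 ** (bits - 1) - 1                      # 127 for 8 bits
    largest = max((abs(w) for w in weights), default=0.0)
    scale = largest / qmax if largest > 0 else 1.0
    q = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q: Sequence[int], scale: float) -> List[float]:
    """Recover approximate floats; the error is at most scale / 2 each."""
    return [qi * scale for qi in q]


def quantized_dot(qa: Sequence[int], sa: float,
                  qb: Sequence[int], sb: float) -> float:
    """Integer multiply-accumulate, then one floating-point rescale."""
    acc = sum(x * y for x, y in zip(qa, qb))        # integers only
    return acc * sa * sb


w = [0.5, -1.27, 0.0, 1.0]
x = [0.8, 0.25, -2.0, 0.5]
qw, sw = quantize_symmetric(w)
qx, sx = quantize_symmetric(x)
print(qw, sw)
print("float dot:", sum(a * b for a, b in zip(w, x)))
print("int8 dot: ", quantized_dot(qw, sw, qx, sx))
```

Storing 8-bit integers instead of 32-bit floats reduces weight memory by a factor of four, at the cost of a bounded per-weight rounding error.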