18th International Symposium on Dynamic Games and Applications

Grenoble, France, July 9-12, 2018

Optimal Control and Differential Games Methods 2

July 12, 2018, 16:10 – 17:50

Room: H.101

Chaired by Stéphane Le Ménec

4 presentations

  • 16:10 - 16:35

    Existence of a Value and a Saddle Point in Differential Games for Fractional Order Systems

    • Mikhail Gomoyunov, presenter

    We study a two-person zero-sum differential game on a finite time interval in which a conflict-controlled dynamical system is described by a nonlinear fractional-order differential equation with the Caputo derivative. The control actions of the players are subject to geometric constraints. The quality of the control process is evaluated by a given quality index. Within the positional approach, the game is formalized in the classes of control-with-guide strategies. To prove the existence of a value and a saddle point of the game, we introduce an approximating differential game in the classes of positional strategies, in which the conflict-controlled system is described by a functional differential equation of retarded type with the usual first-order derivative. We show that the value of the approximating differential game tends in the limit to the value of the original game, and that the optimal strategies in the original differential game can be constructed by using the optimal motions in the approximating differential game as guides. The method that we propose here for approximating fractional-order differential systems by functional differential systems is based on the finite-difference Grünwald-Letnikov formulas for the calculation of fractional derivatives. The proof of convergence relies on original estimates of fractional derivatives of quadratic Lyapunov functions.
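
    The finite-difference Grünwald-Letnikov formula mentioned in this abstract can be illustrated with a minimal sketch. It uses the textbook form D^alpha f(t) ≈ h^(-alpha) * sum_k (-1)^k C(alpha, k) f(t - k h). The names `gl_weights` and `gl_derivative`, the test function f(s) = s, and the step size are assumptions chosen for illustration; this is not the approximating functional differential system constructed in the talk.

```python
import math


def gl_weights(alpha, n):
    """Return the coefficients (-1)^k * C(alpha, k) for k = 0..n.

    Uses the standard recurrence w_0 = 1, w_k = w_{k-1} * (1 - (alpha + 1) / k).
    """
    weights = [1.0]
    for k in range(1, n + 1):
        weights.append(weights[-1] * (1.0 - (alpha + 1.0) / k))
    return weights


def gl_derivative(f, t, alpha, h):
    """Grünwald-Letnikov approximation of the order-alpha derivative of f at t.

    The lower terminal is 0, so the sum runs over the grid points t - k*h >= 0.
    """
    n = int(round(t / h))
    weights = gl_weights(alpha, n)
    total = sum(weights[k] * f(t - k * h) for k in range(n + 1))
    return total / h ** alpha


# Illustrative check (assumed example): for f(s) = s with f(0) = 0, the
# Caputo and Riemann-Liouville derivatives of order alpha coincide and
# equal t^(1 - alpha) / Gamma(2 - alpha).
approx = gl_derivative(lambda s: s, 1.0, 0.5, 1e-3)
exact = 1.0 / math.gamma(1.5)
print(approx, exact)
```

    For alpha = 1 the weights reduce to (1, -1, 0, 0, ...), so the formula becomes the usual backward difference quotient.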

  • 16:35 - 17:00

    Existence of Value for a Differential Game with Asymmetric Information and Signal Revealing

    • Xiaochi Wu, presenter, Université de Bretagne Occidentale

    We investigate an infinite-horizon two-person zero-sum differential game with asymmetric information on the payoffs and with signal revealing during the game. The game is played as follows. Before it begins, a cost function is randomly chosen from among several according to a fixed probability measure, and each player receives a private signal generated by the chosen cost function. During the game, each player chooses his control in order to optimize the cost function while observing all played actions with perfect memory. In addition, we suppose that as soon as the dynamics hit a fixed target set, the chosen cost function is publicly revealed to both players.
    We adapt the notion of random non-anticipative strategy with delay to this game model and prove that, under Isaacs' condition, the game has a value. Furthermore, we demonstrate that its value function is the unique bounded continuous viscosity solution of a second-order Hamilton-Jacobi-Isaacs equation.

  • 17:00 - 17:25

    Noncooperative Model Predictive Control

    • Marleen Stieler, University of Bayreuth
    • Michael Heinrich Baumann, presenter, University of Bayreuth

    Nash strategies are a natural solution concept for noncooperative dynamic games because of their 'stable' nature. For optimal control problems on very long or even infinite horizons, Model Predictive Control (MPC) appears to be a well-suited numerical method for approximating optimal solutions. The idea of MPC is to repeatedly solve an optimization problem on a (shorter) finite horizon, with the current state as the initial value, and to apply only the first piece of the optimal strategy to the system.

    The idea of performing and analyzing MPC based on Nash strategies instead of (Pareto-)optimal control sequences is appealing because it allows for solving dynamic games that are analytically intractable (on an infinite horizon). However, the existence and structure of Nash strategies depend heavily on the specific game under consideration. This is in contrast to solution concepts such as usual optimality and Pareto optimality, for which one can state very general existence results. Moreover, the calculation of Nash strategies is, in general, a difficult task.

    In this talk we present a class of games, namely affine-quadratic games, for which sufficient conditions for trajectory convergence of the MPC solution can be derived. We furthermore investigate the relation between the closed- and open-loop Nash strategies on the infinite horizon in terms of the trajectories.
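
    The receding-horizon idea described in this abstract (solve a finite-horizon problem from the current state, apply only the first control, repeat) can be sketched for a single controller. This is an illustrative scalar linear-quadratic example, not the Nash-based scheme for affine-quadratic games analysed in the talk. The system x+ = a*x + b*u, the weights q and r, the numerical values, and all function names are hypothetical.

```python
def first_feedback_gain(a, b, q, r, horizon):
    """Gain of the first control of a finite-horizon scalar LQ problem.

    Assumed model: x+ = a*x + b*u, stage cost q*x^2 + r*u^2, terminal
    cost q*x^2. A backward Riccati recursion is run over `horizon`
    stages; the gain computed last belongs to the first stage.
    """
    p = q  # terminal weight
    gain = 0.0
    for _ in range(horizon):
        gain = (a * b * p) / (r + b * b * p)
        p = q + a * a * p - a * b * p * gain
    return gain


def mpc_closed_loop(x0, a, b, q, r, horizon, steps):
    """Run the MPC loop and return the closed-loop state trajectory."""
    xs = [x0]
    x = x0
    for _ in range(steps):
        # Re-solve the finite-horizon problem from the current state.
        # (For this time-invariant example the gain never changes; the
        # repeated solve is kept only to mirror the MPC loop structure.)
        k = first_feedback_gain(a, b, q, r, horizon)
        u = -k * x          # apply only the first piece of the solution
        x = a * x + b * u   # the system moves one step
        xs.append(x)
    return xs


# Hypothetical unstable system (a > 1) regulated towards the origin.
trajectory = mpc_closed_loop(x0=1.0, a=1.2, b=1.0, q=1.0, r=1.0,
                             horizon=5, steps=30)
print(trajectory[-1])
```

    In a Nash-based variant, the single optimization inside the loop would be replaced by the computation of a finite-horizon Nash equilibrium, which is the step whose existence and structure depend on the specific game.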

  • 17:25 - 17:50

    Distributed Control and Game Theory for Swarming Robot Control

    • Stéphane Le Ménec, presenter, Airbus Group / MBDA

    This research project studies methods for controlling multiple autonomous vehicles without a centralized control structure. Dependence on a master controller in a master/slave system can cause major problems: if the master device fails, the slave devices are rendered useless. If we move away from a master/slave method, we can create robots that for the most part rely only on themselves and not on a centralized control device. This should allow one robot to fail while the others complete the overall task.
    In addition, when the robots are simple identical platforms, i.e. naïve robots implementing identical controllers, we may consider ten of them or easily decide to spawn one hundred more. The failure of one robot is no longer an issue, as plenty of other robots are able to perform similar tasks. To break the curse of dimensionality when coordinating a team of cooperative robots, it is of primary importance to apply another feature of swarming, which requires the swarm entities to use local information only, i.e. to exchange only limited amounts of data between robots located in the same neighbourhood. Classically, data links are used to broadcast robot positions (plus, if necessary, some short messages, i.e. tags associated with numerical values). Due to latency and bandwidth limitations in the communication network, in regular swarm architectures a robot just informs its neighbours about its position without relaying the information it receives, since relaying could cause severe congestion in the network.
    The agents in the swarm then have to decide on actions based on the data they receive through broadcast messages plus their own limited perception of the environment. Searching for objects in an unknown environment can be performed efficiently and robustly using such mechanisms, which run in parallel.
    The behaviour of the robots inside a swarm, whether they are ground mobile devices or drones, may consist of a trade-off between two basic behaviours. The first behaviour is formation keeping: maintaining communication links with neighbours so as not to lose communication capability and not to miss the objects the swarm is looking for. Formation keeping must also guarantee that the agents in the swarm do not collide with each other. Such behaviour can be implemented using various techniques, one of which is to use potential functions. However, each agent in the swarm also has to take care of the second behaviour (or objective), which is to decide on exploration directions. Because the efficiency of each robot's behaviour depends on the choices of the others (every robot contributes to formation keeping), because the criterion we aim to improve is the swarm (team) efficiency, which relies on individual actions (forcing the swarm to explore), and because there are no prescriptive mechanisms, we apply cooperative game techniques to push selfish agents, i.e. agents optimizing their own utility functions, to act in a cooperative manner.
    Evolutionary game techniques on myopic time horizons, running simplified kinematic models, are used to decide the agent (player) strategies, i.e. probability distributions over the two behaviours described above. An agent that decides to defect, i.e. to stop taking care of formation keeping and switch to an exploration behaviour, receives a large payoff from the newly investigated areas, as long as the others take care of collision avoidance and network consistency maintenance. The stability of the formation-keeping guidance scheme is an important aspect to take into account. Swarming simulation results involving several robots will be presented.
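
    The choice of a probability distribution over the two behaviours can be illustrated with a two-strategy replicator dynamic, a standard tool of evolutionary game theory. This is a minimal sketch under stated assumptions: the abstract does not specify the payoffs or the update rule, so the payoff matrix `PAYOFF`, the Euler step size, and the function names are hypothetical. The matrix is chosen so that exploring pays more than formation keeping when most others keep formation, and less when most others explore.

```python
def replicator_step(x, payoff, dt):
    """One explicit Euler step of the two-strategy replicator dynamic.

    x is the probability (population share) of formation keeping;
    1 - x is the probability of exploring. payoff[i][j] is the payoff
    of behaviour i against behaviour j (0 = keep, 1 = explore).
    """
    f_keep = payoff[0][0] * x + payoff[0][1] * (1.0 - x)
    f_explore = payoff[1][0] * x + payoff[1][1] * (1.0 - x)
    return x + dt * x * (1.0 - x) * (f_keep - f_explore)


def simulate(x0, payoff, dt=0.1, steps=2000):
    """Iterate the replicator step and return the final share of 'keep'."""
    x = x0
    for _ in range(steps):
        x = replicator_step(x, payoff, dt)
    return x


# Hypothetical payoffs: a defector (explorer) earns 4 against formation
# keepers, but 0 when everybody explores and the formation breaks down.
PAYOFF = [[3.0, 1.0],
          [4.0, 0.0]]

share_keep = simulate(0.9, PAYOFF)
print(share_keep)
```

    With these assumed payoffs, the two behaviours earn the same when half of the agents keep formation, so the dynamic settles on a mixed strategy rather than on pure formation keeping or pure exploration.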
