Contribution to the resolution of decentralized Markov decision processes ; Contribution à la résolution des processus de décision markoviens décentralisés
The subject of this thesis is the optimal resolution of decentralized Markov decision processes (DEC-POMDPs). The DEC-POMDP model has been introduced in 2000 and constitutes a formal framework for describing cooperative distributed decision problems under uncertainty. We present a first generalized overview for solving DEC-POMDPs optimally, including game theory, multi-agent planning and reinforcement learning. Our contributions constitute a theoretical approach for building optimal multi-agent systems. Solving DEC-POMDPs can be separated into two categories. If the underlying model of the system is known in advance, the optimal solution can be planned prior execution in a centralized way. We introduce two new planning algorithms. The first one is a point-based multi-agent dynamic programming approach, which constitutes a synthesis of classical multi-agent dynamic programming, and point-based dynamic programming for single-agent POMDPs. Our approach is hence able to concentrate the computational effort on the relevant regions of the policy space. The second approach is an entirely new way of applying heuristic search techniques, such as A*, to decentralized decision problems. We introduce multi-agent A* (MAA*), the first heuristic search algorithm for solving DEC-POMDPs, both over finite and infinite horizons. If the underlying model is not known, an optimal policy can be obtained by a trial-and-error approach based on reinforcement learning methods. We analyse the additional constraints in multi-agent learning vs. planning, before introducing a new multi-agent reinforcement learning algorithm based on mutual notifications of changes in the value function. ; Nous abordons dans cette thèse la résolution optimale des processus de décision markoviens décentralisés (DEC-POMDPs). Le modèle DEC-POMDP constitue un formalisme théorique pour la description de problèmes de prise de décision distribuée et coopérative, et cette thèse est l'une des premières à proposer des algorithmes exactes de recherche de politiques ...