Master’s Project: Safe Multi-Robot Planning via Long-Run Averages
Background:
Safe planning in multi-robot systems is often modeled with Markov Decision Processes (MDPs). In a safe MDP, agents optimize a numerical reward function subject to a constraint on a numerical safety (cost) function. Current multi-agent planning focuses on finite-horizon MDPs or discounted-cost MDPs. When safety constraints are considered in multi-robot planning, neither option is suitable. A finite horizon ignores what happens after the horizon ends, and discounting down-weights safety violations that occur far in the future, while robots are expected to operate safely indefinitely. Existing work extends the concept of long-run average reward [1] to safe and constrained single-agent planning [2,3]. Extending these concepts to multi-agent systems is an important step toward multi-robot planning and decision making.
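To make the single-agent constrained long-run average problem concrete, here is a minimal sketch. It assumes the usual definition of the long-run average reward of a stationary policy π, g(π) = lim_{T→∞} (1/T) E[Σ_{t=0}^{T-1} r(s_t, a_t)], with the safety cost averaged the same way. The two-state, two-action MDP and all of its numbers are invented for illustration, and the function names are hypothetical. The sketch evaluates every deterministic stationary policy through the stationary distribution of its induced Markov chain and keeps the highest-reward policy whose average cost meets a bound. It is a brute-force illustration, not the method of [1], [2] or [3]. It assumes every induced chain is ergodic and aperiodic, and it ignores randomized policies, which constrained problems can require.

```python
from itertools import product

# Hypothetical two-state, two-action MDP; every number is invented for illustration.
# P[s][a] lists next-state probabilities when action a is taken in state s.
P = [
    [[0.9, 0.1], [0.2, 0.8]],  # state 0 ("safe"): action 1 tends to move to state 1
    [[0.7, 0.3], [0.1, 0.9]],  # state 1 ("risky"): action 1 tends to stay there
]
R = [[1.0, 0.0], [2.0, 4.0]]  # reward r(s, a)
C = [[0.0, 0.0], [1.0, 2.0]]  # safety cost c(s, a), incurred only in state 1
NUM_ACTIONS = 2


def stationary(policy, iters=10_000, tol=1e-12):
    """Stationary distribution of the chain induced by a deterministic policy.

    Assumes the induced chain is ergodic and aperiodic, so power iteration converges.
    """
    n = len(P)
    mu = [1.0 / n] * n
    for _ in range(iters):
        nxt = [0.0] * n
        for s in range(n):
            for s2, p in enumerate(P[s][policy[s]]):
                nxt[s2] += mu[s] * p
        if max(abs(a - b) for a, b in zip(nxt, mu)) < tol:
            return nxt
        mu = nxt
    return mu


def long_run_average(policy, table):
    """Per-step long-run average: sum over s of mu(s) * table[s][policy[s]]."""
    mu = stationary(policy)
    return sum(mu[s] * table[s][policy[s]] for s in range(len(P)))


def best_safe_policy(cost_bound):
    """Enumerate deterministic stationary policies; keep the best one meeting the bound.

    Illustration only: optimal constrained policies may need randomization,
    which this enumeration cannot represent.
    """
    best = None
    for policy in product(range(NUM_ACTIONS), repeat=len(P)):
        g_r = long_run_average(policy, R)
        g_c = long_run_average(policy, C)
        if g_c <= cost_bound and (best is None or g_r > best[1]):
            best = (policy, g_r, g_c)
    return best


print(best_safe_policy(0.6))
```

Calling best_safe_policy(0.6) returns policy (0, 0), with an average reward of about 1.125 and an average cost of about 0.125. With no bound (float("inf")), the search instead picks (1, 1), earning about 3.56 per step at an average cost of about 1.78. The number of deterministic policies grows as |A|^|S|, and a joint product model of N robots with |S| local states each has |S|^N states. This hints at why scaling single-agent methods to multi-robot systems is an open question.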
Goals:
Your task will be to understand and explore long-run average constrained planning in multi-robot systems. You will examine how single-agent algorithms scale to multi-agent systems, and you will explore options for distributed multi-agent planning algorithms. Finally, you will determine how to describe and model real-world multi-robot systems with constrained multi-agent MDPs.
Prerequisites:
Requirements include an interest in planning and multi-agent systems, a strong mathematical background, and experience with Python. Successful completion of DD2415 and/or EL2805 is preferred.
Contact Information:
Anna Gautier (annagau@kth.se)
References:
[1] Ashok, Pranav, et al. "Value iteration for long-run average reward in Markov decision processes." International Conference on Computer Aided Verification. Springer International Publishing, 2017.
[2] Quatmann, Tim, and Joost-Pieter Katoen. "Multi-objective optimization of long-run average and total rewards." Tools and Algorithms for the Construction and Analysis of Systems. Springer International Publishing, 2021.
[3] Agarwal, Mridul, Qinbo Bai, and Vaneet Aggarwal. "Regret guarantees for model-based reinforcement learning with long-term average constraints." Uncertainty in Artificial Intelligence. PMLR, 2022.