Publication: Sequential Decision-Making for Multi-Robot Systems with Real-World Uncertainty using Rollout-based Reinforcement Learning
Date
2024-11-19
Authors
Bhattacharya, Sushmita
Citation
Bhattacharya, Sushmita. 2024. Sequential Decision-Making for Multi-Robot Systems with Real-World Uncertainty using Rollout-based Reinforcement Learning. Doctoral dissertation, Harvard University Graduate School of Arts and Sciences.
Abstract
In this thesis, we develop coordinated, non-myopic policies for sequential decision-making problems in multi-robot systems. We address several challenges: partial state observation due to sparse sensing, computational limitations associated with multiple agents, unreliable communication between agents, dynamically changing system models, and future uncertainties. More specifically, we focus on rollout, a model-based Reinforcement Learning method for autonomous control and planning that employs a one-step stochastic lookahead optimization with policy and value-space approximation to account for future uncertainties. We focus on this approach because of its inherent performance improvement property. The novelty of this work lies in its treatment of partial observation and its decomposition of the problem into simpler sub-components, which yields compute-efficient algorithms for large-scale real-world robotics applications, including, but not limited to, autonomous repair, autonomous routing and pickup, and wildlife monitoring.
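As an illustration of the rollout idea described above, the following is a minimal Python sketch of a one-step stochastic lookahead that scores each candidate action by sampling disturbances and then following a base policy. It is not taken from the thesis: the function names (`rollout_action`, `step`, `base_policy`), the toy one-dimensional robot, the slip probability of 0.2, and the stay-in-place base policy are all assumptions chosen for brevity.

```python
import random


def rollout_action(state, actions, step, base_policy, horizon, num_samples, rng):
    """Return the action with the lowest sampled one-step lookahead cost."""
    best_action, best_cost = None, float("inf")
    for action in actions:
        total = 0.0
        for _ in range(num_samples):
            # First stage: apply the candidate action under a sampled disturbance.
            s, cost = step(state, action, rng)
            # Remaining stages: follow the base policy to approximate the cost-to-go.
            for _ in range(horizon - 1):
                s, c = step(s, base_policy(s), rng)
                cost += c
            total += cost
        average = total / num_samples
        if average < best_cost:
            best_action, best_cost = action, average
    return best_action, best_cost


# Hypothetical toy problem: a robot on a line tries to reach position 0.
def step(position, action, rng):
    # With probability 0.2 the move "slips" and the robot stays where it is.
    moved = position if rng.random() < 0.2 else position + action
    return moved, abs(moved)  # stage cost: distance to the goal


def base_policy(position):
    return 0  # assumed base heuristic: always stay in place


if __name__ == "__main__":
    rng = random.Random(0)
    print(rollout_action(3, [-1, 0, 1], step, base_policy,
                         horizon=5, num_samples=200, rng=rng))
```

In this toy setting the base policy never moves, yet the rollout step selects the move toward the goal, which is a small instance of the performance improvement property the abstract refers to.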
Partial observation of the state introduces uncertainty, which makes the belief space explode and makes offline policies difficult to train. A novel feature of our approach is that it is well suited for distributed computation through an extended belief space formulation and the use of a partitioned architecture, which is improved using approximate policy iteration with multiple policy networks. We extend multi-agent rollout algorithms to the partially observed case while preserving the key policy improvement property of the standard rollout method. We discuss the limitations of rollout-based approaches when communication is imperfect. We provide several approximate multi-agent rollout algorithms, together with their analysis, that retain the performance improvement property without perfect communication. We propose policies robust to large fluctuations of system models via an online play algorithm that adaptively chooses and improves the performance of an offline-trained policy. Our proposed policies can scale to large environments with numerous agents, addressing computational bottlenecks. We propose a two-phase algorithm that approximates the performance of a near-optimal rollout policy while reducing its computational cost. We provide theoretical results to characterize the number of agents that are necessary and sufficient to maintain the stability of the learned policy. Our numerical results show that our approach achieves stability for a team size that satisfies the theoretical conditions.
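One common form of multi-agent rollout in the literature optimizes one agent at a time: each agent picks its action with the earlier agents' choices fixed and the later agents following the base policy, so the number of lookahead evaluations grows linearly rather than exponentially with team size. The sketch below illustrates that idea only. Whether the thesis uses exactly this scheme is an assumption, and the deterministic line-world, `stage_cost`, and the hold-position `base_policy` are hypothetical simplifications (disturbance sampling and partial observation are omitted).

```python
def stage_cost(positions, targets):
    """Sum over targets of the distance to the nearest agent."""
    return sum(min(abs(t - p) for p in positions) for t in targets)


def base_policy(positions, targets):
    """Hypothetical base heuristic: every agent holds its position."""
    return tuple(0 for _ in positions)


def cost_to_go(positions, targets, horizon):
    """Cost accumulated by following the base policy for `horizon` stages."""
    total = 0
    for _ in range(horizon):
        moves = base_policy(positions, targets)
        positions = tuple(p + m for p, m in zip(positions, moves))
        total += stage_cost(positions, targets)
    return total


def multiagent_rollout(positions, targets, actions=(-1, 0, 1), horizon=3):
    """Choose a joint action by optimizing one agent at a time."""
    # Start from the base policy's joint action; agents not yet optimized keep it.
    joint = list(base_policy(positions, targets))
    for i in range(len(positions)):
        best_action, best_value = None, float("inf")
        for a in actions:
            candidate = joint.copy()
            candidate[i] = a  # vary only agent i; earlier choices stay fixed
            nxt = tuple(p + m for p, m in zip(positions, candidate))
            value = stage_cost(nxt, targets) + cost_to_go(nxt, targets, horizon)
            if value < best_value:
                best_action, best_value = a, value
        joint[i] = best_action
    return tuple(joint)


if __name__ == "__main__":
    print(multiagent_rollout((0, 10), (2, 8)))
```

With two agents and three actions each, this evaluates 2 × 3 = 6 candidates instead of the 3² = 9 joint actions; the gap widens quickly as the team grows.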
To validate the performance of our proposed approach, we devise a Reinforcement Learning-based routing method for a practical robotics problem in which robots must maximize their chances of rendezvous with sperm whales using in-situ sensor observations. We address several additional challenges of this problem, including critical time windows for rendezvous opportunities, unpredictable whale surfacing schedules, and partial observation owing to sparse and noisy sensor measurements, with our new belief formulation, future uncertainty sampling, and cost function approximations. As preliminary work, we leverage sperm whale behavior data collected by the biology team and validate the approaches on post-processed in-situ acoustic and radio signal bearing measurements of sperm whales collected in the Caribbean Sea over three expeditions.
The objective of the thesis is to bring multi-robot sequential decision-making algorithms closer to the real world. With our proposed algorithms, which have been extensively validated via numerical simulations using datasets and in-situ sensory observations for robotics applications, we envision deploying end-to-end autonomous control for multiple autonomous robots in real time.
Keywords
Autonomous sequential decision-making, Model-based reinforcement learning, Multi-robot systems, Computer science, Robotics
Terms of Use
This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth at Terms of Service