Publication: Deployable Online Reinforcement Learning Algorithms
Abstract
Online reinforcement learning (RL) algorithms are increasingly used in real-world settings where dynamic environments may render offline algorithms ineffective. Such algorithms are desirable in this setting because they learn and improve future decision-making using continually collected data. Applications include robotics, recommender systems, fine-tuning large language models, and digital health. However, there are many constraints on deploying online RL algorithms in the real world. Common challenges include limited data (sparse, partially observable, etc.), accounting for the complexity of the environment, ensuring stability and autonomy of the algorithm, and facilitating interpretability and explainability of the algorithm. In this thesis, we have the use-inspired goal of making online RL deployable and stable in real-world settings. To do so, we provide a full end-to-end pipeline for online RL deployment. We start with guidelines for making various design decisions for the algorithm before deployment. We highlight reward design as one of the most important design decisions. Next, we provide a framework for creating a monitoring system to ensure the algorithm runs stably and autonomously during deployment. Then, we cover post-deployment analyses that can be conducted to (1) explain what the algorithm learned and (2) re-evaluate the algorithm design for the next deployment. To make the ideas in these three stages concrete, we use real examples from the online RL algorithm deployed in the Oralytics clinical trial. Finally, we study a theoretical non-stationary bandit problem inspired by the non-stationarity in many real-world problems. We conclude by discussing various open research problems for online reinforcement learning deployment.