Publication: A Bayesian Nonparametric Approach to Multi-Task Learning for Contextual Bandits in Mobile Health
Date
2022-06-03
Authors
Chhabria, Prasidh Hundraj
Citation
Chhabria, Prasidh Hundraj. 2022. A Bayesian Nonparametric Approach to Multi-Task Learning for Contextual Bandits in Mobile Health. Bachelor's thesis, Harvard College.
Abstract
Reinforcement learning algorithms have found utility in a number of digital interventions, such as mobile health (mHealth), in which N users are followed and sequentially treated over T timesteps with the aim of optimizing a health outcome. Under the precision medicine paradigm, the intervention algorithm aims to learn, for each of the N users, a personalized treatment policy that optimizes the health outcome of interest. Learning such a policy can be prohibitively slow because each user's data are sparse and noisy; in practice, users can disengage from the intervention before a good policy is learned.
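To make the setting concrete, one generic formalization of the N-user contextual bandit problem (an illustration consistent with the abstract, not necessarily the thesis's exact parameterization) is: at each timestep t, user i presents a context x_{i,t}, the algorithm selects a treatment a_{i,t}, and a health outcome (reward) r_{i,t} is observed; the goal is to learn per-user policies \pi_i that maximize expected cumulative reward:

$$\max_{\pi_1, \dots, \pi_N} \; \sum_{i=1}^{N} \mathbb{E}\!\left[ \sum_{t=1}^{T} r_{i,t} \,\middle|\, a_{i,t} \sim \pi_i(\cdot \mid x_{i,t}) \right].$$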
To address this problem, we aim to pool information across users to speed learning while preserving individualized treatment. However, pooling data across dissimilar users can lead to disastrous treatment decisions and outcomes. These challenges underscore the need for rigorous, model-based approaches to defining similarity across users before pooling.
We model the N-user mHealth setting as a contextual bandit environment and formalize similarity across users with a Dirichlet Process mixture model (DPMM). We develop a variant of blocked Gibbs sampling to infer clusters of users, and we propose DPMM-Pooling, an integrated intervention algorithm that learns these clusters and shares data within them, in order to speed the learning of optimal, individualized treatment policies.
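The blocked Gibbs sampler and DPMM-Pooling itself are specified in the thesis; the sketch below only illustrates, in their simplest forms and with hypothetical names (crp_assignments, pool_within_clusters) that are not the author's implementation, the two ingredients named above: a draw from the Chinese Restaurant Process prior induced by a Dirichlet Process, and the within-cluster data-sharing step.

import numpy as np

def crp_assignments(n_users, alpha, rng):
    """Sample a partition of users from the Chinese Restaurant Process,
    the exchangeable clustering prior induced by a Dirichlet Process
    with concentration alpha. (Prior draw only; a Gibbs sampler would
    reweight these probabilities by each cluster's data likelihood.)"""
    assignments, counts = [], []
    for _ in range(n_users):
        # Join an existing cluster w.p. proportional to its size,
        # or open a new cluster w.p. proportional to alpha.
        weights = np.array(counts + [alpha], dtype=float)
        k = rng.choice(len(weights), p=weights / weights.sum())
        if k == len(counts):
            counts.append(1)
        else:
            counts[k] += 1
        assignments.append(k)
    return assignments

def pool_within_clusters(user_data, assignments):
    """Share (context, action, reward) tuples across users assigned to
    the same cluster: each user's bandit is then updated with the
    pooled dataset of its cluster rather than its own data alone."""
    by_cluster = {}
    for uid, data in user_data.items():
        by_cluster.setdefault(assignments[uid], []).extend(data)
    return {uid: by_cluster[assignments[uid]] for uid in user_data}

# Toy usage: 4 users, each holding one (context, action, reward) tuple.
rng = np.random.default_rng(0)
z = crp_assignments(4, alpha=1.0, rng=rng)
data = {u: [(np.zeros(2), 0, 1.0)] for u in range(4)}
pooled = pool_within_clusters(data, z)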
We evaluate DPMM-Pooling in simulated mHealth settings across a range of environment parameters, such as the number of ground-truth clusters, the noise in observed outcomes, and the time at which pooling begins. We find and analyze key bias-variance tradeoffs in pooling that depend on these parameters. We also find that DPMM-Pooling is relatively robust to likely forms of both mild and extreme model misspecification. Finally, we outline the implications of our results for the design of pooling-based mHealth interventions in practice.
Keywords
Bayesian nonparametrics, Contextual bandits, Mobile health, Reinforcement learning, Statistics
Terms of Use
This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth at Terms of Service.