Publication:
A Bayesian Nonparametric Approach to Multi-Task Learning for Contextual Bandits in Mobile Health

Date

2022-06-03

Published Version

Citation

Chhabria, Prasidh Hundraj. 2022. A Bayesian Nonparametric Approach to Multi-Task Learning for Contextual Bandits in Mobile Health. Bachelor's thesis, Harvard College.

Abstract

Reinforcement learning algorithms have found utility in a number of digital interventions, such as mobile health (mHealth), in which N users are followed and sequentially treated over T timesteps with the aim of optimizing a health outcome. In the precision medicine paradigm, the intervention (algorithm) aims to learn a personalized, optimal treatment policy for each of the N users. Learning such a policy can be prohibitively slow due to sparse and noisy data for each user; in practice, users can disengage from the intervention before a good policy is learned. To address this problem, we aim to pool information across users to speed learning while preserving individualized treatment. However, pooling data across dissimilar users can lead to disastrous treatment decisions and outcomes. These challenges underscore the need for rigorous, model-based approaches to defining similarity across users before pooling. We model the N-user mHealth setting as a contextual bandit environment and formalize similarity across users with a Dirichlet Process mixture model. We offer a variant of blocked Gibbs sampling to infer clusters among users, and we further propose DPMM-Pooling, an integrated intervention algorithm that learns clusters among users and shares data within clusters in order to speed the learning of optimal, individualized treatment policies. We evaluate DPMM-Pooling in simulated mHealth settings across a range of parameters, such as the number of ground-truth clusters, the noise in observed outcomes, and the time of pooling. We find and analyze key bias-variance tradeoffs in pooling that depend on parameters of the environment. We also find that DPMM-Pooling is relatively robust to likely forms of mild and extreme model misspecification. Finally, we outline the implications of our results for the design of pooling-based mHealth interventions in practice.
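The Dirichlet Process prior at the core of this approach induces a random partition of the N users into clusters. As an illustration of how such a prior groups users, below is a minimal sketch of the Chinese Restaurant Process predictive rule, the sequential sampling scheme a Dirichlet Process induces; the function name, concentration parameter `alpha`, and seeding are illustrative choices, not details taken from the thesis:

```python
import random

def crp_partition(n_users, alpha, seed=0):
    """Sample a partition of users from the Chinese Restaurant Process,
    the sequential predictive rule induced by a Dirichlet Process prior."""
    rng = random.Random(seed)
    sizes = []        # sizes[k] = number of users already in cluster k
    assignments = []  # assignments[i] = cluster label of user i
    for i in range(n_users):
        # Existing cluster k is chosen with probability sizes[k] / (i + alpha);
        # a new cluster is opened with probability alpha / (i + alpha).
        weights = sizes + [alpha]
        r = rng.random() * sum(weights)
        acc = 0.0
        for k, w in enumerate(weights):
            acc += w
            if r <= acc:
                break
        if k == len(sizes):
            sizes.append(1)   # open a new cluster
        else:
            sizes[k] += 1
        assignments.append(k)
    return assignments
```

In a pooling scheme of this flavor, users assigned to the same cluster would share data when their treatment policies are estimated; a larger `alpha` tends to produce more, smaller clusters and hence less pooling. The thesis's actual inference is a variant of blocked Gibbs sampling over the full mixture model, which this prior-only sketch does not implement.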

Keywords

Bayesian nonparametrics, Contextual bandits, Mobile health, Reinforcement learning, Statistics

Terms of Use

This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth at Terms of Service
