The Price of Personalization: An Application of Contextual Bandits to Mobile Health
Abstract
One goal in healthcare is to accurately personalize treatment: maintaining overall treatment efficacy while minimizing both the harm to and the number of mistreated patients. With the recent prevalence of mobile devices, rapid data collection has become possible to leverage in personalizing treatment of long-term diseases, as in the HeartSteps study, an adaptive mHealth (mobile health) intervention application for cardiovascular maintenance. We frame the HeartSteps study as a contextual multi-armed bandit (MAB) problem, a reinforcement learning setting in which the agent must choose the optimal treatment action among several based on contextual information.
We investigate and test several variants of the Thompson Sampling heuristic, a lightweight but effective reinforcement learning algorithm, to solve the contextual MAB problem as applied to HeartSteps. Experimental bootstrapping results are interpreted and then used to corroborate theoretically backed modifications to Thompson Sampling, guiding the future design of HeartSteps to maximize overall treatment performance while minimizing the variance of per-patient performance. Through these evaluations, we examine the price of personalization: the trade-off between optimizing overall treatment efficacy and optimizing the fairness of individual treatment efficacies.
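To make the Thompson Sampling heuristic concrete, the following is a minimal sketch of its non-contextual Bernoulli form, not the thesis's actual HeartSteps implementation: each arm keeps a Beta posterior over its success probability, a plausible value is sampled from each posterior, and the arm with the largest sample is played. The arm probabilities and round count below are illustrative assumptions.

```python
import random

def thompson_sampling(arms, n_rounds, seed=0):
    """Minimal Bernoulli Thompson Sampling sketch (illustrative only).

    arms: list of true success probabilities, unknown to the agent.
    Each arm's posterior is Beta(successes + 1, failures + 1),
    starting from a uniform Beta(1, 1) prior.
    """
    rng = random.Random(seed)
    successes = [0] * len(arms)
    failures = [0] * len(arms)
    pulls = [0] * len(arms)
    for _ in range(n_rounds):
        # Sample a plausible success rate from each arm's posterior ...
        samples = [rng.betavariate(s + 1, f + 1)
                   for s, f in zip(successes, failures)]
        # ... and play the arm whose sampled rate is largest.
        arm = max(range(len(arms)), key=lambda i: samples[i])
        # Observe a Bernoulli reward and update that arm's posterior counts.
        reward = 1 if rng.random() < arms[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        pulls[arm] += 1
    return pulls

# Hypothetical two-arm example: the agent should concentrate its pulls
# on the better arm (true success probability 0.7) over time.
pulls = thompson_sampling([0.3, 0.7], n_rounds=2000)
```

The contextual variants studied in the thesis extend this idea by conditioning each arm's reward model on observed patient context rather than keeping a single Beta posterior per arm.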
Terms of Use
This article is made available under the terms and conditions applicable to Other Posted Material, as set forth at http://nrs.harvard.edu/urn-3:HUL.InstRepos:dash.current.terms-of-use#LAA

Citable link to this page
http://nrs.harvard.edu/urn-3:HUL.InstRepos:38811548