Publication:

Regret-Based Algorithms for Multi-Armed Bandits

Loading...
Thumbnail Image

Date

2020-06-17

Published Version

Published Version

Journal Title

Journal ISSN

Volume Title

Publisher

The Harvard community has made this article openly available. Please share how this access benefits you.

Research Projects

Organizational Units

Journal Issue

Citation

Zhao, Kevin Hanbo. 2020. Regret-Based Algorithms for Multi-Armed Bandits. Bachelor's thesis, Harvard College.

Abstract

Multi-armed bandits (MABs) are the single step analogs of the full reinforcement learning problem. They are simple but powerful models for studying the exploration/exploitation dilemma in the context of making sequential decisions over time and under uncertainty. There are several notions of optimality in the literature: asymptotic optimality, regret optimality, and PAC optimality. In this work, we examine the multi-armed bandit problem (which is sometimes called the sequential allocation problem) within the context of regret optimality. We cover several groundbreaking algorithms that approach MABs through the frequentist and Bayesian perspective. \textbf{UCB1} was one of the first algorithms to achieve a logarithmic regret bound and sparked a new field of literature for upper confidence bound based algorithms. \textbf{UCB-V} was one of the first works to improve the regret bound for \textbf{UCB1} but is still not ``optimal''. We later introduce \textbf{KL-UCB}, \textbf{Thompson Sampling}, and \textbf{Bayes UCB}, which are all able to achieve regret optimality asymptotically (in the Bernoulli reward setting). We then perform experiments to illustrate their robustness and how well these algorithms perform under different reward distributions and slight perturbations to their respective assumptions.

Description

Other Available Sources

Research Data

Keywords

Terms of Use

This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth at Terms of Service

Endorsement

Review

Supplemented By

Related Stories