
Learning to Lead Many: Online Algorithms for Bayesian Stackelberg Games

Date

2025-06-24

The Harvard community has made this article openly available.

Citation

Personnat, Gerson Egbert. 2025. Learning to Lead Many: Online Algorithms for Bayesian Stackelberg Games. Bachelor's thesis, Harvard University, Engineering and Applied Sciences.

Abstract

A \textit{Stackelberg game} models a strategic interaction in which a leader commits to an action first and the followers then respond to it. Stackelberg games have been applied in many real-world domains, including security, transportation, computer networks, and supply chains. In practice, the leader often faces uncertainty about key environmental parameters and must learn and adapt over time. We consider a \textit{Bayesian} Stackelberg game in which followers have private types, unknown to the leader, that represent the followers' private information. Each of the $n$ followers has one of $K$ types, and types may be independent across followers or correlated. The leader's objective is to learn about the environment (i.e., the distribution over follower types) in order to compute an optimal strategy. We formulate this as an \textit{online learning} problem: the leader plays the Stackelberg game repeatedly over $T$ rounds, and in each round the follower types are drawn from an unknown but fixed distribution. The leader's goal is to minimize their \textit{regret}, defined as the cumulative difference between the utility of the optimal fixed strategy and the utility of the strategy they choose in each round. This thesis analyzes, theoretically and empirically, the regret of several learning algorithms for the leader. Under \textit{type feedback}, where the leader observes the followers' types after each round, we design learning algorithms whose expected regret has an \textit{upper bound} of $O\big(\sqrt{\min\{nK,\ L\log(nKAT)\} \cdot T}\big)$. Under \textit{action feedback}, where the leader observes only the followers' actions, we design algorithms with regret at most $O\big(\min\{K^n\sqrt{T}\log T,\ \sqrt{n^L K^L A^{2L}\, T\log T}\}\big)$. Furthermore, we establish a \textit{lower bound} of $\Omega(\sqrt{nKT})$ on the regret of this learning problem.
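To make the type-feedback setting concrete, here is a minimal Python sketch under simplifying assumptions that do not come from the thesis: a hypothetical toy game with two leader actions, $n = 2$ followers and $K = 2$ types, a fixed best-response rule per type, and a leader restricted to pure actions. The learner is a simple follow-the-empirical-distribution rule, not necessarily one of the thesis's algorithms: after each round it scores every leader action on the revealed type profile and next plays the action with the highest cumulative utility. The joint distribution over type profiles may be correlated, and regret is measured in expectation against the best fixed action under the true distribution. All names (`follower_response`, `leader_utility`, `empirical_leader`, `run`) are hypothetical.

```python
import random

# Hypothetical toy instance (not from the thesis): 2 leader actions,
# n = 2 followers, K = 2 types; the leader is restricted to pure actions.
LEADER_ACTIONS = [0, 1]


def follower_response(leader_action, follower_type):
    # Assumed best-response rule: a follower "complies" (returns 1)
    # exactly when the leader's action matches its type.
    return 1 if leader_action == follower_type else 0


def leader_utility(leader_action, type_profile):
    # The leader earns 1 for each complying follower in the type profile.
    return sum(follower_response(leader_action, t) for t in type_profile)


def expected_utility(leader_action, dist):
    # dist maps a type profile (tuple of types) to its probability,
    # so correlated types are allowed.
    return sum(p * leader_utility(leader_action, prof) for prof, p in dist.items())


def empirical_leader(totals):
    # Follow the empirical distribution: play the action with the highest
    # cumulative utility on the profiles observed so far. max() returns the
    # first maximizer, so ties (including round 1) go to action 0.
    return max(LEADER_ACTIONS, key=lambda a: totals[a])


def run(true_dist, T, seed=0):
    # Simulate T rounds under type feedback and return the expected regret
    # against the best fixed leader action under the true distribution.
    rng = random.Random(seed)
    profiles = list(true_dist)
    weights = [true_dist[prof] for prof in profiles]
    best = max(expected_utility(a, true_dist) for a in LEADER_ACTIONS)
    totals = {a: 0 for a in LEADER_ACTIONS}
    regret = 0.0
    for _ in range(T):
        action = empirical_leader(totals)
        regret += best - expected_utility(action, true_dist)
        # Types are drawn i.i.d. from a fixed distribution unknown to the leader.
        profile = rng.choices(profiles, weights=weights)[0]
        # Type feedback: the realized profile is revealed, so the utility of
        # every leader action on it (not just the one played) can be scored.
        for a in LEADER_ACTIONS:
            totals[a] += leader_utility(a, profile)
    return regret


if __name__ == "__main__":
    # Hypothetical correlated distribution over (type_1, type_2) profiles.
    dist = {(0, 0): 0.2, (1, 1): 0.6, (0, 1): 0.1, (1, 0): 0.1}
    print(run(dist, T=1000))
```

Because the leader sees the full type profile each round, it can evaluate every candidate action on past data. Under action feedback it would observe only the followers' responses to the action it actually played, so this counterfactual scoring is not directly available.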

Keywords

Computer science, Statistics

Terms of Use

This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth in the Terms of Service.