Publication: Learning Strategies for Bidding in Online Advertisement Auctions With Noisy Feedback
Abstract
We analyze robust online learning algorithms for bidding in repeated Online Advertisement Auctions. The structure of these auctions creates a unique setting: a bidder receives information on her value only if she wins the auction at a given time step, in which case she also observes feedback on other bids she could have submitted. We analyze two strategies that a bidder can employ to maximize her reward while simultaneously learning her value as the auction progresses. First, we consider an online learner using a Multi-Armed Bandit approach. Second, we leverage the feedback structure of Online Advertisement Auctions to prove that a noisy variant of a partial-feedback online learning algorithm (WIN-EXP \cite{winexp}) has regret rates against the best bid in hindsight that converge quickly, grow only logarithmically with the number of actions the learner can take, and scale proportionally to the square root of the magnitude of the noise. Next, we use an experimental setting to exhibit the robustness of the noisy WIN-EXP algorithm and confirm the theoretical results. Finally, we include auction simulations to demonstrate a trade-off between an online learning method that may incur some bias in its updates (WIN-EXP) and a method that is completely clean (Multi-Armed Bandit EXP3) but may not have the best regret outcomes.
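To make the Multi-Armed Bandit baseline mentioned in the abstract concrete, the following is a minimal sketch of the standard EXP3 algorithm applied to a discretized bid grid. It is not taken from the publication: the second-price payment rule, the uniformly distributed competing bid, the uniform value noise, and all names and parameters (`simulate`, `value`, `noise`, `gamma`, `num_bids`) are illustrative assumptions. The learner observes a noisy value only in rounds it wins, and updates only the bid it actually played.

```python
import math
import random


def exp3_probabilities(weights, gamma):
    """Mix the normalized weights with uniform exploration (rate gamma)."""
    total = sum(weights)
    k = len(weights)
    return [(1.0 - gamma) * w / total + gamma / k for w in weights]


def exp3_update(weights, probs, arm, reward, gamma):
    """Importance-weighted EXP3 update; reward is assumed to lie in [0, 1]."""
    k = len(weights)
    estimate = reward / probs[arm]  # unbiased estimate for the played arm
    new_weights = list(weights)
    new_weights[arm] *= math.exp(gamma * estimate / k)
    # Rescale by the largest weight to avoid overflow; the sampling
    # probabilities depend only on weight ratios, so they are unchanged.
    largest = max(new_weights)
    return [w / largest for w in new_weights]


def simulate(num_rounds=2000, num_bids=11, value=0.7, noise=0.1,
             gamma=0.1, seed=0):
    """Hypothetical repeated second-price auction with noisy value feedback."""
    rng = random.Random(seed)
    bids = [i / (num_bids - 1) for i in range(num_bids)]  # bid grid on [0, 1]
    weights = [1.0] * num_bids
    total_utility = 0.0
    for _ in range(num_rounds):
        probs = exp3_probabilities(weights, gamma)
        arm = rng.choices(range(num_bids), weights=probs)[0]
        competing = rng.random()  # assumed highest competing bid
        if bids[arm] > competing:
            # The value is observed, with noise, only when the bidder wins.
            observed = value + rng.uniform(-noise, noise)
            utility = observed - competing
        else:
            utility = 0.0
        total_utility += utility
        # Map utility from [-(1 + noise), 1 + noise] into [0, 1] for EXP3.
        reward = (utility + 1.0 + noise) / (2.0 * (1.0 + noise))
        weights = exp3_update(weights, probs, arm, reward, gamma)
    return bids, exp3_probabilities(weights, gamma), total_utility
```

Calling `simulate()` returns the bid grid, the final sampling distribution over bids, and the cumulative utility. Because EXP3 uses only the outcome of the bid actually submitted, its updates do not rely on the additional feedback about other bids that the abstract attributes to WIN-EXP.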