Publication: Learning and Evaluating Algorithmic Policies: Methodology and Applications
Date
2025-05-15
Authors
Jia, Zeyang
Published Version
The Harvard community has made this article openly available. Please share how this access benefits you.
Citation
Jia, Zeyang. 2025. Learning and Evaluating Algorithmic Policies: Methodology and Applications. Doctoral Dissertation, Harvard University Graduate School of Arts and Sciences.
Abstract
Algorithmic decisions and policies are increasingly used in today's world, in areas including online advertising, public policy, and medicine. While many methods have been developed to efficiently learn a policy from data, there are also practical demands for statistical guarantees on the learned policy, especially in high-stakes decision-making problems. My research aims to develop new methods that learn a policy and evaluate the learned policy simultaneously, in both a Bayesian way (Chapter 1) and a frequentist way (Chapters 2 and 3).
Chapter 1: Bayesian Safe Policy Learning with Risk Constraints
Policy learning with safety guarantees is essential in many high-stakes decision-making problems. In this chapter, we introduce a Bayesian policy learning framework that maximizes posterior expected utility while controlling the Posterior Average Conditional Risk (ACRisk), ensuring that a newly learned policy does not lead to worse outcomes than the existing baseline policy, up to a tolerance level. We demonstrate the framework by applying it to learn a new policy for assigning military security scores during the Vietnam War using historical data.
Chapter 2: The Cram Method for Simultaneous Learning and Evaluation
Evaluating a learned policy after training is a critical step before deploying it in practice. For this purpose, sample-splitting methods are statistically inefficient, while resampling-based methods are computationally expensive and evaluate an average of multiple policies rather than the exact learned policy. In this chapter, we introduce the cram method, a general framework for simultaneous policy learning and evaluation that is applicable to general policy learning algorithms. We establish theoretical guarantees for the crammed evaluation estimator and demonstrate its effectiveness in both simulation studies and a practical application.
Chapter 3: Cramming Contextual Bandits for On-Policy Evaluation
As adaptive learning algorithms are increasingly used in practice, evaluating the learned policy is essential for understanding its performance. In this chapter, we extend the cram method to the evaluation of policies learned by multi-armed contextual bandit algorithms, providing a more efficient on-policy alternative to off-policy evaluation methods. We prove theoretical guarantees for the crammed estimator under a stability condition, and validate its effectiveness for linear bandit algorithms, including $\epsilon$-greedy, Thompson Sampling, and Upper Confidence Bound. Empirical results confirm that cramming significantly reduces evaluation error compared to off-policy evaluation methods.
Keywords
Causal Inference, Cross Validation, Policy Evaluation, Policy Learning, Statistics
Terms of Use
This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth at Terms of Service