Publication:

The Point That Makes a Difference: Interpreting Influence in Linear Off-Policy Evaluation

Date

2025-05-16

The Harvard community has made this article openly available.

Citation

Zhao, Julianna. 2025. The Point That Makes a Difference: Interpreting Influence in Linear Off-Policy Evaluation. Bachelor's thesis, Harvard University Engineering and Applied Sciences.

Abstract

Evaluating how a decision-making policy will perform without actually running it is a core challenge in reinforcement learning, especially when working with pre-collected data. This task, known as offline policy evaluation (OPE), becomes particularly difficult when the data does not fully cover the situations the new policy might encounter, because certain datapoints then become disproportionately important. This thesis explores how individual datapoints influence value estimates under two value-based OPE algorithms, Least-Squares Temporal Difference (LSTD) and Fitted Q-Evaluation (FQE), both using linear function approximation. We derive exact leave-one-out (LOO) estimates for both methods and, through experiments on structured environments, show how these influence measures can be used to understand convergence issues and offer insight into the reliability of OPE estimates. In particular, we show that FQE convergence provides a strong signal for when LSTD estimates can be trusted, and we use influence measures to interpret cases where FQE diverges. We further introduce a clipped version of FQE that bounds predicted values, reducing variance while maintaining stability, and provide exact LOO estimates for it as well.
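
To make the idea of an exact leave-one-out estimate concrete, the following is a minimal sketch for linear LSTD only. It is not taken from the thesis and may differ from its derivation: it uses the standard LSTD normal equations and the Sherman-Morrison identity for a rank-one downdate. The function names, the small ridge term, the sign convention for influence, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np


def lstd_weights(phi, phi_next, rewards, gamma, ridge=1e-6):
    """Standard linear LSTD: solve A w = b.

    A = sum_i phi_i (phi_i - gamma * phi'_i)^T + ridge * I,  b = sum_i phi_i r_i.
    The ridge term is an assumption added here for numerical stability.
    """
    d = phi.shape[1]
    A = phi.T @ (phi - gamma * phi_next) + ridge * np.eye(d)
    b = phi.T @ rewards
    return np.linalg.solve(A, b), A, b


def lstd_loo_weights(A, b, phi, phi_next, rewards, gamma):
    """Exact LOO weights for every transition, without refitting.

    Dropping transition i changes A by the rank-one term -u v^T with
    u = phi_i and v = phi_i - gamma * phi'_i, so Sherman-Morrison gives
    (A - u v^T)^{-1} = A^{-1} + A^{-1} u v^T A^{-1} / (1 - v^T A^{-1} u).
    """
    A_inv = np.linalg.inv(A)
    n, d = phi.shape
    loo = np.empty((n, d))
    for i in range(n):
        u = phi[i]
        v = phi[i] - gamma * phi_next[i]
        Au = A_inv @ u
        vA = v @ A_inv
        A_inv_i = A_inv + np.outer(Au, vA) / (1.0 - v @ Au)
        loo[i] = A_inv_i @ (b - u * rewards[i])
    return loo


def value_influence(w, loo, phi0):
    """Change in the value estimate at features phi0 when each point is dropped.

    Convention assumed here: full-data estimate minus LOO estimate.
    """
    return (w - loo) @ phi0


if __name__ == "__main__":
    # Synthetic transitions, purely illustrative.
    rng = np.random.default_rng(0)
    n, d, gamma = 50, 3, 0.9
    phi = rng.normal(size=(n, d))
    phi_next = rng.normal(size=(n, d))
    rewards = rng.normal(size=n)

    w, A, b = lstd_weights(phi, phi_next, rewards, gamma)
    loo = lstd_loo_weights(A, b, phi, phi_next, rewards, gamma)
    infl = value_influence(w, loo, phi[0])
    print("most influential transition:", int(np.argmax(np.abs(infl))))
```

In this sketch the LOO weights agree with a brute-force refit on the remaining data, but cost one rank-one update per point instead of one linear solve per point. A denominator `1 - v @ Au` close to zero indicates that removing that single transition makes the LSTD system nearly singular.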

Keywords

Computer science, Statistics

Terms of Use

This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth at Terms of Service
