Publication:

New approaches to factual and counterfactual prediction modeling

Date

2023-01-23

Citation

Boyer, Christopher Brian. 2023. New approaches to factual and counterfactual prediction modeling. Doctoral dissertation, Harvard University Graduate School of Arts and Sciences.

Abstract

Over the past half century, new methods for quantitative risk prediction and validation were formalized and the number of models, both statistical and algorithmic, increased exponentially. However, this literature has largely focused on descriptive predictions of the world as it is, what I term factual prediction, instead of the world as it would be if we intervened, or counterfactual prediction. In this dissertation, I argue that in many instances counterfactual predictions are desired, but targeting them requires new methods based on causal inference.

In Chapter 1, I take a method traditionally associated with causal inference, the g-formula, and repurpose it as a model for factual and counterfactual prediction. In doing so, I highlight the potential of the g-formula as a unifying framework for prediction and the assumptions it requires. Through simulation and an applied data example in the Framingham Offspring Study, I show how the g-formula can estimate factual and counterfactual quantities and leverage multiple repeated measurements over time to produce predictions that update dynamically.
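
To make the distinction between factual and counterfactual quantities concrete, the following is a minimal sketch of the g-formula in its simplest form: a single binary treatment and a single binary confounder. It is not the dissertation's implementation, which handles repeated measurements over time. The data-generating probabilities and the names `simulate`, `factual_risk`, and `g_formula_risk` are all assumptions made up for this illustration.

```python
import random


def simulate(n, seed=0):
    """Simulate (l, a, y) rows: binary confounder, treatment, and outcome.

    All probabilities here are hypothetical and chosen only for illustration.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        l = int(rng.random() < 0.5)
        a = int(rng.random() < 0.2 + 0.6 * l)  # treatment is more likely when l = 1
        y = int(rng.random() < 0.10 + 0.30 * l - 0.05 * a)
        rows.append((l, a, y))
    return rows


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def factual_risk(rows, a):
    """Observed risk among people who actually received treatment level a."""
    return mean(y for _, t, y in rows if t == a)


def g_formula_risk(rows, a):
    """Risk had everyone received treatment level a.

    Standardizes the stratum-specific risks under A = a over the marginal
    distribution of the confounder L (nonparametric g-formula).
    """
    total = 0.0
    for level in (0, 1):
        p_level = mean(int(l == level) for l, _, _ in rows)
        risk = mean(y for l, t, y in rows if l == level and t == a)
        total += risk * p_level
    return total


rows = simulate(200_000)
factual_untreated = factual_risk(rows, a=0)
counterfactual_untreated = g_formula_risk(rows, a=0)
print(f"factual risk among the untreated:   {factual_untreated:.3f}")
print(f"counterfactual risk if all untreated: {counterfactual_untreated:.3f}")
```

Under these made-up probabilities, the population risk among the untreated is 0.16, while the risk if no one were treated is 0.25. The gap arises because the untreated are disproportionately low-risk, and the estimate only recovers the counterfactual quantity under the usual exchangeability, consistency, and positivity assumptions.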

In Chapter 2, I consider a common clinical prediction task, namely developing a model for risk-based treatment decisions, where the ideal target is counterfactual. Building on prior work, I clarify the single-arm target trial of interest and propose two estimation methods that allow for separation between the causal and prediction tasks. I apply these methods to predict the statin-naive risk of cardiovascular disease using an emulated trial based on the Multi-Ethnic Study of Atherosclerosis. I find that, at common treatment thresholds, traditional methods lead to underallocation of treatment by between 5 and 9 percentage points.

Finally, in Chapter 3, I tackle the theoretical question of how to train and validate models for counterfactual prediction when the relevant potential outcomes are not observed for all units. I discuss how to tailor a model for use in the same population under a counterfactual shift in treatment policy, how to assess its performance, and how to perform model and tuning parameter selection. I also provide identifiability results for measures of counterfactual performance for a potentially misspecified prediction model. I illustrate the methods using simulation and apply them to validate the performance of the statin-naive risk prediction model from Chapter 2.
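
As a rough illustration of assessing counterfactual performance when potential outcomes are observed only for some units, the sketch below estimates the mean squared error a model would have if no one were treated, using inverse probability weighting among the untreated. This is one generic identification strategy, not necessarily the estimators developed in the dissertation. The simulated data, the deliberately misspecified `predict` function, and the other function names are hypothetical.

```python
import random


def simulate(n, seed=1):
    """Simulate (l, a, y) rows with hypothetical, illustrative probabilities."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        l = int(rng.random() < 0.5)
        a = int(rng.random() < 0.2 + 0.6 * l)
        y = int(rng.random() < 0.10 + 0.30 * l - 0.05 * a)
        rows.append((l, a, y))
    return rows


def predict(l):
    """Hypothetical, deliberately misspecified risk model: it ignores l."""
    return 0.2


def naive_mse(rows):
    """MSE among the untreated, ignoring how they came to be untreated."""
    errors = [(y - predict(l)) ** 2 for l, a, y in rows if a == 0]
    return sum(errors) / len(errors)


def ipw_counterfactual_mse(rows):
    """Weighted MSE targeting the population had everyone been untreated.

    Each untreated person is weighted by 1 / P(A = 0 | L), with the
    probability estimated within strata of L; weights are normalized.
    """
    p_untreated = {}
    for level in (0, 1):
        stratum = [a for l, a, _ in rows if l == level]
        p_untreated[level] = stratum.count(0) / len(stratum)

    numerator = 0.0
    denominator = 0.0
    for l, a, y in rows:
        if a == 0:
            weight = 1.0 / p_untreated[l]
            numerator += weight * (y - predict(l)) ** 2
            denominator += weight
    return numerator / denominator


rows = simulate(200_000)
naive = naive_mse(rows)
weighted = ipw_counterfactual_mse(rows)
print(f"naive MSE among the untreated: {naive:.3f}")
print(f"IPW counterfactual MSE:        {weighted:.3f}")
```

Under these made-up probabilities, the population value of the naive measure is 0.136 and the counterfactual MSE is 0.19, so the unadjusted assessment makes the model look better than it would perform under the untreated policy. Note that the weighting does not require the prediction model itself to be correctly specified, only the model for treatment.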

Keywords

Epidemiology

Terms of Use

This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth at Terms of Service
