Publication:
Robust Predictions With Observational Data

Date

2020-05-06

Published Version

Journal Title

Journal ISSN

Volume Title

Publisher

The Harvard community has made this article openly available. Please share how this access benefits you.

Journal Issue

Citation

Yuan, William. 2020. Robust Predictions With Observational Data. Doctoral dissertation, Harvard University, Graduate School of Arts & Sciences.

Research Data

Abstract

Data science, as currently practiced, is an awkward fit for studying biology or medicine, fields that currently exist in a state where causal mechanisms to explain many of our observations are often unavailable. While mechanistic deductions are possible in narrow, well-defined areas (signaling pathways, binding and protein folding, etc.), a deterministic, internally consistent model of human physiology is still far off. Consequently, the field has developed to serve two purposes simultaneously: to construct such a framework, and to help patients in the present with the incomplete information we have access to. Modern data scientists and researchers use massive datasets to attempt to extract insights from a highly complex, largely mysterious system. Given the influence that research recommendations can have on physician behavior, and the acknowledged gaps in our understanding, ensuring the reliability and validity of our methods is of paramount importance. The rise of statistical learning and large datasets has led to significant optimism about the ability of such models to predict, or even influence, patient outcomes. However, constructing inductions that can fit into the otherwise deductive medical and scientific frameworks can be a fraught process. I examine how such work can be framed so as to make the resultant predictive models “useful” to both clinicians and scientists, and suggest methods for doing so within existing research frameworks. In particular, I examine three cases in detail. First, I describe, for the first time, the basis and implications of temporal bias, a flaw present in a ubiquitous study design that prevents reliable predictions of the future. Next, I describe knowledge parasitism, a phenomenon in which machine learning models piggyback on the decisions and expertise of clinicians, making their predictions less likely to extend beyond what a clinician may already suspect. Finally, I describe the tendency of propensity matching to “launder” bias in surgical studies, concealing overlooked biases and introducing new ones, and thereby reducing the confidence in and applicability of the findings.

Description

Other Available Sources

Keywords

prediction, machine learning, risk stratification, bias

Terms of Use

This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth at Terms of Service
