Publication: Latent Computable Phenotyping for Clinically Meaningful Subgroups
Date
2024-11-19
Citation
Argaw, Peniel. 2024. Latent Computable Phenotyping for Clinically Meaningful Subgroups. Doctoral dissertation, Harvard University Graduate School of Arts and Sciences.
Abstract
Heterogeneity is evident in healthcare: treatment response and disease progression vary with genetics, clinical care, demographics, and the environment. Precision medicine is crucial to improving patient care and outcomes. Rather than employing a one-size-fits-all approach, precision medicine aims to tailor medical interventions to the specific characteristics of a patient or subgroup. Unsupervised machine learning has the potential to unveil latent patterns in data with profound clinical implications. We use the term latent computable phenotypes (LCPs) for the subgroups identified through unsupervised partitioning methods, which reveal characteristics that are not immediately or easily observable in a population. This research aims to identify LCPs by addressing three questions: 1) how to select and represent patient data to best suit the task at hand, 2) how to partition and define distinct subgroups (LCPs), and 3) how to interpret and evaluate the subgroups.
We addressed these questions across various biomedical research tasks and datasets that described patient clinical histories through electronic health records (EHRs), randomized controlled trials (RCTs), and medical insurance claims. We began with a heterogeneous disease scenario, using anomaly detection to characterize anomalies in a population. Given an insurance claims dataset, we defined preprocessing heuristics to select a cohort with similar clinical trajectories and evaluated the clinical differences between, and implications of, the typical and anomalous cohorts. Next, we explored recursive partitioning to identify multiple subgroups that maximize homogeneity within groups and heterogeneity across groups. Specifically, we evaluated heterogeneous treatment effects in synthetic and semi-synthetic RCT data. Finally, we assessed the stability and generalizability of clustering in EHRs. Notably, we examined the effects of data size and representation on the emergent clusters and evaluated the structure of the clusters as more data were provided to a representation learning model.
Understanding the underlying heterogeneities within a patient population through LCPs is beneficial for designing preventative and treatment strategies, interpreting retrospective analyses, and enhancing understanding of complex diseases.
Keywords
Anomaly detection, Clinical support tools, Clustering, Computable phenotyping, Heterogeneous treatment effects, Unsupervised learning, Artificial intelligence, Computer science, Bioinformatics
Terms of Use
This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth at Terms of Service