Publication:
Latent Computable Phenotyping for Clinically Meaningful Subgroups

Date

2024-11-19

Citation

Argaw, Peniel. 2024. Latent Computable Phenotyping for Clinically Meaningful Subgroups. Doctoral dissertation, Harvard University Graduate School of Arts and Sciences.

Abstract

Heterogeneity is evident in healthcare. There are variations in treatment response and disease progression that stem from genetics, clinical care, demographics, and the environment. Precision medicine is crucial to improving patient care and outcomes. Rather than employing a one-size-fits-all approach, precision medicine aims to tailor medical interventions to the specific characteristics of a patient or subgroup. Unsupervised machine learning has the potential to unveil latent patterns in data with profound clinical implications. We use the term latent computable phenotypes (LCPs) for the subgroups identified through unsupervised partitioning methods, which reveal characteristics that are not immediately or easily observable in a population. This research aims to identify LCPs by addressing three important questions: 1) how to select and represent patient data to better address the task at hand, 2) how to partition and define distinct subgroups (LCPs), and 3) how to interpret and evaluate the subgroups. We addressed these questions across various biomedical research tasks and datasets that described patient clinical histories through electronic health records (EHRs), randomized controlled trials (RCTs), and medical insurance claims. We began with a heterogeneous disease scenario, using anomaly detection to characterize anomalies in a population. Given an insurance claims dataset, we defined preprocessing heuristics to select a cohort with similar clinical trajectories and evaluated the clinical differences and implications of the typical versus anomalous cohorts. Next, we explored recursive partitioning to identify multiple subgroups that maximize homogeneity within groups and heterogeneity across groups. Specifically, we evaluated heterogeneous treatment effects in synthetic and semi-synthetic RCT data. Finally, we assessed the stability and generalizability of clustering in EHRs. Notably, we examined the effects of data size and representation on the emergent clusters and evaluated the structure of clusters as more data were provided to a representation learning model. Understanding the underlying heterogeneities within a patient population through LCPs is beneficial for designing preventative and treatment strategies, interpreting retrospective analyses, and enhancing understanding of complex diseases.

Keywords

Anomaly detection, Clinical support tools, Clustering, Computable phenotyping, Heterogeneous treatment effects, Unsupervised learning, Artificial intelligence, Computer science, Bioinformatics

Terms of Use

This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth in the Terms of Service.
