Publication: Novel Analytic Methods for Electronic Health Record and Clinical Trial Data
Date
2022-09-12
Citation
NOGUES, Isabelle-Emmanuella. 2022. Novel Analytic Methods for Electronic Health Record and Clinical Trial Data. Doctoral dissertation, Harvard University Graduate School of Arts and Sciences.
Abstract
This dissertation presents novel analytic methods for conducting biomedical studies with clinical trial and electronic health record data. The methods draw on techniques from survival analysis, machine learning, and (weakly) semi-supervised learning.
Chapter 1 presents a non-parametric survival analysis method for diseases characterized by recurrent events and a single terminal event. The method estimates the difference between the areas under the mean cumulative event function curves of two patient groups. In a (randomized) clinical trial comparing placebo and active treatment groups, this quantity may serve as a measure of treatment effect. We prove that the estimator is asymptotically normal and efficient; it improves on standard empirical and bootstrap estimators in small-sample settings.
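To make the Chapter 1 quantity concrete, the following is a minimal sketch of a difference in areas under two mean cumulative function (MCF) curves. It is an illustration under stated assumptions, not the dissertation's estimator: it assumes a standard Ghosh-Lin-type MCF estimate (Nelson-Aalen increments for recurrent events weighted by a Kaplan-Meier estimate for the terminal event), and the function names, the input layout, and the truncation time `tau` are hypothetical.

```python
import numpy as np


def mean_cumulative_function(followup, died, event_times):
    """Step-function MCF estimate in the presence of a terminal event.

    followup    : end of follow-up per subject (death or censoring time)
    died        : True where follow-up ended with the terminal event
    event_times : per-subject lists of recurrent event times, each assumed
                  to fall within that subject's follow-up
    Returns the jump times and the MCF value at each jump time.
    """
    followup = np.asarray(followup, dtype=float)
    died = np.asarray(died, dtype=bool)
    recurrent = np.array([t for times in event_times for t in times], dtype=float)
    death_times = followup[died]
    times = np.unique(np.concatenate([recurrent, death_times]))

    surv = 1.0  # Kaplan-Meier survival just before the current time
    mu = 0.0    # running MCF value
    mcf = []
    for u in times:
        at_risk = np.sum(followup >= u)
        d_events = np.sum(recurrent == u)
        d_deaths = np.sum(death_times == u)
        mu += surv * d_events / at_risk    # survival-weighted Nelson-Aalen jump
        surv *= 1.0 - d_deaths / at_risk   # update survival after time u
        mcf.append(mu)
    return times, np.array(mcf)


def area_under_mcf(times, mcf, tau):
    """Exact integral of the right-continuous step function over [0, tau]."""
    keep = times < tau
    knots = np.concatenate([times[keep], [tau]])
    return float(np.sum(mcf[keep] * np.diff(knots)))


def area_difference(group_a, group_b, tau):
    """Area under group A's MCF minus area under group B's MCF, up to tau."""
    area_a = area_under_mcf(*mean_cumulative_function(*group_a), tau)
    area_b = area_under_mcf(*mean_cumulative_function(*group_b), tau)
    return area_a - area_b


# Toy data (made up for illustration): (followup, died, event_times) per group.
placebo = ([10.0, 10.0], [False, False], [[2.0, 5.0], [5.0]])
treated = ([4.0, 10.0], [True, False], [[2.0], [6.0]])
print(area_difference(placebo, treated, tau=10.0))
```

With two treatment arms, a positive value means the first group accumulated more recurrent events over the window ending at `tau`. The sketch gives only a point estimate; the inference results mentioned above (asymptotic normality, efficiency) are not reproduced here.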
Chapters 2 and 3 focus on modern machine learning techniques adapted to the electronic health record (EHR) setting. In Chapter 2, we develop a weakly semi-supervised deep learning method for patient-level phenotyping in EHRs. The algorithm leverages a large number of unlabeled samples to further inform predictions based on a small number of labeled samples. In particular, it extracts valuable information from the high-dimensional EHR features without requiring feature extraction or manual feature engineering.
Chapter 3 describes a novel semi-supervised deep learning algorithm for risk prediction that leverages longitudinal EHR data in a survival analysis context. More specifically, the proposed algorithm predicts the cumulative incidence function of an event time of interest in a large dataset with few annotations of current disease status. It uses patient follow-up information and baseline clinical data collected over an observation window to learn current disease status probabilities in a semi-supervised fashion.
Both EHR methods achieve high accuracy despite using very small numbers of labels, notably outperforming existing algorithms in the EHR phenotyping and risk prediction literature.
Keywords
clinical trials, Electronic Health Records, non-parametric method, phenotyping, risk prediction, semi-supervised learning, Biostatistics, Bioinformatics
Terms of Use
This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth at Terms of Service