Publication:
Novel Analytic Methods for Electronic Health Record and Clinical Trial Data

Date

2022-09-12

Citation

Nogues, Isabelle-Emmanuella. 2022. Novel Analytic Methods for Electronic Health Record and Clinical Trial Data. Doctoral dissertation, Harvard University Graduate School of Arts and Sciences.

Abstract

This dissertation presents novel analytic methods for biomedical studies that use clinical trial and electronic health record (EHR) data. The methods draw on techniques from survival analysis, machine learning, and (weakly) semi-supervised learning.

Chapter 1 presents a non-parametric survival analysis method for diseases characterized by recurrent events and one terminal event. The method estimates the difference in the areas under the mean cumulative event function curves of two patient groups. In a (randomized) clinical trial comparing placebo and active treatment groups, the resulting quantity may serve as a measure of treatment effect. We prove that this estimator is asymptotically normal and efficient, and that it improves on standard empirical and bootstrap estimators in small-sample settings.

Chapters 2 and 3 focus on modern machine learning techniques adapted to the EHR setting. In Chapter 2, we develop a weakly semi-supervised deep learning method for patient-level phenotyping in EHRs. The algorithm leverages the large number of unlabeled samples to further inform predictions based on a small number of labeled samples. In particular, it extracts valuable information from the high-dimensional EHR features without requiring feature extraction or manual feature engineering.

Chapter 3 describes a novel semi-supervised deep learning algorithm for risk prediction that leverages longitudinal EHR data in a survival analysis context. More specifically, the proposed algorithm predicts the cumulative incidence function of an event time of interest in a large dataset with few current disease status annotations. It uses patient follow-up information and baseline clinical data collected over an observation window to learn current disease status probabilities in a semi-supervised fashion.

Both EHR methods yield high accuracy despite using very small numbers of labels, notably outperforming existing algorithms in the EHR phenotyping and risk prediction literature.
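
To make the Chapter 1 quantity concrete, the following is a minimal illustrative sketch, not the dissertation's estimator. It assumes every patient is followed for the full window [0, tau] with no censoring, and that a patient who experiences the terminal event simply accrues no further recurrent events. Under those simplifying assumptions the mean cumulative function is the average event count per patient, and the area under it equals the average of (tau - t) over all recurrent event times t. The function names and the toy data are hypothetical.

```python
from typing import Sequence


def area_under_mcf(event_times_per_patient: Sequence[Sequence[float]], tau: float) -> float:
    """Area under the empirical mean cumulative function on [0, tau].

    Assumption (not from the dissertation): no censoring before tau, so the
    mean cumulative function at time t is the average number of recurrent
    events observed by t. Each event at time t adds a unit step to one
    patient's counting process, contributing (tau - t) to that patient's area.
    """
    n = len(event_times_per_patient)
    if n == 0:
        raise ValueError("at least one patient is required")
    total = 0.0
    for times in event_times_per_patient:
        # Events outside [0, tau] do not contribute to the area.
        total += sum(tau - t for t in times if 0 <= t <= tau)
    return total / n


def area_difference(control: Sequence[Sequence[float]],
                    treated: Sequence[Sequence[float]],
                    tau: float) -> float:
    """Control-minus-treated difference in areas under the two curves."""
    return area_under_mcf(control, tau) - area_under_mcf(treated, tau)


# Hypothetical toy data: one list of recurrent event times per patient.
control_group = [[1.0, 3.0], [2.0], []]
treated_group = [[3.0], [], []]
difference = area_difference(control_group, treated_group, tau=4.0)
```

A positive difference here means the control group accumulated recurrent events earlier or more often over the window. Handling censoring, and establishing the asymptotic properties described in the abstract, requires the estimator developed in the dissertation itself.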

Keywords

clinical trials, Electronic Health Records, non-parametric method, phenotyping, risk prediction, semi-supervised learning, Biostatistics, Bioinformatics

Terms of Use

This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth at Terms of Service
