Publication: New methods for representation learning, uncertainty quantification, and causal inference in biomedical machine learning
Abstract
Machine learning has made tremendous progress in recent years, driven by large datasets, increased computational capacity, and the unreasonable effectiveness of inductive biases for text and image data. These same inductive biases (e.g., convolutions) have also proved to be natural fits for biomedical data ranging from genomes to medical records. However, biomedical applications of machine learning are fraught with additional challenges such as privacy concerns, dataset shift after deployment, and estimation of causal effects in the presence of confounding. In this dissertation, we develop a model that learns an embedding space of medical concepts across multiple private sources of healthcare data, and we provide new benchmarks in the cui2vec R package to assess models’ understanding of medical knowledge. We introduce a Unified Feature Disentanglement Network trained on The Cancer Genome Atlas, which can yield insights into key genes in oncological development. Additionally, we examine the coverage properties of popular approximate Bayesian machine learning models and find that they fail to adequately adjust their uncertainty measures under dataset shift. Finally, we present a new neural-network-based approach for estimating causal effects in the presence of unmeasured confounding. Collectively, these methods address core challenges for biomedical applications of machine learning and provide foundations for future research directions.