Publication: New methods for representation learning, uncertainty quantification, and causal inference in biomedical machine learning
Date
2022-09-07
Citation
Kompa, Benjamin. 2022. New methods for representation learning, uncertainty quantification, and causal inference in biomedical machine learning. Doctoral dissertation, Harvard University Graduate School of Arts and Sciences.
Abstract
Machine learning has made tremendous progress in recent years on the basis of large datasets, increased computational capacity, and the unreasonable effectiveness of inductive biases for text and image data. These same inductive biases (e.g., convolutions) have also proved to be natural fits for biomedical data, from genomes to medical records. However, biomedical applications of machine learning are fraught with additional challenges such as privacy concerns, dataset shift after deployment, and estimation of causal effects in the presence of confounding. In this dissertation, we develop a model that learns an embedding space of medical concepts across multiple private sources of healthcare data and, in the cui2vec R package, provide new benchmarks to assess models’ understanding of medical knowledge. We introduce a Unified Feature Disentanglement Network trained on The Cancer Genome Atlas, which can yield insights into key genes in oncological development. Additionally, we examine the coverage properties of popular approximate Bayesian machine learning models and find that they fail to adequately adjust uncertainty measures under dataset shift. Finally, we present a new approach based on neural networks for estimating causal effects in the presence of unmeasured confounding. Collectively, these methods address core challenges for biomedical applications of machine learning and provide foundations for future research directions.
Keywords
Bioinformatics
Terms of Use
This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth at Terms of Service