Publication: Deep Learning Models for Variant Pathogenicity Prediction
Abstract
Technological advancements in DNA sequencing have made it a mainstay in the clinic in the form of targeted genetic testing as well as whole exome and whole genome sequencing (WES and WGS, respectively). With increased use comes a growing need to interpret the abundance of data being generated and the myriad variants being discovered. Many computational methods have been developed to address this issue, with varying levels of success. My goal in this thesis is to build on such methods by altering the underlying models, the learning algorithms, and the data being used, and then to apply them to the task of clinical-grade variant pathogenicity classification. To do so, I first review and compare the methods that have been developed so far, looking for common patterns of strengths and weaknesses and for aspects that need to be accounted for. Then, I reproduce PolyPhen-HCM, a foundational method developed for the interpretation of hypertrophic cardiomyopathy-related disease. Finally, using the insights gained from both the comprehensive review and the redesign and in-depth analysis of PolyPhen-HCM, I introduce deep learning models that address, through their improved architectures and data, some of the most salient issues that variant interpretation methods face.