CTAT Mutations: A Machine Learning Based RNA-Seq Variant Calling Pipeline Incorporating Variant Annotation, Prioritization, and Visualization
Citation
Fangal, Vrushali Dipak. 2020. CTAT Mutations: A Machine Learning Based RNA-Seq Variant Calling Pipeline Incorporating Variant Annotation, Prioritization, and Visualization. Master's thesis, Harvard Extension School.
Abstract
Cancer is a complex, multi-factorial disease attributed to the accumulation of diverse genetic variations that disrupt genomic integrity. With the advent of genetic diagnostics in personalized medicine, gene panels have dramatically increased the diagnostic yield in cancer. While RNA-seq provides a cost-effective way of producing high-throughput data, the clinical application of single nucleotide polymorphism (SNP) arrays is limited by the high false positive load that accompanies variant detection pipelines. Here, we describe a robust, end-to-end, GATK-based Trinity Cancer Transcriptome Analysis Toolkit (CTAT) Mutations Pipeline that combines a rich set of variant feature annotations with a collection of modern machine learning models to predict genetic variants from RNA-seq and reduce the burden of false positives. We demonstrate improved accuracy of our RNA-seq based variant prediction pipeline using the Genome in a Bottle (GIAB) reference data, as well as RNA-seq and matched whole exome sequencing data from tumor cell lines. Cancer-relevant candidate somatic mutations are further selected based on feature annotations and reported in an interactive web application. As RNA-seq becomes more widely used in clinical diagnostics, we expect our CTAT variant detection pipeline to facilitate the use of tumor RNA-seq in precision medicine.
Terms of Use
This article is made available under the terms and conditions applicable to Other Posted Material, as set forth at http://nrs.harvard.edu/urn-3:HUL.InstRepos:dash.current.terms-of-use#LAA
Citable link to this page
https://nrs.harvard.edu/URN-3:HUL.INSTREPOS:37365605
Collections
- DCE Theses and Dissertations