CTAT Mutations: A Machine Learning Based RNA-Seq Variant Calling Pipeline Incorporating Variant Annotation, Prioritization, and Visualization
Access Status: Full text of the requested work is not available in DASH at this time ("dark deposit").
Fangal, Vrushali Dipak
Citation: Fangal, Vrushali Dipak. 2020. CTAT Mutations: A Machine Learning Based RNA-Seq Variant Calling Pipeline Incorporating Variant Annotation, Prioritization, and Visualization. Master's thesis, Harvard Extension School.
Abstract: Cancer is a complex multi-factorial disease attributed to the accumulation of diverse genetic variations that disrupt genomic integrity. With the advent of genetic diagnostics in personalized medicine, gene panels have dramatically increased the diagnostic yield in cancer. While RNA-seq provides a cost-effective way of producing high-throughput data, the clinical application of single nucleotide polymorphism (SNP) arrays is limited by the high false-positive load concomitant with the variant detection pipelines. Here, we describe a robust, end-to-end, GATK-based Trinity Cancer Transcriptome Analysis Toolkit (CTAT) Mutations Pipeline that leverages a rich set of variant feature annotations with a collection of modern machine learning models to predict genetic variants from RNA-seq and reduce the burden of false positives. We demonstrate improved accuracy of our RNA-seq-based variant prediction pipeline using the Genome in a Bottle (GIAB) reference data, together with RNA-seq and matched whole exome sequencing data from tumor cell lines. Cancer-relevant candidate somatic mutations are further selected based on feature annotations and reported in an interactive web application. As RNA-seq becomes more widely used for clinical diagnostics, we expect our CTAT variant detection pipeline to facilitate the use of tumor RNA-seq in precision medicine.
Citable link to this page: https://nrs.harvard.edu/URN-3:HUL.INSTREPOS:37365605