Publication: A Large Scale, Cloud-Based, Low Cost and Reproducible Mutation Calling Pipeline Using Docker Containers.
Date
2020-03-03
Citation
Namai, Noel. 2019. A Large Scale, Cloud-Based, Low Cost and Reproducible Mutation Calling Pipeline Using Docker Containers. Master's thesis, Harvard Extension School.
Abstract
The identification of variants in the human genome remains a critical step in the analysis of Next Generation Sequencing (NGS) data. Accurate and timely variant calling is essential to precision medicine, which seeks to relate particular variations in a patient’s genome to targetable genes, drugs, and treatments tailored to that patient.
However, the analysis of low-frequency variants requires sensitive algorithms within computationally intensive pipelines. Faced with high costs and a lack of suitable technology, most institutions resort to running bioinformatics pipelines as bash scripts on local clusters, which are slow, cumbersome, and challenging to implement. This has, in part, reduced reproducibility and exhausted local data storage at such institutions.
Therefore, I demonstrate how to implement bioinformatics pipelines in the cloud using open source tools such as GitHub, Docker, and the Broad Institute's Genome Analysis Toolkit (GATK). This fast, low-cost implementation leverages parallel execution and auto scaling within the cloud to handle the computationally intensive pipeline. I show that such pipelines produce accurate results by analyzing data from the 1000 Genomes Project. This approach allows researchers to spend more of their time doing research and less time configuring workflows.
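As a rough illustration of the containerized, parallel execution the abstract describes, the sketch below builds one `docker run` command per chromosome, each invoking GATK's HaplotypeCaller on a single interval. It is not taken from the thesis: the per-chromosome scatter strategy, the file paths, the `broadinstitute/gatk` image tag, and the helper names `haplotypecaller_command` and `scatter` are all assumptions made for illustration. The commands are only constructed, not executed.

```python
from typing import List

# Assumed contig naming (hg38-style); a real reference may use "1".."22", "X", "Y".
CONTIGS: List[str] = [f"chr{i}" for i in range(1, 23)] + ["chrX", "chrY"]


def haplotypecaller_command(
    sample_bam: str,
    reference: str,
    contig: str,
    data_dir: str = "/data",              # hypothetical host directory holding inputs
    image: str = "broadinstitute/gatk",   # public GATK image; tag/version is an assumption
) -> List[str]:
    """Build a `docker run` argument list that calls variants on one contig."""
    output = f"{data_dir}/{contig}.g.vcf.gz"
    return [
        "docker", "run", "--rm",
        "-v", f"{data_dir}:{data_dir}",   # mount the data directory into the container
        image,
        "gatk", "HaplotypeCaller",
        "-R", reference,                  # reference FASTA
        "-I", sample_bam,                 # aligned reads
        "-L", contig,                     # restrict this job to a single interval
        "-O", output,                     # per-contig output file
        "-ERC", "GVCF",                   # emit a GVCF so shards can be combined later
    ]


def scatter(sample_bam: str, reference: str) -> List[List[str]]:
    """Return one independent command per contig, suitable for parallel workers."""
    return [haplotypecaller_command(sample_bam, reference, c) for c in CONTIGS]


if __name__ == "__main__":
    # Print the first two commands; a scheduler or cloud batch service would run them all.
    for cmd in scatter("/data/sample.bam", "/data/ref.fasta")[:2]:
        print(" ".join(cmd))
```

Because each command touches a different interval and writes its own output file, the jobs are independent, so a cloud batch service can scale the number of workers up or down with the number of pending shards.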
Keywords
Docker, Workflow, NGS, GATK, Bioinformatics, AWS
Terms of Use
This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth at Terms of Service