Publication:
A Large Scale, Cloud-Based, Low Cost and Reproducible Mutation Calling Pipeline Using Docker Containers.

Date

2020-03-03

Published Version

Citation

Namai, Noel. 2019. A Large Scale, Cloud-Based, Low Cost and Reproducible Mutation Calling Pipeline Using Docker Containers. Master's thesis, Harvard Extension School.

Abstract

The identification of variants in the human genome remains a critical step in the analysis of Next Generation Sequencing (NGS) data. Accurate and timely variant calling is essential to precision medicine, which seeks to relate particular variations in a patient’s genome to targetable genes, drugs, and treatments tailored to that patient. However, the analysis of low-frequency variants requires sensitive algorithms within computationally intensive pipelines. Faced with high costs and a lack of technology, most institutions resort to running bioinformatics pipelines as bash scripts on local clusters, which are slow, cumbersome, and challenging to implement. This has in part led to reduced reproducibility and the exhaustion of local data storage at such institutions. Therefore, I demonstrate how to implement bioinformatics pipelines in the cloud using open source tools such as GitHub, Docker, and the Broad Institute Genome Analysis Toolkit (GATK). This fast and low-cost implementation leverages parallel execution and auto scaling within the cloud to handle the computationally intensive pipeline. I show that such pipelines produce accurate results by analyzing data from the 1000 Genomes Project. This approach would allow researchers to spend more of their time doing research and less time configuring workflows.

Keywords

Docker, Workflow, NGS, GATK, Bioinformatics, AWS

Terms of Use

This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth at Terms of Service
