Publication:
Scalable and cost-effective NGS genotyping in the cloud

Date

2015

Publisher

BioMed Central
The Harvard community has made this article openly available.

Citation

Souilmi, Yassine, Alex K. Lancaster, Jae-Yoon Jung, Ettore Rizzo, Jared B. Hawkins, Ryan Powles, Saaïd Amzazi, Hassan Ghazal, Peter J. Tonellato, and Dennis P. Wall. 2015. “Scalable and cost-effective NGS genotyping in the cloud.” BMC Medical Genomics 8 (1): 64. doi:10.1186/s12920-015-0134-9. http://dx.doi.org/10.1186/s12920-015-0134-9.

Abstract

Background: While next-generation sequencing (NGS) costs have plummeted in recent years, the cost and complexity of computation remain substantial barriers to the use of NGS in routine clinical care. The clinical potential of NGS will not be realized until whole genome sequencing data can be robustly, routinely, and accurately rendered into medically actionable reports within a window of hours and at a cost in the tens of dollars.

Results: We take a step towards addressing this challenge by using COSMOS, a cloud-enabled workflow management system, to develop GenomeKey, an NGS whole genome analysis workflow. COSMOS implements complex workflows that make optimal use of high-performance compute clusters. Here we show that the Amazon Web Services (AWS) implementation of GenomeKey via COSMOS provides fast, scalable, and cost-effective analysis of both public benchmarking datasets and large-scale, heterogeneous clinical NGS datasets.

Conclusions: Our systematic benchmarking reveals important new insights and considerations for optimizing whole genome analysis and workflow management to achieve clinical turnaround times, including strategic batching of individual genomes and efficient configuration of cluster resources.

Electronic supplementary material: The online version of this article (doi:10.1186/s12920-015-0134-9) contains supplementary material, which is available to authorized users.

Keywords

Next-generation sequencing, Clinical sequencing, Cloud computing, Medical genomics, Software, Bioinformatics, Parallel computing

Terms of Use

This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth in the Terms of Service.