Person: Hawkins, Jared
Search Results
Publication: COSMOS: Python library for massively parallel workflows
(Oxford University Press, 2014) Gafni, Erik; Luquette, Joe; Lancaster, Alex K.; Hawkins, Jared; Jung, Jae-Yoon; Souilmi, Yassine; Wall, Dennis P.; Tonellato, Peter
Summary: Efficient workflows to shepherd clinically generated genomic data through the multiple stages of a next-generation sequencing pipeline are of critical importance in translational biomedical science. Here we present COSMOS, a Python library for workflow management that allows formal description of pipelines and partitioning of jobs. In addition, it includes a user interface for tracking the progress of jobs, abstraction of the queuing system and fine-grained control over the workflow. Workflows can be created on traditional computing clusters as well as cloud-based services.
Availability and implementation: Source code is available for academic non-commercial research purposes. Links to code and documentation are provided at http://lpm.hms.harvard.edu and http://wall-lab.stanford.edu.
Contact: dpwall@stanford.edu or peter_tonellato@hms.harvard.edu.
Supplementary information: Supplementary data are available at Bioinformatics online.
Publication: Scalable and cost-effective NGS genotyping in the cloud
(BioMed Central, 2015) Souilmi, Yassine; Lancaster, Alex K.; Jung, Jae-Yoon; Rizzo, Ettore; Hawkins, Jared; Powles, Ryan; Amzazi, Saaïd; Ghazal, Hassan; Tonellato, Peter; Wall, Dennis P.
Background: While next-generation sequencing (NGS) costs have plummeted in recent years, cost and complexity of computation remain substantial barriers to the use of NGS in routine clinical care. The clinical potential of NGS will not be realized until robust and routine whole genome sequencing data can be accurately rendered to medically actionable reports within a time window of hours and at scales of economy in the tens of dollars.
Results: We take a step towards addressing this challenge by using COSMOS, a cloud-enabled workflow management system, to develop GenomeKey, an NGS whole genome analysis workflow. COSMOS implements complex workflows making optimal use of high-performance compute clusters. Here we show that the Amazon Web Services (AWS) implementation of GenomeKey via COSMOS provides a fast, scalable, and cost-effective analysis of both public benchmarking and large-scale heterogeneous clinical NGS datasets.
Conclusions: Our systematic benchmarking reveals important new insights and considerations to produce clinical turn-around of whole genome analysis optimization and workflow management, including strategic batching of individual genomes and efficient cluster resource configuration.
Electronic supplementary material: The online version of this article (doi:10.1186/s12920-015-0134-9) contains supplementary material, which is available to authorized users.