Person:
Tonellato, Peter

Last Name

Tonellato

First Name

Peter

Name

Tonellato, Peter

Search Results

  • Publication
    Scalable and cost-effective NGS genotyping in the cloud
    (BioMed Central, 2015) Souilmi, Yassine; Lancaster, Alex K.; Jung, Jae-Yoon; Rizzo, Ettore; Hawkins, Jared; Powles, Ryan; Amzazi, Saaïd; Ghazal, Hassan; Tonellato, Peter; Wall, Dennis P.
    Background: While next-generation sequencing (NGS) costs have plummeted in recent years, cost and complexity of computation remain substantial barriers to the use of NGS in routine clinical care. The clinical potential of NGS will not be realized until robust and routine whole genome sequencing data can be accurately rendered to medically actionable reports within a time window of hours and at scales of economy in the tens of dollars. Results: We take a step towards addressing this challenge by using COSMOS, a cloud-enabled workflow management system, to develop GenomeKey, an NGS whole genome analysis workflow. COSMOS implements complex workflows making optimal use of high-performance compute clusters. Here we show that the Amazon Web Services (AWS) implementation of GenomeKey via COSMOS provides a fast, scalable, and cost-effective analysis of both public benchmarking and large-scale heterogeneous clinical NGS datasets. Conclusions: Our systematic benchmarking reveals important new insights and considerations to produce clinical turn-around of whole genome analysis optimization and workflow management, including strategic batching of individual genomes and efficient cluster resource configuration. Electronic supplementary material: The online version of this article (doi:10.1186/s12920-015-0134-9) contains supplementary material, which is available to authorized users.
  • Publication
    Analysis of sequence-based copy number variation detection tools for cancer studies
    (American Medical Informatics Association, 2013) Nabavi, Sheida; Cai, Zhengqiu; Tonellato, Peter
  • Publication
    Histopathologic Alterations Associated with Global Gene Expression Due to Chronic Dietary TCDD Exposure in Juvenile Zebrafish
    (Public Library of Science, 2014) Liu, Qing; Spitsbergen, Jan M.; Cariou, Ronan; Huang, Chun-Yuan; Jiang, Nan; Goetz, Giles; Hutz, Reinhold J.; Tonellato, Peter; Carvan, Michael J.
    The goal of this project was to investigate the effects and possible developmental disease implication of chronic dietary TCDD exposure on global gene expression anchored to histopathologic analysis in juvenile zebrafish by functional genomic, histopathologic and analytic chemistry methods. Specifically, juvenile zebrafish were fed Biodiet starter with TCDD added at 0, 0.1, 1, 10 and 100 ppb, and fish were sampled following 0, 7, 14, 28 and 42 d after initiation of the exposure. TCDD accumulated in a dose- and time-dependent manner and 100 ppb TCDD caused TCDD accumulation in female (15.49 ppb) and male (18.04 ppb) fish at 28 d post exposure. Dietary TCDD caused multiple lesions in liver, kidney, intestine and ovary of zebrafish and functional dysregulation such as depletion of glycogen in liver, retrobulbar edema, degeneration of nasal neurosensory epithelium, underdevelopment of intestine, and diminution in the fraction of ovarian follicles containing vitellogenic oocytes. Importantly, lesions in nasal epithelium and evidence of endocrine disruption based on alternatively spliced vasa transcripts are two novel and significant results of this study. Microarray gene expression analysis comparing vehicle control to dietary TCDD revealed dysregulated genes involved in pathways associated with cardiac necrosis/cell death, cardiac fibrosis, renal necrosis/cell death and liver necrosis/cell death. These baseline toxicological effects provide evidence for the potential mechanisms of developmental dysfunctions induced by TCDD and vasa as a biomarker for ovarian developmental disruption.
  • Publication
    COSMOS: Python library for massively parallel workflows
    (Oxford University Press, 2014) Gafni, Erik; Luquette, Joe; Lancaster, Alex K.; Hawkins, Jared; Jung, Jae-Yoon; Souilmi, Yassine; Wall, Dennis P.; Tonellato, Peter
    Summary: Efficient workflows to shepherd clinically generated genomic data through the multiple stages of a next-generation sequencing pipeline are of critical importance in translational biomedical science. Here we present COSMOS, a Python library for workflow management that allows formal description of pipelines and partitioning of jobs. In addition, it includes a user interface for tracking the progress of jobs, abstraction of the queuing system and fine-grained control over the workflow. Workflows can be created on traditional computing clusters as well as cloud-based services. Availability and implementation: Source code is available for academic non-commercial research purposes. Links to code and documentation are provided at http://lpm.hms.harvard.edu and http://wall-lab.stanford.edu. Contact: dpwall@stanford.edu or peter_tonellato@hms.harvard.edu. Supplementary information: Supplementary data are available at Bioinformatics online.
  • Publication
    RNA-Seq of the Caribbean reef-building coral Orbicella faveolata (Scleractinia-Merulinidae) under bleaching and disease stress expands models of coral innate immunity
    (PeerJ Inc., 2016) Anderson, David A.; Walz, Marcus E.; Weil, Ernesto; Tonellato, Peter; Smith, Matthew C.
    Climate change-driven coral disease outbreaks have led to widespread declines in coral populations. Early work on coral genomics established that corals have a complex innate immune system, and whole-transcriptome gene expression studies have revealed mechanisms by which the coral immune system responds to stress and disease. The present investigation expands bioinformatic data available to study coral molecular physiology through the assembly and annotation of a reference transcriptome of the Caribbean reef-building coral, Orbicella faveolata. Samples were collected during a warm water thermal anomaly, coral bleaching event and Caribbean yellow band disease outbreak in 2010 in Puerto Rico. Multiplex sequencing of RNA on the Illumina GAIIx platform and de novo transcriptome assembly by Trinity produced 70,745,177 raw short-sequence reads and 32,463 O. faveolata transcripts, respectively. The reference transcriptome was annotated with gene ontologies, mapped to KEGG pathways, and a predicted proteome of 20,488 sequences was generated. Protein families and signaling pathways that are essential in the regulation of innate immunity across phyla were investigated in depth. Results were used to develop models of evolutionarily conserved Wnt, Notch, Rig-like receptor, Nod-like receptor, and Dicer signaling. O. faveolata is a coral species that has been studied widely under climate-driven stress and disease, and the present investigation provides new data on the genes that putatively regulate its immune system.
  • Publication
    COSMOS: cloud enabled NGS analysis
    (BioMed Central, 2015) Souilmi, Yassine; Jung, Jae-Yoon; Lancaster, Alex; Gafni, Erik; Amzazi, Saaid; Ghazal, Hassan; Wall, Dennis; Tonellato, Peter
  • Publication
    MC-GenomeKey: a multicloud system for the detection and annotation of genomic variants
    (BioMed Central, 2017) Elshazly, Hatem; Souilmi, Yassine; Tonellato, Peter; Wall, Dennis P.; Abouelhoda, Mohamed
    Background: Next-generation genome sequencing techniques have become affordable for massive sequencing efforts devoted to clinical characterization of human diseases. However, the cost of providing cloud-based data analysis of the mounting datasets remains a concerning bottleneck for providing cost-effective clinical services. To address this computational problem, it is important to optimize the variant analysis workflow and the analysis tools used, so as to reduce the overall computational processing time and, concomitantly, the processing cost. Furthermore, it is important to capitalize on recent developments in the cloud computing market, which has witnessed more providers competing in terms of products and prices. Results: In this paper, we present a new package called MC-GenomeKey (Multi-Cloud GenomeKey) that efficiently executes the variant analysis workflow for detecting and annotating mutations using cloud resources from different commercial cloud providers. Our package supports Amazon, Google, and Azure clouds, as well as any other cloud platform based on OpenStack. Our package allows different scenarios of execution with different levels of sophistication, up to the one where a workflow can be executed using a cluster whose nodes come from different clouds. MC-GenomeKey also supports scenarios to exploit the spot instance model of Amazon in combination with the use of other cloud platforms to provide significant cost reduction. To the best of our knowledge, this is the first solution that optimizes the execution of the workflow using computational resources from different cloud providers. Conclusions: MC-GenomeKey provides an efficient multicloud based solution to detect and annotate mutations. The package can run in different commercial cloud platforms, which enables the user to seize the best offers. The package also provides a reliable means to make use of the low-cost spot instance model of Amazon, as it provides an efficient solution to the sudden termination of spot machines as a result of a sudden price increase. The package has a web interface and is available for free for academic use.
  • Publication
    Personalized cloud-based bioinformatics services for research and education: Use cases and the elasticHPC package
    (BioMed Central, 2012) El-Kalioby, Mohamed; Abouelhoda, Mohamed; Krüger, Jan; Giegerich, Robert; Sczyrba, Alexander; Wall, Dennis Paul; Tonellato, Peter
    Background: Bioinformatics services have been traditionally provided in the form of a web-server that is hosted at institutional infrastructure and serves multiple users. This model, however, is not flexible enough to cope with the increasing number of users, increasing data size, and new requirements in terms of speed and availability of service. The advent of cloud computing suggests a new service model that provides an efficient solution to these problems, based on the concepts of "resources-on-demand" and "pay-as-you-go". However, cloud computing has not yet been introduced within bioinformatics servers due to the lack of usage scenarios and software layers that address the requirements of the bioinformatics domain. Results: In this paper, we provide different use case scenarios for providing cloud computing based services, considering both the technical and financial aspects of the cloud computing service model. These scenarios are for individual users seeking computational power as well as bioinformatics service providers aiming at provision of personalized bioinformatics services to their users. We also present elasticHPC, a software package and a library that facilitates the use of high performance cloud computing resources in general and the implementation of the suggested bioinformatics scenarios in particular. Concrete examples that demonstrate the suggested use case scenarios with whole bioinformatics servers and major sequence analysis tools like BLAST are presented. Experimental results with large datasets are also included to show the advantages of the cloud model. Conclusions: Our use case scenarios and the elasticHPC package are steps towards the provision of cloud based bioinformatics services, which would help in overcoming the data challenge of recent biological research. All resources related to elasticHPC and its web-interface are available at http://www.elasticHPC.org.
  • Publication
    Streaming Support for Data Intensive Cloud-Based Sequence Analysis
    (Hindawi Publishing Corporation, 2013) Issa, Shadi A.; Kienzler, Romeo; El-Kalioby, Mohamed; Tonellato, Peter; Wall, Dennis Paul; Bruggmann, Rémy; Abouelhoda, Mohamed
    Cloud computing provides a promising solution to the genomics data deluge problem resulting from the advent of next-generation sequencing (NGS) technology. Based on the concepts of “resources-on-demand” and “pay-as-you-go”, scientists with no or limited infrastructure can have access to scalable and cost-effective computational resources. However, the large size of NGS data causes a significant data transfer latency from the client's site to the cloud, which presents a bottleneck for using cloud computing services. In this paper, we provide a streaming-based scheme to overcome this problem, where the NGS data is processed while being transferred to the cloud. Our scheme targets the wide class of NGS data analysis tasks, where the NGS sequences can be processed independently from one another. We also provide the elastream package that supports the use of this scheme with individual analysis programs or with workflow systems. Experiments presented in this paper show that our solution mitigates the effect of data transfer latency and saves both time and cost of computation.
  • Publication
    The future of genomics in pathology
    (Faculty of 1000 Ltd, 2012) Wall, Dennis Paul; Tonellato, Peter
    The recent advances in technology and the promise of cheap and fast whole genomic data offer the possibility to revolutionise the discipline of pathology. This should allow pathologists in the near future to diagnose disease rapidly and early to change its course, and to tailor treatment programs to the individual. This review outlines some of these technical advances and the changes needed to make this revolution a reality.