Publication:
Streaming Support for Data Intensive Cloud-Based Sequence Analysis

Date

2013

Publisher

Hindawi Publishing Corporation
The Harvard community has made this article openly available.

Citation

Issa, Shadi A., Romeo Kienzler, Mohamed El-Kalioby, Peter J. Tonellato, Dennis Wall, Rémy Bruggmann, and Mohamed Abouelhoda. 2013. Streaming support for data intensive cloud-based sequence analysis. BioMed Research International 2013:791051.

Abstract

Cloud computing provides a promising solution to the genomics data deluge problem resulting from the advent of next-generation sequencing (NGS) technology. Based on the concepts of “resources-on-demand” and “pay-as-you-go”, scientists with no or limited infrastructure can have access to scalable and cost-effective computational resources. However, the large size of NGS data causes a significant data transfer latency from the client's site to the cloud, which presents a bottleneck for using cloud computing services. In this paper, we provide a streaming-based scheme to overcome this problem, where the NGS data is processed while being transferred to the cloud. Our scheme targets the wide class of NGS data analysis tasks, where the NGS sequences can be processed independently from one another. We also provide the elastream package that supports the use of this scheme with individual analysis programs or with workflow systems. Experiments presented in this paper show that our solution mitigates the effect of data transfer latency and saves both time and cost of computation.
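The core idea in the abstract, processing sequences while they are still being transferred, can be illustrated with a minimal Python sketch. This is not the elastream package or its API; the function names (`fastq_records`, `process_while_receiving`, `gc_fraction`) and the batch size are hypothetical, and the sketch assumes simple four-line FASTQ records arriving on a byte stream. Because each read can be analyzed independently, results are produced as soon as a batch of records has arrived, rather than after the whole file has been received.

```python
import io
from typing import BinaryIO, Callable, Iterator, List

Record = List[bytes]  # [header, sequence, separator, qualities]


def fastq_records(stream: BinaryIO) -> Iterator[Record]:
    """Yield four-line FASTQ records as soon as each one is fully received.

    Assumption: every record occupies exactly four lines.
    """
    record: Record = []
    for line in stream:
        record.append(line.rstrip(b"\r\n"))
        if len(record) == 4:
            yield record
            record = []


def process_while_receiving(
    stream: BinaryIO,
    analyze: Callable[[Record], float],
    batch_size: int = 2,
) -> Iterator[float]:
    """Analyze records in small batches while the stream is still open.

    Each record is independent, so a batch can be handed to the analysis
    step without waiting for the rest of the transfer.
    """
    batch: List[Record] = []
    for record in fastq_records(stream):
        batch.append(record)
        if len(batch) == batch_size:
            for r in batch:
                yield analyze(r)
            batch = []
    # Flush the final, possibly partial, batch once the stream ends.
    for r in batch:
        yield analyze(r)


def gc_fraction(record: Record) -> float:
    """Toy per-read analysis: fraction of G and C bases in the sequence."""
    seq = record[1].upper()
    if not seq:
        return 0.0
    return (seq.count(b"G") + seq.count(b"C")) / len(seq)


if __name__ == "__main__":
    # An in-memory stream stands in for data arriving over the network.
    incoming = io.BytesIO(
        b"@r1\nGGCC\n+\nIIII\n"
        b"@r2\nATAT\n+\nIIII\n"
        b"@r3\nGATC\n+\nIIII\n"
    )
    print(list(process_while_receiving(incoming, gc_fraction)))
```

In a real transfer the in-memory stream would be replaced by any file-like byte stream, for example `sys.stdin.buffer` or the object returned by `socket.makefile("rb")`; the batching logic stays the same.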

Terms of Use

This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth in the repository's Terms of Service.
