Publication:

Tissue-aware RNA-Seq processing and normalization for heterogeneous and sparse data

Date

2017

Publisher

BioMed Central

Citation

Paulson, Joseph N., Cho-Yi Chen, Camila M. Lopes-Ramos, Marieke L. Kuijjer, John Platig, Abhijeet R. Sonawane, Maud Fagny, Kimberly Glass, and John Quackenbush. 2017. “Tissue-aware RNA-Seq processing and normalization for heterogeneous and sparse data.” BMC Bioinformatics 18 (1): 437. doi:10.1186/s12859-017-1847-x. http://dx.doi.org/10.1186/s12859-017-1847-x.

Abstract

Background: Although ultrahigh-throughput RNA-Sequencing has become the dominant technology for genome-wide transcriptional profiling, the vast majority of RNA-Seq studies profile only tens of samples, and most analytical pipelines are optimized for these smaller studies. However, projects are generating ever-larger data sets comprising RNA-Seq data from hundreds or thousands of samples, often collected at multiple centers and from diverse tissues. These complex data sets present significant analytical challenges due to batch and tissue effects, but provide the opportunity to revisit the assumptions and methods that we use to preprocess, normalize, and filter RNA-Seq data – critical first steps for any subsequent analysis.

Results: We find that analysis of large RNA-Seq data sets requires both careful quality control and accounting for the sparsity that arises from the heterogeneity intrinsic to multi-group studies. We developed the Yet Another RNA Normalization software pipeline (YARN), which includes quality control and preprocessing, gene filtering, and normalization steps designed to facilitate downstream analysis of large, heterogeneous RNA-Seq data sets, and we demonstrate its use with data from the Genotype-Tissue Expression (GTEx) project.

Conclusions: An R package instantiating YARN is available at http://bioconductor.org/packages/yarn.

Electronic supplementary material: The online version of this article (10.1186/s12859-017-1847-x) contains supplementary material, which is available to authorized users.
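The filtering and normalization steps named in the abstract can be illustrated with a minimal Python sketch. This is not the YARN implementation or its API: the function names, the thresholds (`min_count`, `min_samples`), and the toy count matrix are all hypothetical, and plain within-group quantile normalization is used here only as a simplified stand-in for the package's normalization step. The sketch shows the general idea of being group-aware: a gene is kept if it is expressed in enough samples of at least one tissue group, and samples are normalized only against other samples of the same group.

```python
import numpy as np


def filter_genes_by_group(counts, groups, min_count=1.0, min_samples=2):
    """Return a boolean mask of genes to keep.

    A gene is kept if at least `min_samples` samples within any single
    group have a count >= `min_count`. Both thresholds are illustrative.
    """
    counts = np.asarray(counts, dtype=float)
    groups = np.asarray(groups)
    keep = np.zeros(counts.shape[0], dtype=bool)
    for g in np.unique(groups):
        expressed = counts[:, groups == g] >= min_count
        keep |= expressed.sum(axis=1) >= min_samples
    return keep


def quantile_normalize(mat):
    """Plain quantile normalization across columns (ties broken by order)."""
    order = np.argsort(mat, axis=0, kind="stable")
    ranks = np.argsort(order, axis=0, kind="stable")
    # Reference distribution: mean of the sorted columns, rank by rank.
    reference = np.sort(mat, axis=0).mean(axis=1)
    return reference[ranks]


def normalize_within_groups(counts, groups):
    """Log-transform, then quantile-normalize each group separately."""
    counts = np.asarray(counts, dtype=float)
    groups = np.asarray(groups)
    logged = np.log2(counts + 1.0)
    out = np.empty_like(logged)
    for g in np.unique(groups):
        cols = groups == g
        out[:, cols] = quantile_normalize(logged[:, cols])
    return out


# Toy data: 4 genes (rows) x 4 samples (columns), two hypothetical groups.
counts = np.array([
    [0, 0, 5, 7],    # expressed only in group "b"
    [10, 12, 0, 0],  # expressed only in group "a"
    [0, 1, 0, 0],    # too sparse in every group
    [3, 4, 6, 8],    # expressed everywhere
])
groups = np.array(["a", "a", "b", "b"])

keep = filter_genes_by_group(counts, groups)
norm = normalize_within_groups(counts[keep], groups)
```

In this toy example the first two genes survive filtering even though each is absent from one group, which is the behavior a filter applied across all samples at once could miss in a sparse, multi-tissue data set.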

Keywords

GTEx, RNA-Seq, Quality control, Filtering, Preprocessing, Normalization

Terms of Use

This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth at Terms of Service
