Publication:
reGenotyper: Detecting mislabeled samples in genetic data

Date

2017

Publisher

Public Library of Science
The Harvard community has made this article openly available.

Citation

Zych, K., B. L. Snoek, M. Elvin, M. Rodriguez, K. J. Van der Velde, D. Arends, H. Westra, et al. 2017. “reGenotyper: Detecting mislabeled samples in genetic data.” PLoS ONE 12 (2): e0171324. doi:10.1371/journal.pone.0171324. http://dx.doi.org/10.1371/journal.pone.0171324.

Abstract

In high-throughput molecular profiling studies, genotype labels can be wrongly assigned at various experimental steps; the resulting mislabeled samples seriously reduce the power to detect the genetic basis of phenotypic variation. We have developed an approach to detect potential mislabeling, recover the “ideal” genotype and identify “best-matched” labels for mislabeled samples. On average, we identified 4% of samples as mislabeled in eight published datasets, highlighting the necessity of applying a “data cleaning” step before standard data analysis.

Keywords

Biology and Life Sciences, Computational Biology, Genome Analysis, Genome-Wide Association Studies, Genetics, Genomics, Human Genetics, Evolutionary Biology, Population Genetics, Genetic Polymorphism, Population Biology, Phenotypes, Genetic Loci, Quantitative Trait Loci, Gene Expression, Physical Sciences, Mathematics, Discrete Mathematics, Combinatorics, Permutation, Experimental Organism Systems, Model Organisms, Caenorhabditis Elegans, Animal Models, Organisms, Animals, Invertebrates, Nematoda, Caenorhabditis

Terms of Use

This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth at Terms of Service
