dc.contributor.author: Blades, Natalie
dc.contributor.author: Sultana, Razvan
dc.contributor.author: Ding, Jie
dc.contributor.author: Parmigiani, Giovanni
dc.contributor.author: Wang, Xin Victoria
dc.date.accessioned: 2013-04-25T18:49:06Z
dc.date.issued: 2012
dc.identifier.citation: Wang, Xin Victoria, Natalie Blades, Jie Ding, Razvan Sultana, and Giovanni Parmigiani. 2012. Estimation of sequencing error rates in short reads. BMC Bioinformatics 13:185. [en_US]
dc.identifier.issn: 1471-2105 [en_US]
dc.identifier.uri: http://nrs.harvard.edu/urn-3:HUL.InstRepos:10588000
dc.description.abstract: Background: Short-read data from next-generation sequencing technologies are now being generated across a range of research projects. The fidelity of these data can be affected by several factors, and it is important to have simple and reliable approaches for monitoring it at the level of individual experiments. Results: We developed a fast, scalable and accurate approach to estimating error rates in short reads, which has the added advantage of not requiring a reference genome. We build on the fundamental observation that there is a linear relationship between the copy number for a given read and the number of erroneous reads that differ from the read of interest by one or two bases. The slope of this relationship can be transformed to give an estimate of the error rate, both by read and by position. We present simulation studies as well as analyses of real data sets illustrating the precision and accuracy of this method, and we show that it is more accurate than alternatives that count the difference between the sample of interest and a reference genome. We show how this methodology led to the detection of mutations in the genome of the PhiX strain used for calibration of Illumina data. The proposed method is implemented in an R package, which can be downloaded from http://bcb.dfci.harvard.edu/~vwang/shadowRegression.html. Conclusions: The proposed method can be used to monitor the quality of sequencing pipelines at the level of individual experiments without the use of reference genomes. Furthermore, having an estimate of the error rates gives one the opportunity to improve analyses and inferences in many applications of next-generation sequencing data. [en_US]
dc.language.iso: en_US [en_US]
dc.publisher: BioMed Central [en_US]
dc.relation.isversionof: doi:10.1186/1471-2105-13-185 [en_US]
dc.relation.hasversion: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3495688/pdf/ [en_US]
dash.license: LAA
dc.title: Estimation of Sequencing Error Rates in Short Reads [en_US]
dc.type: Journal Article [en_US]
dc.description.version: Version of Record [en_US]
dc.relation.journal: BMC Bioinformatics [en_US]
dash.depositing.author: Parmigiani, Giovanni
dc.date.available: 2013-04-25T18:49:06Z
dc.identifier.doi: 10.1186/1471-2105-13-185 [*]
dash.authorsordered: false
dash.contributor.affiliated: Wang, Xin
dash.contributor.affiliated: Parmigiani, Giovanni
dash.contributor.affiliated: Ding, Jie

