Show simple item record

dc.contributor.author: Deoras, Ameya N.
dc.contributor.author: Rasmussen, Matthew D.
dc.contributor.author: Guigó, Roderic
dc.contributor.author: Lin, Michael
dc.contributor.author: Kellis, Manolis
dc.date.accessioned: 2010-11-30T21:49:32Z
dc.date.issued: 2008
dc.identifier.citation: Lin, Michael F., Ameya N. Deoras, Matthew D. Rasmussen, and Manolis Kellis. 2008. Performance and scalability of discriminative metrics for comparative gene identification in 12 Drosophila genomes. PLoS Computational Biology 4(4). [en_US]
dc.identifier.issn: 1553-734X [en_US]
dc.identifier.uri: http://nrs.harvard.edu/urn-3:HUL.InstRepos:4595506
dc.description.abstract: Comparative genomics of multiple related species is a powerful methodology for the discovery of functional genomic elements, and its power should increase with the number of species compared. Here, we use 12 Drosophila genomes to study the power of comparative genomics metrics to distinguish between protein-coding and non-coding regions. First, we study the relative power of different comparative metrics and their relationship to single-species metrics. We find that even relatively simple multi-species metrics robustly outperform advanced single-species metrics, especially for shorter exons (≤240 nt), which are common in animal genomes. Moreover, the two capture largely independent features of protein-coding genes, with different sensitivity/specificity trade-offs, such that their combinations lead to even greater discriminatory power. In addition, we study how discovery power scales with the number and phylogenetic distance of the genomes compared. We find that species at a broad range of distances are comparably effective informants for pairwise comparative gene identification, but that these are surpassed by multi-species comparisons at similar evolutionary divergence. In particular, while pairwise discovery power plateaued at larger distances and never outperformed the most advanced single-species metrics, multi-species comparisons continued to benefit even from the most distant species with no apparent saturation. Last, we find that genes in functional categories typically considered fast-evolving can nonetheless be recovered at very high rates using comparative methods. Our results have implications for comparative genomics analyses in any species, including the human. [en_US]
dc.language.iso: en_US [en_US]
dc.publisher: Public Library of Science [en_US]
dc.relation.isversionof: doi:10.1371/journal.pcbi.1000067 [en_US]
dc.relation.hasversion: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2291194/pdf/ [en_US]
dash.license: LAA
dc.subject: computational biology [en_US]
dc.subject: genomics [en_US]
dc.subject: evolutionary modeling [en_US]
dc.subject: comparative sequence analysis [en_US]
dc.title: Performance and Scalability of Discriminative Metrics for Comparative Gene Identification in 12 Drosophila Genomes [en_US]
dc.type: Journal Article [en_US]
dc.description.version: Version of Record [en_US]
dc.relation.journal: PLoS Computational Biology [en_US]
dash.depositing.author: Kellis, Manolis
dc.date.available: 2010-11-30T21:49:32Z
dash.affiliation.other: SPH^Immunology and Infectious Diseases TPH [en_US]
dc.identifier.doi: 10.1371/journal.pcbi.1000067 [*]
dash.authorsordered: false
dash.contributor.affiliated: Kellis, Manolis

