Integrating Functional Data to Prioritize Causal Variants in Statistical Fine-Mapping Studies

Kichaev, Gleb; Yang, Wen-Yun; Lindstrom, Sara; Hormozdiari, Farhad; Eskin, Eleazar; Price, Alkes L.; Kraft, Peter; Pasaniuc, Bogdan

Author

Kichaev, Gleb

Yang, Wen-Yun

Lindstrom, Sara

Hormozdiari, Farhad

Eskin, Eleazar

Price, Alkes L. (Harvard)

Kraft, Peter (Harvard)

Pasaniuc, Bogdan

Published Version

https://doi.org/10.1371/journal.pgen.1004722

Citation

Kichaev, Gleb, Wen-Yun Yang, Sara Lindstrom, Farhad Hormozdiari, Eleazar Eskin, Alkes L. Price, Peter Kraft, and Bogdan Pasaniuc. 2014. “Integrating Functional Data to Prioritize Causal Variants in Statistical Fine-Mapping Studies.” PLoS Genetics 10 (10): e1004722. doi:10.1371/journal.pgen.1004722. http://dx.doi.org/10.1371/journal.pgen.1004722.

Abstract

Standard statistical approaches for prioritizing variants for functional testing in fine-mapping studies either use marginal association statistics or estimate posterior probabilities for variants to be causal under simplifying assumptions. Here, we present a probabilistic framework that integrates association strength with functional genomic annotation data to improve accuracy in selecting plausible causal variants for functional validation. A key feature of our approach is that it empirically estimates the contribution of each functional annotation to the trait of interest directly from summary association statistics while allowing for multiple causal variants at any risk locus. We devise efficient algorithms that estimate the parameters of our model across all risk loci to further increase performance. Using simulations based on 1000 Genomes data, we find that our framework consistently outperforms current state-of-the-art fine-mapping methods, reducing the number of variants that need to be selected to capture 90% of the causal variants from an average of 13.3 to 10.4 SNPs per locus (compared with the next-best performing strategy). Furthermore, we introduce a cost-to-benefit optimization framework for determining the number of variants to follow up in functional assays and assess its performance using real and simulated data. We validate our findings using a large-scale meta-analysis of four blood lipid traits and find that, at the risk loci of these traits, the relative probability of causality is increased for variants in exons and transcription start sites and decreased for variants in repressed genomic regions. Using these highly predictive, trait-specific functional annotations, we estimate causality probabilities across all traits and variants, reducing the size of the 90% confidence set from an average of 17.5 to 13.5 variants per locus in these data.
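The "90% confidence set" used as the evaluation metric above can be illustrated with a short Python sketch. It assumes that per-variant posterior causal probabilities have already been produced by some fine-mapping model, and it builds the set by ranking variants by posterior and adding them until their probabilities account for 90% of the locus total. Because several variants may be causal, that total (the expected number of causal variants) can exceed 1. The function name `confidence_set` and the example numbers are hypothetical; this is not the authors' software, only an illustration of the idea under the stated assumption.

```python
from typing import List, Sequence


def confidence_set(posteriors: Sequence[float], rho: float = 0.9) -> List[int]:
    """Return indices of the top-ranked variants whose posterior causal
    probabilities sum to at least rho times the locus total.

    Assumption (hypothetical helper): the locus total, sum(posteriors), is
    treated as the expected number of causal variants, so the set is
    expected to capture a fraction rho of them.
    """
    if not 0.0 < rho <= 1.0:
        raise ValueError("rho must be in (0, 1]")
    total = sum(posteriors)
    if total <= 0.0:
        return []
    target = rho * total
    # Rank variants from most to least likely causal (stable for ties).
    order = sorted(range(len(posteriors)), key=lambda i: posteriors[i], reverse=True)
    chosen: List[int] = []
    captured = 0.0
    for i in order:
        chosen.append(i)
        captured += posteriors[i]
        # Small tolerance guards against floating-point round-off.
        if captured >= target - 1e-12:
            break
    return chosen


if __name__ == "__main__":
    # Hypothetical locus with one dominant signal: two SNPs suffice.
    print(confidence_set([0.05, 0.60, 0.02, 0.30, 0.03]))
    # Hypothetical locus with about two causal variants (posteriors sum to 2.0).
    print(confidence_set([0.9, 0.8, 0.1, 0.1, 0.1]))
```

A smaller set at the same coverage means fewer variants to carry into functional assays, which is the sense in which better-informed posteriors (for example, ones that use annotation data) shrink the set from 17.5 to 13.5 variants per locus in the lipid analysis.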

Other Sources

http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4214605/pdf/

Terms of Use

This article is made available under the terms and conditions applicable to Other Posted Material, as set forth at http://nrs.harvard.edu/urn-3:HUL.InstRepos:dash.current.terms-of-use#LAA

Citable link to this page

http://nrs.harvard.edu/urn-3:HUL.InstRepos:13454683
