Computational Methods to Advance From Genetic Association to Biological Insight
Citation
Fine, Rebecca S. 2020. Computational Methods to Advance From Genetic Association to Biological Insight. Doctoral dissertation, Harvard University, Graduate School of Arts & Sciences.
Abstract
The increasing availability of large-scale genetic data has enabled massive efforts to understand the genetic basis of human traits and diseases. However, roughly 90% of genome-wide association study (GWAS) signals can be attributed to noncoding variation. Such variants are challenging to interpret and connect to biological insight because (1) there are often multiple possible causal genes at each locus and (2) linking noncoding variants to the genes they regulate is difficult, which obscures the relevant biological processes. One useful approach in addressing this problem is the application of algorithms that search for commonalities across loci, which can implicate biological processes through gene set enrichment analysis and prioritize likely causal genes. To facilitate progress from genetic association to biological hypotheses, I have built upon this strategy to develop two different methods.
First, I developed a gene set enrichment analysis method for the ExomeChip, a genotyping array which focuses on rare and low-frequency coding variation. To do this, I adapted DEPICT, a gene set enrichment analysis method developed by our lab for GWAS data. DEPICT is particularly powerful as a tool for biological interpretation due to its use of gene sets that have been extended via coexpression data to make predictions about the function of uncharacterized genes. I applied this method to many different types of traits, including anthropometric (e.g. height), hormonal (e.g. adiponectin), and glycemic (e.g. fasting glucose).
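The "search for commonalities across loci" idea behind gene set enrichment analysis can be illustrated with the simplest possible version: a one-sided hypergeometric (overrepresentation) test asking whether genes implicated by associated loci fall into a given gene set more often than chance would predict. This is a minimal sketch, not DEPICT: it assumes a fixed gene background and a binary gene set, and it ignores locus structure, linkage disequilibrium, and gene-level confounders, all of which dedicated methods must handle. The function names and the toy genes `G0`..`G19` are hypothetical.

```python
from math import comb


def overrepresentation_p(n_genome: int, n_set: int, n_hits: int, n_overlap: int) -> float:
    """Upper-tail hypergeometric p-value P(X >= n_overlap).

    n_genome:  number of background genes
    n_set:     number of background genes in the gene set
    n_hits:    number of genes implicated by associated loci
    n_overlap: number of implicated genes that are also in the gene set
    """
    upper = min(n_set, n_hits)
    tail = sum(
        comb(n_set, k) * comb(n_genome - n_set, n_hits - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / comb(n_genome, n_hits)


def gene_set_enrichment(background: set[str], gene_set: set[str], hits: set[str]) -> float:
    """Naive test of whether `hits` overlap `gene_set` more than expected by chance."""
    gene_set = gene_set & background  # restrict both sets to the background
    hits = hits & background
    return overrepresentation_p(
        len(background), len(gene_set), len(hits), len(gene_set & hits)
    )


# Toy example with made-up genes: 4 implicated genes, 3 of which are in the set.
background = {f"G{i}" for i in range(20)}
pathway = {"G0", "G1", "G2", "G3", "G4"}
locus_genes = {"G0", "G1", "G2", "G10"}
p = gene_set_enrichment(background, pathway, locus_genes)
print(f"{p:.4f}")  # prints 0.0320
```

Treating every implicated gene as an independent draw is exactly the simplification that breaks down at real GWAS loci, where several neighboring genes share one signal; the sketch is meant only to make the enrichment logic concrete.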
Second, many different gene prioritization algorithms for GWAS have been developed using a commonality-based strategy, but it is difficult to determine which are the most accurate. Most new methods benchmark using “gold standards” (genes already known to play a role in the trait of interest). However, such gold standards are biased toward well-studied genes and pathways. Therefore, I developed Benchmarker, a leave-one-chromosome-out method for benchmarking prioritization methods that relies only on the original GWAS data and evaluates method performance using stratified LD score regression.
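The leave-one-chromosome-out pattern can be sketched as follows. Each chromosome is held out in turn, the prioritization method is run using only input from the remaining chromosomes, and only the scores for genes on the held-out chromosome are kept. As a result, no gene's score can be driven by association signal at its own locus. This is a sketch under stated assumptions, not Benchmarker itself: it assumes, hypothetically, that a method takes a set of "input" genes (e.g., genes near GWAS hits) and returns a score for every gene, whereas the abstract describes Benchmarker as working from the original GWAS data. The `Prioritizer` type, `toy_prioritize`, and the module labels are all made up for illustration.

```python
from typing import Callable, Mapping

# Hypothetical interface: a prioritization method receives a set of input
# genes and returns a score for every gene under consideration.
Prioritizer = Callable[[set[str]], Mapping[str, float]]


def loco_scores(
    gene_chrom: Mapping[str, str], input_genes: set[str], prioritize: Prioritizer
) -> dict[str, float]:
    """Score each gene using only input genes located on other chromosomes."""
    held_out: dict[str, float] = {}
    for chrom in sorted(set(gene_chrom.values())):
        # Drop every input gene on the held-out chromosome before running the method.
        train = {g for g in input_genes if gene_chrom[g] != chrom}
        scores = prioritize(train)
        # Keep only the scores for genes on the held-out chromosome.
        for gene, c in gene_chrom.items():
            if c == chrom:
                held_out[gene] = scores[gene]
    return held_out


# Toy data: five genes on three chromosomes, grouped into made-up "modules".
modules = {"A": "m1", "B": "m1", "C": "m1", "D": "m2", "E": "m2"}
gene_chrom = {"A": "1", "B": "2", "C": "3", "D": "1", "E": "2"}
input_genes = {"A", "B", "D"}


def toy_prioritize(train: set[str]) -> dict[str, float]:
    # Score = number of training genes sharing the gene's module (a crude
    # stand-in for a commonality-based prioritization method).
    return {
        g: float(sum(modules[t] == modules[g] for t in train if t != g))
        for g in modules
    }


result = loco_scores(gene_chrom, input_genes, toy_prioritize)
print(result)
```

In the approach the abstract describes, the held-out scores would then be evaluated with stratified LD score regression; that evaluation step is not implemented here.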
The methods described in this dissertation contribute to an important and growing body of work on more effective and rigorous analysis of GWAS data to obtain biological insight.
Terms of Use
This article is made available under the terms and conditions applicable to Other Posted Material, as set forth at http://nrs.harvard.edu/urn-3:HUL.InstRepos:dash.current.terms-of-use#LAA
Citable link to this page
https://nrs.harvard.edu/URN-3:HUL.INSTREPOS:37365548