Statistical Methods for Analysis of Genetic and Genomic Data in Population Science
Barfield, Richard Thomas
Citation: Barfield, Richard Thomas. 2017. Statistical Methods for Analysis of Genetic and Genomic Data in Population Science. Doctoral dissertation, Harvard University, Graduate School of Arts & Sciences.
Abstract: In chapter 1, we develop a missing mediator analysis using the EM algorithm for studies where the mediator is a genomic marker. Typically, measures such as DNA methylation or gene expression are collected on only a subset of participants from a larger study. Under the standard assumptions for mediation analysis and the additional assumption that the missing data mechanism is ignorable, we can estimate the causal direct and indirect effects using all individuals with exposure and outcome data. We applied our method to Project Viva to assess whether cord blood DNA methylation mediates the effect of maternal pre-pregnancy BMI on childhood BMI.
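To make the target quantities concrete, here is a minimal sketch of the direct and indirect effects under two linear models with no exposure-mediator interaction (the product-of-coefficients decomposition). It is not the dissertation's EM procedure: it assumes the mediator is observed for everyone, and the function name, variable names, and toy numbers are all hypothetical.

```python
def mean(values):
    return sum(values) / len(values)


def cov(u, v):
    """Sample covariance (divisor n) of two equal-length sequences."""
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / len(u)


def mediation_effects(exposure, mediator, outcome):
    """Direct and indirect effects from two linear models, complete data only.

    Mediator model: M = a0 + a1*A + error
    Outcome model:  Y = b0 + b1*A + b2*M + error
    Assumes no exposure-mediator interaction and no missing mediator values.
    """
    s_aa = cov(exposure, exposure)
    s_mm = cov(mediator, mediator)
    s_am = cov(exposure, mediator)
    s_ay = cov(exposure, outcome)
    s_my = cov(mediator, outcome)

    # Slope of the mediator on the exposure.
    a1 = s_am / s_aa

    # Closed-form least squares for the two-predictor outcome model.
    det = s_aa * s_mm - s_am ** 2
    b1 = (s_ay * s_mm - s_my * s_am) / det
    b2 = (s_my * s_aa - s_ay * s_am) / det

    return {"direct": b1, "indirect": a1 * b2, "total": b1 + a1 * b2}


# Toy, noise-free outcome: Y = 2 + 0.3*A + 0.8*M (illustrative numbers only).
exposure = [0.0, 1.0, 2.0, 3.0]
mediator = [2.0, 0.5, 1.0, 3.5]
outcome = [2 + 0.3 * a + 0.8 * m for a, m in zip(exposure, mediator)]
effects = mediation_effects(exposure, mediator, outcome)
```

A missing mediator analysis would replace the complete-data fits here with estimates that also draw on participants who have exposure and outcome but no mediator measurement.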
In chapter 2, we develop a statistical method to estimate cell-specific associations in whole blood DNA methylation data. Whole blood is a mixture of several cell types, and cell-specific methylation is usually not observed, so the method relies on observed cell composition instead. We use Generalized Estimating Equations to estimate cell-specific exposure effects from observed whole blood methylation and cell type count data. We evaluated the performance of the proposed methods through simulation studies and by analyzing data from the Normative Aging Study to assess cell-specific smoking associations at 49 probes established to be associated with smoking on the aggregate scale.
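The mixture idea can be sketched as follows, assuming whole blood methylation for person i is the proportion-weighted sum of cell-specific levels, Y_i = sum_k p_ik * (mu_k + beta_k * x_i). Plain least squares on a single probe stands in here for the Generalized Estimating Equations used in the dissertation, and every name and number is hypothetical.

```python
def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def cell_specific_effects(methylation, exposure, proportions):
    """Least-squares fit of Y_i = sum_k p_ik * (mu_k + beta_k * x_i).

    The design has no intercept: each cell type contributes its proportion
    (baseline level mu_k) and proportion times exposure (effect beta_k).
    """
    k = len(proportions[0])
    design = [list(p) + [pk * x for pk in p]
              for p, x in zip(proportions, exposure)]
    width = 2 * k
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(width)]
           for i in range(width)]
    xty = [sum(row[i] * y for row, y in zip(design, methylation))
           for i in range(width)]
    coef = solve(xtx, xty)
    return {"baseline": coef[:k], "exposure_effect": coef[k:]}


# Toy, noise-free data: two cell types, exposure acts only in the first.
proportions = [(0.2, 0.8), (0.6, 0.4), (0.3, 0.7),
               (0.7, 0.3), (0.5, 0.5), (0.4, 0.6)]
exposure = [0, 0, 1, 1, 1, 0]
true_mu, true_beta = (0.5, 0.3), (0.1, 0.0)
methylation = [sum(pk * (m + b * x) for pk, m, b in zip(p, true_mu, true_beta))
               for p, x in zip(proportions, exposure)]
fit = cell_specific_effects(methylation, exposure, proportions)
```

Cell-specific effects are only identifiable when cell proportions vary across individuals; if everyone had the same composition, the interaction columns would be collinear with the exposure.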
In chapter 3, we introduce a novel approach to help differentiate among multiple eQTL genes that co-localize at disease loci (due to linkage disequilibrium, LD) and so help identify the true susceptibility gene. We developed LD aware MR-Egger regression, an extension of MR-Egger regression to the setting where multiple SNPs in LD are associated with gene expression. This approach requires only summary GWAS and eQTL effects, along with LD from reference panels. Through simulations we show that when SNPs have direct (pleiotropic) effects, our approach provides adequate control of type I error, high power, and less bias than previously proposed methods under certain conditions. We analyzed summary data from a GWAS of breast cancer risk with eQTL data from breast tissue from GTEx to demonstrate the usefulness of this method.
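As a rough illustration of an Egger-type regression with correlated SNPs, the sketch below fits GWAS effects on eQTL effects with an intercept by generalized least squares, where the error covariance is supplied by the caller. How that covariance is built from reference-panel LD and standard errors is not stated in the abstract, so the matrix here is simply an assumed input, and the function and variable names are hypothetical rather than the dissertation's implementation.

```python
def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def gls_egger(gwas_effects, eqtl_effects, covariance):
    """GLS fit of gwas_effect = intercept + slope * eqtl_effect.

    `covariance` is an assumed symmetric error covariance across SNPs
    (for example, derived from LD). The intercept absorbs average direct
    (pleiotropic) effects; the slope is the expression-to-trait effect.
    """
    ones = [1.0] * len(gwas_effects)
    # Apply covariance^{-1} to each design column via linear solves.
    w_ones = solve(covariance, ones)
    w_eqtl = solve(covariance, eqtl_effects)
    xtwx = [[dot(ones, w_ones), dot(ones, w_eqtl)],
            [dot(eqtl_effects, w_ones), dot(eqtl_effects, w_eqtl)]]
    xtwy = [dot(w_ones, gwas_effects), dot(w_eqtl, gwas_effects)]
    intercept, slope = solve(xtwx, xtwy)
    return {"intercept": intercept, "slope": slope}


# Toy summary statistics for three SNPs in LD (illustrative numbers only).
ld = [[1.0, 0.5, 0.2],
      [0.5, 1.0, 0.3],
      [0.2, 0.3, 1.0]]
eqtl = [0.2, 0.5, -0.1]
gwas = [0.05 + 0.4 * b for b in eqtl]
fit = gls_egger(gwas, eqtl, ld)
```

With an identity covariance this reduces to an unweighted Egger-type regression on independent SNPs; the off-diagonal entries are what let correlated SNPs be used together without treating them as independent evidence.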
Citable link to this page: http://nrs.harvard.edu/urn-3:HUL.InstRepos:41142029
- FAS Theses and Dissertations