dc.contributor.advisorLin, Xihong
dc.contributor.advisorKraft, Peter
dc.contributor.authorBarfield, Richard Thomas
dc.date.accessioned2019-08-09T09:14:00Z
dash.embargo.terms2019-05-01
dc.date.created2017-05
dc.date.issued2017-05-02
dc.date.submitted2017
dc.identifier.citationBarfield, Richard Thomas. 2017. Statistical Methods for Analysis of Genetic and Genomic Data in Population Science. Doctoral dissertation, Harvard University, Graduate School of Arts & Sciences.
dc.identifier.urihttp://nrs.harvard.edu/urn-3:HUL.InstRepos:41142029
dc.description.abstractIn chapter 1, we develop a missing mediator analysis using the EM algorithm for studies where the mediator is a genomic marker. Typically, measures such as DNA methylation or gene expression are collected on only a subset of participants from a larger study. Under the standard assumptions for mediation analysis and an additional assumption that the missing data mechanism is ignorable, we can estimate the causal direct and indirect effects using all individuals with exposure and outcome data. We applied our method to Project Viva to assess whether cord blood DNA methylation mediates the effect of maternal pre-pregnancy BMI on childhood BMI. In chapter 2, we develop a statistical method to estimate cell-specific associations in whole blood DNA methylation data, which is a mixture of several cell types, using observed cell composition when cell-specific methylation is not observed. We use Generalized Estimating Equations to estimate cell-specific exposure effects from observed whole blood methylation and cell type count data. We evaluated the performance of the proposed methods through simulation studies and by analyzing data from the Normative Aging Study to assess cell-specific smoking associations at 49 probes established to be associated with smoking on the aggregate scale. In chapter 3, we introduce a novel approach to help differentiate between multiple eQTL genes that co-localize at disease loci (due to linkage disequilibrium, LD), in order to help identify the true susceptible gene. We developed LD aware MR-Egger regression, an extension of MR-Egger regression to the setting where multiple SNPs in LD are associated with gene expression. This approach requires only summary GWAS and eQTL effects, along with LD from reference panels. Through simulations we show that when SNPs have direct (pleiotropic) effects, our approach provides adequate control of type I error, high power, and less bias than previously proposed methods under certain conditions. We analyzed summary data from a GWAS of breast cancer risk together with eQTL data from GTEx breast tissue to demonstrate the usefulness of this method.
dc.description.sponsorshipBiostatistics
dc.format.mimetypeapplication/pdf
dc.language.isoen
dash.licenseLAA
dc.subjectBiostatistics
dc.subjectStatistical Genetics
dc.titleStatistical Methods for Analysis of Genetic and Genomic Data in Population Science
dc.typeThesis or Dissertation
dash.depositing.authorBarfield, Richard Thomas
dash.embargo.until2019-05-01
dc.date.available2019-08-09T09:14:00Z
thesis.degree.date2017
thesis.degree.grantorGraduate School of Arts & Sciences
thesis.degree.levelDoctoral
thesis.degree.nameDoctor of Philosophy
dc.contributor.committeeMemberVanderWeele, Tyler
dc.type.materialtext
thesis.degree.departmentBiostatistics
dash.identifier.vireo
dash.author.emailbarfieldrichard8@gmail.com

