Statistical Analysis and Methods for Human -Omics Data
Citation: Feng, Yen-Chen. 2017. Statistical Analysis and Methods for Human -Omics Data. Doctoral dissertation, Harvard T.H. Chan School of Public Health.
Abstract: Rapid advances in high-throughput technology have made it possible to screen millions of molecular markers at multiple levels of the biological system in large samples, in order to study the genetic basis and biological variation underlying complex traits and diseases. Such -omics data cover variation in the genome, epigenome, transcriptome, proteome, and metabolome. My dissertation projects take advantage of these rich sources of human multi-omics data, focusing on developing and applying statistical methods to answer questions that often arise in large-scale “-omic” epidemiology studies.
Single nucleotide polymorphisms (SNPs) are inherited genetic variations that may confer genetic predisposition towards complex diseases. Genome-wide association studies (GWAS) have been particularly successful in identifying numerous SNPs associated with non-Mendelian traits. GWAS of different traits also open up new opportunities to study the shared genetics across a range of phenotypes. In Chapter 1, I will describe how we examined such a relationship between Alzheimer’s disease (AD) and cancer using GWAS summary statistics and identified significant, positive genetic correlations of AD with specific cancer types.
Epigenetic modifications, including DNA methylation, are another crucial layer that regulates gene expression in a tissue-specific manner without changing the genetic code. DNA methylation is involved in determining cell differentiation and is a marker of inhibited transcription. Studying the cell-type specificity of DNA methylation in relation to diseases helps to identify the key cell type(s) for mechanistic follow-up. In Chapter 2, I will describe a statistical method we developed to estimate cell-type-specific phenotype-methylation associations when direct measurement of cell-specific methylation is not available, and the simulations and real-data analyses we conducted to evaluate its performance.
The metabolome is a key endpoint linking genotype to phenotype, reflecting perturbations from all levels of biological processes. Metabolomics data measured by LC-MS experiments provide a powerful framework for studying disease mechanisms and drug discovery, yet they often suffer from substantial batch effects that make cross-study comparison difficult. In Chapter 3, I will illustrate an approach to normalizing metabolomics data across studies using information from overlapping samples. We compared different normalization methods and identified quantile normalization as the preferred method for calibrating cross-study deviation in metabolite distributions.
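As an illustration of the quantile normalization mentioned for Chapter 3, the following is a minimal sketch of the generic textbook procedure, not the dissertation's actual implementation. It assumes one metabolite measured on the same overlapping samples (rows) in several studies (columns). The function name, the array layout, and the toy values are hypothetical, and ties are broken arbitrarily rather than averaged.

```python
import numpy as np


def quantile_normalize(x):
    """Force every column of x to share the same empirical distribution.

    Rows are overlapping samples and columns are studies (an assumed layout).
    Ties within a column are broken arbitrarily, not averaged.
    """
    x = np.asarray(x, dtype=float)
    # Rank of each entry within its column (0 = smallest).
    ranks = np.argsort(np.argsort(x, axis=0), axis=0)
    # Reference distribution: mean across studies of the k-th smallest value.
    reference = np.sort(x, axis=0).mean(axis=1)
    # Replace each entry with the reference value at its rank.
    return reference[ranks]


# Hypothetical intensities for four overlapping samples in two studies.
x = np.array([
    [1.0, 12.0],
    [2.0, 10.0],
    [3.0, 14.0],
    [4.0, 16.0],
])
normalized = quantile_normalize(x)
print(normalized)
```

After normalization, both columns contain the same set of values, so the studies differ only in how samples are ranked, not in the overall distribution of the metabolite.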
Citable link to this page: http://nrs.harvard.edu/urn-3:HUL.InstRepos:42066839