New Statistical Methods for Integrative Data Analysis and Applications in Biology, Epidemiology and Finance
Abstract
This dissertation consists of four self-contained chapters that introduce and discuss statistical methods for integrative data analysis, with applications in biology, epidemiology and finance. Chapters 1 and 2 focus on aggregating multi-source, high-throughput biological data to improve functional annotation and prediction: in Chapter 1, we introduce an integrative algorithm based on Bayesian hidden Markov tree models that incorporates genes' phylogenetic profiles and their inferred evolutionary histories for gene clustering and functional prediction; in Chapter 2, we aggregate the genetic and pharmacological profiling data in the Cancer Cell Line Encyclopedia to predict the mechanisms and targets of cancer-treating drugs. In Chapter 3, we move to integrating large-scale digital data with spatio-temporal epidemic data, and show how to improve robustness and accuracy in localized influenza tracking by effectively combining Internet search data and traditional disease surveillance data. In Chapter 4, we take a more general view, linking multi-dimensional data with a non-parametric Bayesian copula model and predicting the irregular covariance structure between stock price and index data during the financial crisis.
Citable link to this page: http://nrs.harvard.edu/urn-3:HUL.InstRepos:39947181
- FAS Theses and Dissertations