Big Data in Biology and Medicine: Methodology and Computation
Citation: Yang, Shihao. 2019. Big Data in Biology and Medicine: Methodology and Computation. Doctoral dissertation, Harvard University, Graduate School of Arts & Sciences.
Abstract: Statistics is entering an exciting era. Huge volumes of electronic data accumulate every day as the activities of millions of individuals are recorded in nearly every aspect of life. These big data also raise unique challenges. This thesis attempts to address those challenges in the context of real-world research projects, and to harness the power of big data for solving real-life problems.
The biggest challenge is that the size of a dataset does not guarantee the validity of the results; without rigorous methods, quick-and-dirty approaches typically give biased conclusions. This thesis thus attempts to develop novel and rigorous methodology for big-data analysis, focusing on two distinct kinds of big data. The first project proposes a method that optimally extracts information from online search data, such as Google searches, for accurate prediction of infectious diseases, such as flu in the United States or dengue fever in tropical countries. The second performs causal inference on electronic health data, studying the causal relationship between treatment and side effects. In particular, I used a tailor-made matching method on nationwide electronic health data to study the causal relationship between cancer immunotherapy treatment and side effects.
Another challenge in big-data studies is that many traditional inference methods are not computationally feasible at this scale. Efficient computation and approximation tools must be developed.
In this thesis, I tackled the computational issue from two perspectives: a general-purpose computational tool and problem-specific approximate inference. For general-purpose computation, I developed a new parallelizable Markov chain Monte Carlo method for Bayesian posterior inference. For problem-specific computation, I introduced a Gaussian process approximation method for inference in dynamic systems of ordinary differential equations.
Citable link to this page: http://nrs.harvard.edu/urn-3:HUL.InstRepos:42029490
- FAS Theses and Dissertations