Big Data in Biology and Medicine: Methodology and Computation
Author
Yang, Shihao
Citation
Yang, Shihao. 2019. Big Data in Biology and Medicine: Methodology and Computation. Doctoral dissertation, Harvard University, Graduate School of Arts & Sciences.
Abstract
Statistics is entering an exciting era. Huge volumes of electronic data accumulate every day as the activities of millions of individuals are recorded in nearly every aspect of life. These big data also raise unique challenges. This thesis attempts to address the big data challenges in the context of real-world research projects, and to harness the power of big data for solving real-life problems.
The biggest challenge is that the size of a dataset doesn’t guarantee the validity of the results; without rigorous methods, quick-and-dirty approaches typically give biased conclusions. This thesis thus attempts to develop novel and rigorous methodology for big-data analysis, focusing on two distinct big datasets. The first project proposes a method that optimally extracts information from online search data, such as Google searches, for accurate prediction of infectious diseases, such as flu in the United States or dengue fever in tropical countries. The second performs causal inference on electronic health data, studying the causal relationship between treatment and side effects. In particular, I used a tailor-made matching method on a nationwide electronic health dataset to study the causal relationship between cancer immunotherapy treatment and side effects.
Another challenge in big-data study is that many traditional inference methods are not computationally feasible in the big data setting. Efficient computation and approximation tools must be developed.
In this thesis, I tackled the computation issue from two perspectives: a general computation tool and a problem-specific approximate inference method. For general-purpose computation, I developed a new parallelizable Markov chain Monte Carlo method for Bayesian posterior inference. For problem-specific computation, I introduced a Gaussian process approximation method for inference in dynamic systems of ordinary differential equations.
Terms of Use
This article is made available under the terms and conditions applicable to Other Posted Material, as set forth at http://nrs.harvard.edu/urn-3:HUL.InstRepos:dash.current.terms-of-use#LAA
Citable link to this page
http://nrs.harvard.edu/urn-3:HUL.InstRepos:42029490
Collections
- FAS Theses and Dissertations [5858]