Modeling, Prediction, and Inference: Applications in Social and Infectious Disease Epidemiology
Kiang, Mathew V.
Abstract
Epidemiology is at an exciting stage. Methods and techniques from other areas, such as data science, combined with advances in classic fields, such as Bayesian statistics, provide a new set of tools to explore epidemiological questions. In this dissertation, with collaboration from my committee members, I applied some of these new tools to three distinct epidemiological problems. My work contributes to the literature by showing how the synthesis and arbitrage of ideas from other fields can be adapted to diverse epidemiological settings.
First, I combined innovations in Bayesian statistics with advances in computation and statistical programming languages to jointly model racial/ethnic disparities in premature mortality at a scale not previously possible. Specifically, I used the shared component model to decompose premature mortality risk in non-Hispanic black and white Americans in the contiguous US into race-specific and shared components. I found that the majority of geographic variation in black-specific premature mortality risk was not shared with the white population, despite half of the geographic variation in white risk being shared with the black population.
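For readers unfamiliar with the shared component model, one common two-outcome formulation from the disease-mapping literature is sketched below. This is an illustrative assumption, not the specification used in the dissertation: the symbols (area index $i$, observed and expected deaths $y$ and $E$, intercepts $\alpha$, shared spatial field $\phi$, scaling parameter $\delta$, and race-specific fields $\psi$) are introduced here only for exposition.

```latex
\begin{align}
  y_{i,B} &\sim \mathrm{Poisson}\!\left(E_{i,B}\,\theta_{i,B}\right), &
  \log \theta_{i,B} &= \alpha_B + \delta\,\phi_i + \psi_{i,B}, \\
  y_{i,W} &\sim \mathrm{Poisson}\!\left(E_{i,W}\,\theta_{i,W}\right), &
  \log \theta_{i,W} &= \alpha_W + \phi_i/\delta + \psi_{i,W}.
\end{align}
```

Under this sketch, $\phi_i$ captures geographic variation common to both populations, $\psi_{i,B}$ and $\psi_{i,W}$ capture variation specific to each, and the share of each population's spatial variance attributable to $\phi$ is what a statement such as "half of the geographic variation in white risk being shared" would summarize.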
Second, I estimated rates of missingness in a new method of spatiotemporally dense data collection called digital phenotyping. This type of data collection uses smartphones and does not require active participation by the user, making it a potentially useful data collection mechanism for epidemiologists interested in individual-level behavior. I found rates of missingness to be non-trivial (16-18%), increasing only slowly over time (0.5-1% per week), and largely uncorrelated with phone type or common demographic characteristics.
Third, I borrowed techniques from data science to systematically evaluate the performance of different classes and parameterizations of models in predicting dengue in Thailand at the province level. Specifically, I compared an array of autoregressive models with regularized linear models. I found that predictive performance varies greatly by both area and forecasting horizon, with no single model or class of model performing best in every area or across all time horizons.
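The kind of comparison described above can be sketched as a rolling-origin evaluation in which each model is refit as data accrue and scored separately at each forecasting horizon. The example below is a minimal illustration under stated assumptions, not the dissertation's code: the series is synthetic, the helper names (`lagged_design`, `fit_linear`, `rolling_mae`) are hypothetical, the autoregressive model is a direct least-squares fit on lagged values, and ridge regression stands in for the class of regularized linear models.

```python
import numpy as np


def lagged_design(y, n_lags, horizon):
    """Rows are [y[t], y[t-1], ..., y[t-n_lags+1]]; the target is y[t+horizon]."""
    rows, targets = [], []
    for t in range(n_lags - 1, len(y) - horizon):
        rows.append(y[t - n_lags + 1 : t + 1][::-1])
        targets.append(y[t + horizon])
    return np.array(rows), np.array(targets)


def fit_linear(X, y, alpha):
    """Closed-form ridge fit with an unpenalized intercept; alpha=0 is OLS."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    coef = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    return coef, y_mean - x_mean @ coef


def rolling_mae(y, n_lags, horizon, alpha, n_train):
    """Mean absolute error of out-of-sample forecasts on an expanding window."""
    X, target = lagged_design(y, n_lags, horizon)
    errors = []
    for i in range(n_train + horizon - 1, len(target)):
        # Train only on rows whose targets were already observed at forecast time.
        end = i - horizon + 1
        coef, intercept = fit_linear(X[:end], target[:end], alpha)
        errors.append(abs(X[i] @ coef + intercept - target[i]))
    return float(np.mean(errors))


# Synthetic monthly series with a yearly cycle (an assumption, not dengue data).
rng = np.random.default_rng(0)
months = np.arange(144)
series = 50 + 30 * np.sin(2 * np.pi * months / 12) + rng.normal(0, 5, size=144)

# Score an unpenalized autoregression (alpha=0) and a ridge model per horizon.
results = {
    (name, horizon): rolling_mae(series, n_lags=12, horizon=horizon,
                                 alpha=alpha, n_train=48)
    for name, alpha in [("ar_ols", 0.0), ("ridge", 100.0)]
    for horizon in (1, 3, 6)
}
for key, mae in sorted(results.items()):
    print(key, round(mae, 2))
```

Repeating this loop over each province's series and tabulating the best model per province and horizon would give the sort of area-by-horizon comparison the abstract summarizes. The penalty `alpha=100.0` is arbitrary; in practice it would be tuned using only the training window.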
In summary, as data science and other fields become embedded in epidemiology, there is a large potential for the use of new tools to answer traditional and new public health questions.
Citable link to this page: http://nrs.harvard.edu/urn-3:HUL.InstRepos:37925666