Machine Learning for Epidemiological Prediction
Citation: Aiken, Emily. 2019. Machine Learning for Epidemiological Prediction. Bachelor's thesis, Harvard College.
Abstract: Tracking the spread of disease in or ahead of real time is essential for the allocation of treatment and prevention resources in healthcare systems, but traditional disease monitoring systems have significant inherent reporting delays due to lab test processing and data aggregation. Recent work has shown that machine learning methods leveraging a combination of traditionally collected epidemiological data and novel Internet-based data sources, such as disease-related Internet search activity, can produce timely and reliable disease activity estimates well ahead of traditional reports. This thesis addresses two gaps in the existing literature on this new approach to data-driven disease monitoring and forecasting: 1) the predictive modeling approaches used in disease surveillance lag behind the state of the art in machine learning; and 2) little effort has been put into modeling emerging disease outbreaks in data-poor and low-income regions. My first main result shows that, for regions and diseases where substantial epidemiological and Internet-based data are available, time-series deep learning models improve significantly upon the predictive performance of less sophisticated machine learning methods for a collection of tasks in disease forecasting, especially at long prediction horizons. I show in particular that a Gated Recurrent Unit (GRU) provides highly accurate forecasts for city- and state-level influenza activity in the United States up to eight weeks in advance. My second main result shows that simple machine learning methods incorporating digital trace data from Google query trends can provide rough but still useful estimates of incidence in emerging outbreaks weeks ahead of traditional reporting.
This thesis provides a starting point for further development of model architectures for data-driven disease monitoring, and can serve as the basis for widening the epidemiological applications in which these models are employed.
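For readers unfamiliar with the Gated Recurrent Unit named in the abstract, the following is a minimal sketch of a single GRU cell run over a short weekly series. It is not the thesis's implementation: the weights are random and untrained, the input features (incidence and search volume) and their values are made up for illustration, all function names are hypothetical, and the update convention h' = (1 - z) * h + z * candidate is one of two equivalent conventions in common use. A trained model would add a readout layer that maps the final state to forecasts at each horizon.

```python
import math
import random

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def matvec(m, v):
    # Multiply a matrix (list of rows) by a vector.
    return [sum(w * x for w, x in zip(row, v)) for row in m]

def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def init_gru(n_in, n_hidden, seed=0):
    # Random, untrained parameters for the update (z), reset (r),
    # and candidate (h) transformations. Illustrative only.
    rng = random.Random(seed)
    params = {}
    for gate in ("z", "r", "h"):
        params["W" + gate] = rand_matrix(n_hidden, n_in, rng)
        params["U" + gate] = rand_matrix(n_hidden, n_hidden, rng)
        params["b" + gate] = [0.0] * n_hidden
    return params

def gru_step(p, x, h):
    # Update gate: how much of the candidate state to blend in.
    z = [sigmoid(a + b + c) for a, b, c in
         zip(matvec(p["Wz"], x), matvec(p["Uz"], h), p["bz"])]
    # Reset gate: how much of the previous state feeds the candidate.
    r = [sigmoid(a + b + c) for a, b, c in
         zip(matvec(p["Wr"], x), matvec(p["Ur"], h), p["br"])]
    rh = [ri * hi for ri, hi in zip(r, h)]
    cand = [math.tanh(a + b + c) for a, b, c in
            zip(matvec(p["Wh"], x), matvec(p["Uh"], rh), p["bh"])]
    # Convex combination of the previous state and the candidate.
    return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]

def encode(p, sequence, n_hidden):
    # Run the cell over the whole sequence, starting from a zero state.
    h = [0.0] * n_hidden
    for x in sequence:
        h = gru_step(p, x, h)
    return h

# Toy weekly inputs [incidence, search volume]; the values are invented.
weeks = [[0.10, 0.20], [0.15, 0.30], [0.25, 0.45], [0.40, 0.60]]
params = init_gru(n_in=2, n_hidden=3)
state = encode(params, weeks, n_hidden=3)
```

The gates let the cell decide at each week how much past information to keep, which is one plausible reason a recurrent model of this kind could help at longer prediction horizons.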
Citable link to this page: https://nrs.harvard.edu/URN-3:HUL.INSTREPOS:37364603
Collection: FAS Theses and Dissertations