Machine Learning for Epidemiological Prediction

Date

2019-08-23

Citation

Aiken, Emily. 2019. Machine Learning for Epidemiological Prediction. Bachelor's thesis, Harvard College.

Abstract

Tracking the spread of disease in, or ahead of, real time is essential for the allocation of treatment and prevention resources in healthcare systems, but traditional disease monitoring systems have significant inherent reporting delays due to lab test processing and data aggregation. Recent work has shown that machine learning methods leveraging a combination of traditionally collected epidemiological data and novel Internet-based data sources, such as disease-related Internet search activity, can produce timely and reliable disease activity estimates well ahead of traditional reports. This thesis addresses two gaps in the existing literature on this new approach to data-driven disease monitoring and forecasting: 1) the predictive modeling approaches used in disease surveillance lag behind the state of the art in machine learning; and 2) little effort has been put into modeling emerging disease outbreaks in data-poor and low-income regions. My first main result shows that, for regions and diseases where substantial epidemiological and Internet-based data are available, time-series deep learning models improve significantly upon the predictive performance of less sophisticated machine learning methods for a collection of tasks in disease forecasting, especially at long prediction horizons. I show in particular that a Gated Recurrent Unit (GRU) provides highly accurate forecasts for city- and state-level influenza activity in the United States up to eight weeks in advance. My second main result shows that simple machine learning methods incorporating digital trace data from Google query trends can provide rough but still useful estimates of incidence in emerging outbreaks weeks ahead of traditional reporting.
This thesis provides a starting point for further development of model architectures for data-driven disease monitoring, and can serve as the basis for widening the epidemiological applications in which these models are employed.

Terms of Use

This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth at Terms of Service
