Enhancing Situational Awareness to Prevent Infectious Disease Outbreaks from Becoming Catastrophic

Catastrophic epidemics, if they occur, will very likely start from localized and far smaller (noncatastrophic) outbreaks that grow into much greater threats. One key bulwark against this outcome is the ability of governments and the health sector more generally to make informed decisions about control measures based on accurate understanding of the current and future extent of the outbreak. Situation reporting is the activity of periodically summarizing the state of the outbreak in a (usually) public way. We delineate key classes of decisions whose quality depends on high-quality situation reporting, key quantities for which estimates are needed to inform these decisions, and the traditional and novel sources of data that can aid in estimating these quantities. We emphasize the important role of situation reports as providing public, shared planning assumptions that allow decision makers to harmonize the response while making explicit the uncertainties that underlie the scenarios outlined for planning. In this era of multiple data sources and complex factors informing the interpretation of these data sources, we describe four principles for situation reporting: 1. Situation reporting should be thematic, concentrating on essential areas of evidence needed for decisions. 2. Situation reports should adduce evidence from multiple sources to address each area of evidence, along with expert assessments of key parameters. 3. Situation reports should acknowledge uncertainty and attempt to estimate its magnitude for each assessment. 4. Situation reports should contain carefully curated visualizations along with text and tables.


Introduction
Short of a massive, distributed bioterrorist attack, nearly any imaginable scenario for a globally catastrophic infectious disease outbreak would involve an initially small outbreak spreading from a few infections in a limited geographic area to infect many more people in many more places. It follows that efforts to prevent such catastrophic scenarios from materializing must include successful measures to stop or limit the spread of severe, but initially subcatastrophic events (Lipsitch 2017). Such measures are more likely to succeed if key decision makers, and those charged with implementing their decisions, have access to reliable, timely information on key parameters of an outbreak in progress.
It is a characteristic of infectious disease outbreaks that information available at the early stages is incomplete, uncertain, and often biased in the sense that observations are initially made on unrepresentative samples of the population (for example those reporting to hospitals) that are easily observed, and only later on more representative populations (Lipsitch et al. 2009b, 2011). Knowledgeable public health professionals have a wealth of heuristics for filtering and integrating data to form early assessments of key quantities that are inputs to decisions: for example, current incidence and prevalence; forecasts of incidence and prevalence; geographic and demographic extent; and severity measures (Fig. 1, "Evidence"). For example, experts in influenza epidemiology know that viral testing and influenza-like illnesses (ILI) are both incomplete measures of incidence with particular biases that vary over time. They have many, often unspoken filtering heuristics about how to infer "true" estimates of incidence (absolute or relative) from each type of system. Likewise, they have related sets of heuristics for integrating this information to account for biases of individual systems and to assess consistency of different indicators.
Figure 1: Key decisions on pandemic response and the evidence base on which they ideally rest; this evidence base is built up from surveillance inputs using interpretive tools such as transmission-dynamic models and "pyramid" severity models. Image adapted from Lipsitch et al. (2011) by Lucia Ricci.
In contrast to subject-matter experts, senior decision makers are typically generalists with less detailed knowledge of these aspects of any new disease. Their heuristics for interpreting raw data from surveillance systems, epidemiologic investigations, and novel data sources will be less nuanced, less informed by experience, and more variable from person to person. This may cause them to reach faulty conclusions about the magnitude of the threat, the options for and likely effects of potential responses, and the level of certainty surrounding each of these. We suggest in this chapter that information presented to decision makers, commonly known as Situation Reporting, should be tailored to give them not only raw data, but also synthetic expert judgment on the key characteristics of the outbreak and associated uncertainties. Such situation reporting would use a mixture of text and carefully chosen graphical presentations to convey expert estimates of key quantities and levels of certainty for each. In very data-poor settings (such as the early days of an epidemic of a novel disease) these syntheses would reflect expert judgment in interpreting the data that exist. For more familiar diseases, or as an epidemic of a new disease progresses, the presentation would include, in addition to expert interpretation, formal syntheses of emerging data using methods from statistics and machine learning. In all cases, modern situation reporting would incorporate not only traditional public health sources of data (mainly gathered from health systems), but also novel Internet-based data streams that can enhance the context, geographic extent, accuracy and timeliness of traditional sources. Carefully designed visualizations should be used to display spatial and temporal evolution of the current epidemic outbreak, geographic risk predictions, and other high-dimensional information. They may even include historical reconstructions of the spatio-temporal dynamics of previously observed outbreaks that may help contextualize the gravity of the ongoing public health threat.
In this chapter we begin by describing the decisions that rely on good situation reporting and the major topics on which assessments are crucial to good decisions in nearly all outbreaks. Next, we review traditional and novel Internet-based sources of data that can inform these assessments. Motivated by these uses and the available data, we then propose and discuss four criteria for high-quality situation reporting in outbreaks.

Decisions that rely on situation reporting
Among the many decisions facing policy makers throughout the course of an infectious disease outbreak (Lipsitch et al. 2011), arguably the most important fall into two broad categories: first, how big should the overall response be at each place and time, and second, how should the response be targeted to maximize effectiveness and limit costs? Specifically:
• Overall scale of the response. How many personnel and supplies, and how much money, should be allocated to the response, given the opportunity costs of reassigning these personnel from other health-enhancing activities within the health sector (e.g., routine vaccination) and, in the case of large outbreaks, possibly opportunity costs from outside the health sector (e.g., extra health spending occasioned by the outbreak)? As with many of the allocation decisions below, this is a question that will be reassessed repeatedly throughout the course of the outbreak, up to the decision to terminate the outbreak response after the outbreak is over.
• Targeting of countermeasures. If countermeasures to treat or prevent infection are available during an outbreak, they will likely be in short supply. Such countermeasures may include supportive or specific anti-infective medications, personal protective equipment, or vaccines. A key decision for public health officials is to make recommendations or policies for who should receive these treatments for their own direct protection, based on criteria of highest effectiveness, greatest need, greatest social value, or population preferences, among other possible criteria. For the subset of these countermeasures that can prevent transmission of the infection, such as vaccines, the timing and the choice of recipients for the supply of countermeasures might be chosen to optimally reduce the transmission rate of the infection.

Assessments that are crucial to making these decisions well
Specific inputs of evidence about the nature of the disease and the state of the outbreak should inform each of these classes of decisions. Crucial assessments, and sources of uncertainty for each, include:
• Disease severity. Often measured as a case-fatality proportion or case-hospitalization proportion, severity of the novel infection informs the magnitude and immediacy of the response that should be undertaken, while relative severity measures in different groups inform the appropriate targeting of prevention and treatment interventions. Severity measures may change as the natural history of the disease becomes better understood, as in the case of Zika virus, for which the risk of congenital malformations in the offspring of infected pregnant women came to be appreciated as the primary severity measure.
Comparative severity measures in different demographic groups, defined by age or comorbidities for example, can enhance targeting of scarce countermeasures. Sources of uncertainty: Especially in the early phases of an outbreak, cases with a known outcome are likely unrepresentative of all cases, thus complicating the effort to estimate the typical severity of infection. On one hand, observed cases early on will typically be more severe than average, as severe cases are more likely to come to medical attention and be diagnosed. Typically at the start of an epidemic, it is unclear what fraction of cases are asymptomatic or subclinical, as these are rarely observed. This factor tends to cause severity to appear higher early on than it is (Lipsitch et al. 2015). On the other hand, in a growing epidemic, it is now recognized that severity may be underestimated when total reported severe outcomes (e.g., deaths) are divided by total reported cases, because cases may be reported before their outcome is known and reported, so the denominator will include many people who have not yet entered the numerator, but will enter it in the future (Garske et al. 2009). The unknown balance between these opposing biases creates uncertainty in severity estimation. When calculating subgroup-specific severity measures, possible variation between subgroups in the probability a case is detected or reported at varying severity levels (e.g., symptomatic, hospitalized, fatal) can produce uncertainty in severity comparisons between subgroups (Jain et al. 2009; Lipsitch et al. 2015; Rudolf et al. 2017; Wolkewitz and Schumacher 2017).
• Epidemic size and geographic extent. The total number of cases informs the number of persons affected, the number still at risk, and the resource requirements for treating patients and containing the outbreak. The trend in this total can be used to estimate the rate of spread and measures of contagiousness (such as the basic reproductive number R0), and to project resource needs into the future. The geographic extent of cases, and its trend, allow similar estimates on smaller spatial scales and may inform efforts to understand the routes of transmission. Sources of uncertainty: Not every disease case may be reported, due to limited capacity for surveillance. At the earliest phases of an epidemic, surveillance capacity may not be in place and may miss cases, while later on, there may be too many cases to count, and surveillance methods may need to be modified (Lipsitch et al. 2009a). These constraints may change over time, producing artifactual trends, and may vary from place to place, causing apparent differences between places that are due to surveillance capacity variation rather than only to variation in case numbers (White et al. 2009). In addition to all these factors, nearly all traditional surveillance systems have a delay between the occurrence of a case and its reporting, producing an artifactual decline in the epidemic curve as it approaches the present due to underreporting of recent cases. "Nowcasting" algorithms, often involving nontraditional disease surveillance data sources, can be particularly helpful in addressing these limitations (Höhle and ...).
• Transmissibility. Crucial for any effort to predict how an epidemic will spread are two numbers: how many secondary cases each infected person causes, and how long it takes them to do so. These are known technically as the reproductive number and the serial interval or generation interval. In reality, each of these varies from case to case, so they are more accurately described as distributions, each with its own mean and variation around the mean (Wallinga and Lipsitch 2007). Using these quantities and various types of mathematical models, spread of the infection can be projected over time, and the potential impact of seasonal variation in transmission, depletion of susceptibles by an immunizing infection, and various countermeasures (e.g., treatment or vaccination) can be estimated. Sources of uncertainty: At the very beginning of an outbreak, these quantities may be measured directly by contact tracing, so that secondary cases are traced back to primary cases, and generation intervals are estimated as the time between symptom onset in successive cases in a chain (an approximate measure of the serial interval). As the epidemic expands (and sometimes from the beginning) this is impractical given limited resources, and these quantities must be estimated from the daily number of new cases (epidemic curve) (Wallinga and Teunis 2004; White and Pagano 2008). As a consequence, all the sources of uncertainty in case counting noted above become sources of uncertainty in estimating transmissibility, though there are ways to address these (White et al. 2009), including the use of pathogen genome sequences to provide independent estimates of the dynamics of the outbreak (Fraser et al. 2009). Methodological errors can mask uncertainty in transmissibility estimates, but these are easily avoided (Magpantay and Rohani 2015).
• Countermeasure availability, status, and effectiveness. Central to response planning and implementation is an accurate inventory of what countermeasures are available, in what quantities and locations, and how effective they are projected to be.
Countermeasures include supplies to prevent transmission (vaccines, personal protective equipment, prophylactic anti-infective medications) and treat cases (medications for treatment, medical devices such as ventilators, health care disposables and supplies such as IV fluids (Voelker 2018)). The effectiveness of many of these countermeasures will be unknown at the start of an outbreak and may change over time (e.g., through the development of resistance by the pathogen causing the outbreak).
Countermeasures also include behavioral, social and economic interventions such as movement restrictions (Peak et al. 2018), closing of public gatherings and venues (Hatchett et al. 2007), and regionally-varying factors such as opening and closing of schools (Chao et al. 2010; Huang et al. 2014).

Sources of uncertainty: For novel diseases, countermeasures will have uncertain effectiveness because they have not been tested and may be available, if at all, in short supply (Lipsitch and Eyal 2017). Timetables for producing such countermeasures (e.g., vaccines) depend on logistical factors that may be independent of, or even exacerbated by, the outbreak itself (Voelker 2018). The situation will change rapidly, as stockpiles are developed, depleted and replenished (Dimitrov et al. 2011). Even for known diseases, such as influenza, vaccine effectiveness varies from year to year (Osterholm et al. 2012).
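Returning to the severity assessment above, the delay bias in a growing epidemic can be illustrated with a small calculation. The following is a minimal sketch, not a method taken from the cited papers: the function name, the growth rate, the true case-fatality proportion, and the fixed 14-day delay from report to death are all hypothetical values chosen for illustration, and the opposing bias (over-representation of severe cases) is deliberately not modelled.

```python
import math


def simulate_cfr(days=40, growth_rate=0.15, true_cfr=0.02, delay_to_death=14):
    """Compare a naive and a delay-adjusted case-fatality estimate.

    All parameter values are hypothetical. Deaths are assumed to occur
    exactly `delay_to_death` days after a case is reported, which is a
    deliberate simplification of a real delay distribution.
    """
    # Daily reported cases grow exponentially from 100 on day 0.
    new_cases = [100 * math.exp(growth_rate * t) for t in range(days)]

    # Deaths reported on day t come from cases reported `delay_to_death` days earlier.
    deaths = [
        true_cfr * new_cases[t - delay_to_death] if t >= delay_to_death else 0.0
        for t in range(days)
    ]

    total_deaths = sum(deaths)

    # Naive estimate: all deaths so far divided by all cases so far.
    naive = total_deaths / sum(new_cases)

    # Adjusted estimate: restrict the denominator to cases reported long
    # enough ago for their outcome to be known.
    resolved_cases = sum(new_cases[: days - delay_to_death])
    adjusted = total_deaths / resolved_cases

    return naive, adjusted


naive, adjusted = simulate_cfr()
print(f"naive CFR: {naive:.4f}, adjusted CFR: {adjusted:.4f}")
```

Under these assumptions the adjusted estimate recovers the assumed true proportion, while the naive estimate falls well below it because recent cases dominate the denominator. In practice the delay is a distribution rather than a fixed number of days, so this restriction of the denominator is only a first approximation.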
A number of traditional and novel data sources can inform the real-time estimation of these quantities and the level of uncertainty of each estimate. We next review these data sources.

Data sources
To provide evidence in the four key areas noted above, a range of traditional and novel Internet-based data sources are available. We highlight some of the key ones in this section.

Traditional data sources
Early in an outbreak, the full data on the state of the outbreak may be contained in an epidemiologic line list, ideally containing demographic and geographic data on cases, clinical data on the diagnosis, course of their illness and treatment, as well as key dates such as the date on which they were infected (if known), became symptomatic, were reported to public health authorities, and, as applicable, were hospitalized, admitted to intensive care, recovered/discharged, or died. Many of these elements may be unavailable, at least temporarily, for some cases, so several efforts have been made to define minimal data sets needed for basic analyses early in outbreaks (Van Kerkhove et al. 2010; Cori et al. 2017). On the other hand, tools have been developed recently to implement more complex data structures that may include different elements for different cases and can incorporate novel types of data, including pathogen sequences when available (Grad and Lipsitch 2014; Jombart et al. 2014; Finnie et al. 2016).
As an outbreak grows, it will likely become impossible for some jurisdictions to continue testing all suspected cases and/or reporting detailed data on suspected or confirmed cases of disease. Alternative approaches, such as reporting clinical events (emergency department or primary care visits meeting syndromic criteria, for example), combined with diagnostic test results on a fraction of these clinical cases, can maintain a quantitative picture of the progress of the outbreak while using fewer resources (Lipsitch et al. 2009a). In locations with limited resources, this strategy may be employed from the start.
These epidemiologic data will be central to the first three evidence needs outlined above. The fourth need, to estimate countermeasure availability and effectiveness, will primarily require logistical and supply chain information about the production and distribution of vaccines, pharmaceutical treatments, and personal protective equipment. For anti-infective treatments, real-time data on the susceptibility of cases will be required to assess the likely impact of these treatments, estimate trends in resistance, and inform the optimal use of these and other countermeasures (Leung et al. 2017). To improve estimates of their past and potential effectiveness, data on the timing and geographic scope of nonpharmaceutical interventions, such as movement restrictions, safe burial practices (Tiffany et al. 2017) or school closings and openings (Chao et al. 2010; Huang et al. 2014) may be gathered by traditional means (surveys or administrative data) or by some of the novel means described below (Peak et al. 2018).

Novel data sources
The availability of big data sets, generated and recorded constantly due to the activities of millions of Internet and mobile phone users, has increased significantly and has opened up new ways to understand changes in human behavior. Of particular interest is the availability of Internet-based data that may help us detect, in real time, changes in human behavior that may signal the emergence of a public health threat. These data may include unusual surges of symptom-related search activity on Internet search engines, an increase in symptom-related posts on social media, and increased sales of over-the-counter medications to combat fever or other symptoms.
In fact, in the past decade, many research teams have been able to identify historical relationships between information contained in healthcare-based disease surveillance systems (such as the number of hospitalizations and/or patients seeking medical attention with an array of symptoms) and a range of other signals: symptom-related Internet search behavior (Yang et al. 2015), Wikipedia article views (Generous et al. 2014; McIver and Brownstein 2014), clinicians' search behavior (Santillana et al. 2014), crowd-sourced symptom self-reporting apps (Smolinski et al. 2015; Koppeschaar et al. 2017), symptom-related Twitter posts (Signorini et al. 2011; Paul et al. 2014), prescription changes contained in cloud-based electronic health records (Santillana et al. 2016; Yang et al. 2017; Lu et al. 2019), historical synchronicities in disease activity in neighboring regions (Lu et al. 2019), and weather patterns, among others. These studies have shown that behavior changes in human populations, often a consequence of (or correlated with) increased disease activity, have detectable signatures in systems that were not originally designed as public health surveillance systems. These findings suggest that monitoring Internet search and/or social media activity related to symptoms or specific diseases may help confirm the presence of public health threats.
Once a local disease outbreak has been identified, current and future weather patterns that may be conducive to its further dissemination may be identified, allowing the creation of risk maps in real time. For example, it is now well known that changes in ambient air moisture (relative humidity) influence the mechanistic human-to-human transmission of respiratory diseases such as influenza (Lowen et al. 2007; Shaman and Kohn 2009; Shaman et al. 2011). Drier months, such as those that occur during the colder seasons in mid-latitudes, enhance disease transmission. Vector-borne diseases such as Dengue, Malaria, and Yellow Fever can only be spread if local conditions are suitable for mosquitos to exist and reproduce (Kyle and Harris 2008). Thus, maps of the presence of vectors could be used to produce risk maps in real time (Messina et al. 2015). Mobile phone information can be used at the local level to map human mobility, whereas bus, train, or airline logs can be used to assess the likelihood of a given disease being transmitted from point A to point B. Models incorporating these data have demonstrated the potential to predict outbreaks in new geographic locations, for example with dengue in Pakistan (Wesolowski et al. 2015).
While many of these data sources may be helpful for disease surveillance, they have clear limitations. For example, people with mobile phones and/or Internet access do not necessarily reflect the underlying demographics of the locations where they live. This fact introduces biases that need to be considered when using these data sources as indicators of the presence of a disease. Another limitation stems from the fact that people are susceptible to "panic searching" when news outlets alert them of unusual flu, dengue, or Ebola disease outbreaks. As a consequence, peaks of search activity and increased social media microblogs discussing symptoms or diseases may only signal a population's surge of interest in a disease-related topic but may not reflect actual infections. One of us (MS) is actively developing approaches to address these limitations (Santillana et al. 2015).
Finally, it has been shown that some of the uncertainties and limitations inherent to each individual data source may be mitigated by combining multiple data sources in order to assess the gravity of a disease outbreak (Santillana et al. 2015; McGough et al. 2017; Lu et al. 2019).
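To give a sense of why combining sources can reduce uncertainty, the sketch below uses inverse-variance weighting, a standard textbook way of pooling independent estimates. It is a stand-in chosen for brevity and is not the method of the studies cited above; the function name and the example numbers are hypothetical, and the approach assumes that each source is unbiased and that their errors are independent, which the limitations just described suggest will often not hold.

```python
def combine_estimates(estimates):
    """Pool independent estimates by inverse-variance weighting.

    `estimates` is a list of (value, standard_error) pairs, one per data
    source. Returns the pooled value and its standard error. Assumes the
    sources are unbiased and their errors are independent.
    """
    # More precise sources (smaller standard error) receive larger weights.
    weights = [1.0 / se ** 2 for _, se in estimates]
    total_weight = sum(weights)

    pooled = sum(w * value for w, (value, _) in zip(weights, estimates)) / total_weight
    pooled_se = (1.0 / total_weight) ** 0.5
    return pooled, pooled_se


# Hypothetical weekly case estimates from three sources:
# (estimate, standard error) for clinic reports, search activity, and an app.
sources = [(1200.0, 300.0), (900.0, 200.0), (1500.0, 600.0)]
pooled, pooled_se = combine_estimates(sources)
print(f"pooled estimate: {pooled:.0f} (standard error {pooled_se:.0f})")
```

Under these assumptions the pooled standard error is smaller than that of any single source. A shared bias, such as a surge of "panic searching" that affects several Internet-based streams at once, would not be removed by this kind of averaging.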

Situation Reporting as a Source of Common Planning Assumptions
A key aim of situation reporting, sometimes underappreciated, is to provide analysts and decision makers with a common set of facts (even if these are uncertain) so that decisions can be made using shared assumptions rather than unstated ones which may vary from person to person and thus cause confusion or error. Publicly stating working interpretations of existing data in a situation report is not intended to suppress disagreements in interpretation but rather to make these explicit, and to note which facts can be known with confidence and what are the key sources of uncertainty. Two examples from the 2009 influenza pandemic may help to illustrate the potential of situation reports centered on the four areas of critical evidence stated above to alleviate confusion and improve decisions.
• In the 2009 influenza pandemic, perhaps the most important quantity on which evidence was needed for decision making was the severity, as measured by case-fatality and case-hospitalization rates. Early estimates varied by a factor of 10,000, from a raw estimate in Mexico of 4% based on case and death numbers on May 4, to an adjusted estimate of 0.0004% published in July (Wilson and Baker 2009), spanning the full range of the severity scale established by the US Government for pandemic planning (of Health et al. 2007). The first official US government publication (to our knowledge) that contained a specific scenario at a particular severity level was the August 2009 PCAST report (Executive Office of the President and President's Council of Advisors on Science and Technology 2009), despite the fact that CDC investigations and surveillance had been producing relevant data in the US as early as April-May (Iuliano et al. 2009; Reed et al. 2009). The act of assembling a thematic situation report that brought together diverse sources of evidence on severity could have helped to narrow this range of uncertainty by bringing together data that had been siloed in individual investigations.
• Vaccine planning in the United States proceeded in the 2009 pandemic on the assumption of a mid- to late-winter peak of influenza incidence, allowing time for the production of enough doses (160 million) [6] to cover "initial target groups" in a timely fashion. This view was not supported by historical evidence from pandemics [7] cited by NIH authors. As they predicted based on historical experience, the major wave came in the autumn and was largely complete in most places in the US by the time many doses were available. Making an explicit projection about the likely timing of the peak of cases and its uncertainty, and specifically incorporating historical data to provide context, could have improved the quality of assumptions used to plan vaccine rollout and targeting.

Projecting the future
Situation reporting is intimately connected with making projections about how an epidemic may unfold in the future. Indeed, some readers of a situation report may be primarily concerned not with how big or widespread the epidemic is now, but with how big or widespread it could become. Early situation reports will typically contain little in the way of projections, but as an epidemic develops, it may become appropriate to begin including some projections of its likely trajectory under various scenarios. Indeed, to achieve the goal of creating common planning assumptions described in the previous section, some such scenarios must be developed and include a forward-looking component. Planning scenarios may be developed even without accurate forecasts, but they will be more useful if they are based on the best possible forecasts that can be achieved at a particular stage in the epidemic. Empirically, it should be noted that even when a planning scenario is explicitly and repeatedly annotated as being purely that, and not a prediction, it may be reported in the lay press as if it were a forecast. The 2009 PCAST working group report on the US Government's pandemic response repeatedly characterized its planning scenarios as "not a prediction" in three separate places [ref].
The technical aspects of how to project disease incidence could fill an entire book, but for the purposes of situation reporting some crucial information should accompany any such projections and should be demanded by decision makers if not explicit in the situation report. The key question for any projection is what assumptions underlie it. In particular, many projections of disease cases indicate that if current trends continue, there may be x cases by a certain time. For infectious diseases, current trends cannot continue indefinitely.
The simplest models may assume that the epidemic continues growing exponentially at the same rate as in its earliest phases. For any growing epidemic, such models can project arbitrarily high numbers of cases because exponential growth never ends (Meltzer et al. 2001); the only question is how long the epidemic will take to reach a given number of cases (Meltzer et al. 2014). Such projections usually provide a near-worst case scenario, because typically the factors that change during an epidemic tend to moderate transmission rather than increase it. That said, there are important exceptions such as changing weather or vector density for arthropod-borne infections, which can move cyclically with the seasons.
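A "current trends continue" projection of this kind can be sketched in a few lines. The example below is only an illustration under stated assumptions: it fits a constant exponential growth rate to early daily case counts by least squares on the log scale, then asks how long that growth would take to reach a threshold. The function names and the synthetic counts are hypothetical, all counts must be positive, and none of the reporting biases discussed earlier are corrected for.

```python
import math


def fit_growth_rate(daily_cases):
    """Least-squares slope of log(daily cases) against the day index.

    Requires strictly positive counts. Returns the exponential growth
    rate per day implied by a straight-line fit on the log scale.
    """
    n = len(daily_cases)
    days = range(n)
    logs = [math.log(c) for c in daily_cases]
    day_mean = sum(days) / n
    log_mean = sum(logs) / n
    sxy = sum((d - day_mean) * (y - log_mean) for d, y in zip(days, logs))
    sxx = sum((d - day_mean) ** 2 for d in days)
    return sxy / sxx


def doubling_time(growth_rate):
    """Days for incidence to double at a constant exponential growth rate."""
    return math.log(2) / growth_rate


def days_until(threshold, current_cases, growth_rate):
    """Days for daily incidence to reach `threshold` if growth stays exponential."""
    return math.log(threshold / current_cases) / growth_rate


# Synthetic early counts that double every 5 days (hypothetical data).
early_cases = [10 * 2 ** (t / 5) for t in range(10)]
r = fit_growth_rate(early_cases)
print(f"growth rate per day: {r:.3f}, doubling time: {doubling_time(r):.1f} days")
print(f"days to reach 10,000 daily cases: {days_until(10_000, early_cases[-1], r):.0f}")
```

Because nothing in this calculation limits growth, any threshold is eventually reached; only the waiting time changes with the fitted rate. That is the sense in which such projections are best read as near-worst-case scenarios rather than forecasts.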
More refined projections, which do not assume that "current trends continue," will incorporate factors that modify transmission, including behavior change induced by a desire to control the infection, behavior change for unrelated reasons (e.g., the beginning and end of school terms that affect directly transmitted diseases), seasonal changes that affect the suitability of transmission through the biology of the infectious agent or its vectors, and depletion of susceptible hosts as individuals previously infected become immune and thus reduce the opportunities for transmission. A projection should clearly state which of these factors it takes account of, what it assumes, and what evidence exists (or is needed) to support these assumptions. Finally, efforts should be made to include uncertainty estimates (e.g., confidence intervals) around scenario-based projections that may be displayed on visualizations as uncertainty cones, similar to those used to monitor the likely trajectory of a hurricane in weather prediction systems.
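As a rough illustration of one such factor, the sketch below uses a simple discrete-time SIR (susceptible-infectious-recovered) model, in which depletion of susceptibles eventually ends growth, and runs it over a range of assumed basic reproductive numbers to produce an envelope of scenarios. This is a minimal sketch, not a recommended forecasting method: the population size, infectious period, initial infections, and range of R0 values are hypothetical, the daily time step is only reasonable for R0 well below the infectious period in days, and the envelope is a range across scenarios rather than a statistical confidence interval.

```python
def sir_incidence(r0, days, population=1_000_000, initial_infected=10,
                  infectious_period=4.0):
    """Daily new infections from a deterministic discrete-time SIR model.

    All parameter values are hypothetical. Transmission slows as the
    susceptible pool is depleted, so incidence peaks and then declines.
    """
    recovery_rate = 1.0 / infectious_period
    transmission_rate = r0 * recovery_rate
    susceptible = population - initial_infected
    infected = initial_infected
    incidence = []
    for _ in range(days):
        # New infections are proportional to contacts between S and I.
        new_infections = min(
            susceptible, transmission_rate * susceptible * infected / population
        )
        recoveries = recovery_rate * infected
        susceptible -= new_infections
        infected += new_infections - recoveries
        incidence.append(new_infections)
    return incidence


def projection_band(r0_values, days):
    """Lowest and highest projected incidence on each day across scenarios."""
    runs = [sir_incidence(r0, days) for r0 in r0_values]
    lower = [min(day_values) for day_values in zip(*runs)]
    upper = [max(day_values) for day_values in zip(*runs)]
    return lower, upper


# Envelope across three assumed values of R0 (hypothetical scenario range).
lower, upper = projection_band([1.5, 2.0, 2.5], days=120)
for day in (10, 30, 60):
    print(f"day {day}: {lower[day]:.0f} to {upper[day]:.0f} new infections")
```

Plotting `lower` and `upper` against time gives a band of the kind that could be shaded around a central scenario. A situation report would need to state alongside it which factors (here, only susceptible depletion) the projection accounts for.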

Principles for high-quality situation reporting
The goals of providing evidence to decision makers on key quantities relevant to responding to outbreaks, providing common scenarios for the purposes of planning, and highlighting areas of uncertainty, suggest four principles to enhance the quality of situation reporting in outbreaks.
• Situation reporting should be thematic, concentrating on essential areas of evidence needed for decisions. Situation reports should be designed for clarity and value to top-level decision makers, as well as for technical scrutiny by subject-matter experts. Decision makers may lack the time or skills or specialized knowledge to interpret raw data such as case counts, Google search trends, or the like. They may not immediately see the relevance of each data source to the key quantities about which they need information. Thus, maximal value to these consumers of the report will be achieved by organizing data outputs by the quantity of interest they inform, rather than in a simple list. This leads to the second principle:
• Situation reports should adduce evidence from multiple sources to address each area of evidence, along with expert assessments of key parameters. Text describing the expert judgment about severity, numbers and geographic extent, and other assessments should be combined with data in the forms of tables and graphs.
Notwithstanding the wealth of potential data sources for tracking an outbreak and the response to it, data alone are not sufficient to support evidence-based decisions reflecting a clear picture of the four areas noted above. Key data may be unavailable, especially in the areas hardest hit by an outbreak, and even when available they may be limited, confusing or even misleading. Subject-matter experts (epidemiologists, clinicians, data managers, and those involved in delivering the public health response) will typically have knowledge that is vital to sound interpretation of the data. A crucial feature of situation reporting is to make the data, as well as this expert knowledge, widely available to enhance the quality of evidence for decisions, and also to allow scrutiny and critique of the interpretation. Crucial to this presentation is the next principle:
• Situation reports should acknowledge uncertainty and attempt to estimate its magnitude for each assessment. This prevents provisional assessments from becoming accepted as unchangeable facts, while acknowledging the possibility that estimates may change as data improve. Finally:
• Situation reports should contain carefully curated visualizations along with text and tables. These visualizations should clearly demarcate existing data from projections, visually represent uncertainty bounds, and be presented in intuitive ways that have been tested for clarity with an audience of decision makers before a crisis hits.

Conclusion
Accurate, informative, and clear situation reporting is essential for evidence-based decision making and planning in the midst of an outbreak that may be chaotic and full of confusing and contradictory information. In this chapter we have advocated for augmenting raw data with expert interpretation and planning scenarios to aid the decision makers by providing open discussion of what is and is not known and a set of shared assumptions for planning purposes. New data sources provide unprecedented opportunities to improve our understanding of epidemic dynamics as a new outbreak unfolds, and these must be integrated with more traditional data sources to aid decision makers in understanding the big picture, not only the raw data. Achieving these goals is a crucial part of minimizing the probability that an initially small and local outbreak grows to regionally or globally catastrophic proportions.