Publication: Examining viral pathogen evolution and spread through genomic data
Date
2023-01-19
Authors
Tomkins-Tinch, Christopher H.
Citation
Tomkins-Tinch, Christopher H. 2022. Examining viral pathogen evolution and spread through genomic data. Doctoral dissertation, Harvard University Graduate School of Arts and Sciences.
Abstract
This work makes use of viral genomic data to examine the evolution and spread of the virus SARS-CoV-2 at multiple scales: from the virus present in individual patients over time to its introduction and transmission within a large university community.
Before reinfection with SARS-CoV-2 was a commonly known phenomenon, genomic data produced in the course of this work identified a case of reinfection in an immunosuppressed organ transplant recipient, distinguishing a second period of positivity and symptomatic COVID-19 as a new infection. The viral genomes from this later period were found to be derived phylogenetically from viral genomes sampled from other cases occurring in the surrounding region rather than from the genome of the initial period of positivity, indicating independent re-exposure in the community and subsequent reinfection. This was the first genomically informed report of SARS-CoV-2 reinfection in a solid organ transplant recipient. Its findings retain clinical relevance for the care of other immunocompromised individuals who may be vulnerable to reinfection with SARS-CoV-2, and highlight the importance of routine viral diagnostic testing of organ transplant recipients during periods of high pathogen transmission in the community.
To interrogate SARS-CoV-2 reinfection more broadly, the investigative approach employed in the study of the single case was applied systematically. The cohort considered included 1500 patients of Massachusetts General Hospital with medical records noting that each had a repeat positive test for SARS-CoV-2 more than 45 days after an initial positive test. Viral genomic sequencing was attempted on available retained specimens, and patients were classified using genomic data and clinical and viral load-based assessments. Reinfection and persistent RNA detection were both identified, though conclusive classification of either was limited to a handful of cases due to the challenge of producing longitudinal sequence data for each patient. Comparing the clinical and genomic assessments, it was found that clinical assessment alone failed to identify approximately one third of reinfection cases. This finding has implications for which instances of clinical positivity are considered true active infections and, subsequently, for the allocation of therapeutics and private hospital rooms.
An additional study examined transmission of SARS-CoV-2 at a large public university, using case metadata, environmental pathogen surveillance, viral genomic sequencing, and Wi-Fi-based proximity data to derive insights into the factors driving new infections. High-contact sports teams had higher incidence, as did teams with longer playing seasons. Incidence in residence halls was higher in buildings with greater occupancy. Most cases were contained within a few larger clusters, and a cryptic transmission chain during winter recess linked cases from one semester to others in the next. Viral concentration in wastewater was found to be a useful proxy for disease burden within buildings, and sequence data from wastewater detected viral variation not seen in clinical specimens from the university or the region. Members of phylogenetic clusters were found to aggregate in disjoint social sub-networks. Taken together, these observations were used to propose a framework for an integrated disease surveillance program, emphasizing the use of digital tools for gathering case data and contacts, with testing strategies informed by risk analysis, wastewater surveillance, and the transmission patterns observed from genomic data.
Work for each of the above sections relied on viral genomic data produced by and analyzed using the viral-ngs software described in the final chapter, which was written to enable reproducible analyses across a range of viral taxa, including the Ebola, Lassa, Zika, mumps, and SARS-CoV-2 viruses. The software is open source and to date has been used to assemble over 160,000 openly published viral genomes.
Collectively, the work in the following chapters demonstrates the multifaceted role viral genomic data can have during a pandemic in clinical care, pathogen surveillance, and public health.
Keywords
Genetics, Virology, Public health
Terms of Use
This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth at Terms of Service