Publication: Modeling inter-individual variation in single-cell datasets to detect cell state abundance associations to clinical features and genetic variants
Date
2024-05-07
Citation
Rumker, Laurie. 2024. Modeling inter-individual variation in single-cell datasets to detect cell state abundance associations to clinical features and genetic variants. Doctoral dissertation, Harvard University Graduate School of Arts and Sciences.
Abstract
To understand disease development, create effective medical treatments, and predict clinical outcomes, researchers study body tissue sampled from a wide variety of human donors. Researchers seek to detect tissue characteristics that associate with donor attributes like disease risk or treatment response. The advent of single-cell genomic technologies has enabled unbiased acquisition of diverse measurements, such as gene expression or chromatin accessibility, for each cell in a tissue sample. Single-cell datasets reveal the complexity of tissue composition in the human body at unprecedented resolution and provide new opportunities to detect tissue associations to donor attributes. In particular, single-cell datasets offer new opportunities to reveal what kinds of cells, among many possible “cell states,” associate in abundance with donor attributes of interest. However, existing association-testing approaches are anchored in researcher-driven choices about which cell states are most relevant, and these choices can limit the scope of associations detected. This thesis presents a novel approach that offers more flexible and data-driven identification of cell states associated in abundance with donor attributes. Our approach enables researchers to take better advantage of the rich information available in single-cell datasets and has already offered new insight into diseases including tuberculosis and systemic lupus erythematosus.
The first portion of this thesis introduces our novel framework for cell state abundance association testing. This framework leverages both the granularity of cell states and the variation across donors that are revealed in tissues by single-cell datasets. In this framework, we quantify cell abundance per donor across many granular cell states termed “neighborhoods” and uncover patterns of neighborhood abundance variation that are shared across donors. We illustrate how this framework produces a set of derived tissue features per donor that can be used to improve statistical power and accuracy in the detection of cell state abundance associations relative to the existing paradigm. We apply this framework to single-cell datasets of blood tissue to characterize immune dysfunction in autoimmunity and infection.
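The general idea of per-donor neighborhood abundances and shared patterns of variation can be illustrated with a minimal sketch. This is not the thesis's implementation: the function names, the simulated two-population data, the choice of one neighborhood per anchor cell (the cell plus its k nearest neighbors), and the use of a plain singular value decomposition to summarize variation across donors are all assumptions made for illustration.

```python
import numpy as np


def simulate(n_donors=20, cells_per_donor=50, seed=0):
    """Hypothetical toy data: two well-separated cell populations in a 2-D
    embedding. Donors with is_case == 1 contribute mostly to population B."""
    rng = np.random.default_rng(seed)
    is_case = np.arange(n_donors) % 2  # alternate control (0) and case (1)
    coords, donor_ids = [], []
    for d in range(n_donors):
        n_b = 45 if is_case[d] else 5  # cells drawn from population B
        centers = np.array(
            [[8.0, 8.0]] * n_b + [[0.0, 0.0]] * (cells_per_donor - n_b)
        )
        coords.append(centers + rng.normal(size=centers.shape))
        donor_ids.append(np.full(cells_per_donor, d))
    return np.vstack(coords), np.concatenate(donor_ids), is_case


def neighborhood_abundance(embedding, donor_ids, n_donors, k=30):
    """Return a donors x neighborhoods matrix of fractional abundances.

    Each neighborhood is an anchor cell plus its k nearest neighbors.
    Assumes every donor has at least one cell."""
    n_cells = embedding.shape[0]
    # Brute-force pairwise squared distances (only suitable for small data).
    diffs = embedding[:, None, :] - embedding[None, :, :]
    sq_dist = (diffs ** 2).sum(axis=2)
    # The k + 1 closest cells to each anchor, including the anchor itself.
    neighbors = np.argsort(sq_dist, axis=1)[:, : k + 1]
    counts = np.zeros((n_donors, n_cells))
    for anchor in range(n_cells):
        counts[:, anchor] = np.bincount(
            donor_ids[neighbors[anchor]], minlength=n_donors
        )
    # Normalize by each donor's total number of cells.
    cells_per_donor = np.bincount(donor_ids, minlength=n_donors)
    return counts / cells_per_donor[:, None]


def donor_features(abundance, n_components=2):
    """Summarize shared abundance variation with an SVD of the centered matrix.

    Returns per-donor scores and the matching neighborhood loadings."""
    centered = abundance - abundance.mean(axis=0)
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    return u[:, :n_components] * s[:n_components], vt[:n_components]


if __name__ == "__main__":
    emb, donors, is_case = simulate()
    abundance = neighborhood_abundance(emb, donors, n_donors=20, k=30)
    scores, loadings = donor_features(abundance)
    # Association between the leading derived feature and the donor attribute.
    r = np.corrcoef(scores[:, 0], is_case)[0, 1]
    print(f"correlation of first component with case status: {r:.2f}")
```

In this toy setting the leading per-donor score can then be tested against a donor attribute with any standard association test; a real analysis would also need a scalable nearest-neighbor search and adjustment for covariates and batch effects.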
Modeling cell state abundance associations at fine-grained resolution offers important advantages, but also necessitates new considerations for potential sources of confounding. The second portion of this thesis characterizes a source of confounding in cell state abundance association testing to which neighborhood-resolution models of single-cell data are particularly vulnerable. We also introduce a strategy to address this confounding that offers benefits across multiple neighborhood-based association-testing tools.
Cell states that are differentially abundant between donors with and without a disease may result from disease development processes or disease sequelae. However, genetic variants that confer an elevated risk of disease may also associate with the abundance of cell states and more specifically illuminate causal processes of disease development. The final portion of this thesis introduces a tool that adapts our framework to flexibly detect cell state abundance associations to genetic variants at genome-wide scale. In a dataset of blood tissue, we reveal novel genotype-phenotype associations that offer clues about genetic mechanisms of immune-mediated disease risk.
This work demonstrates the importance of modeling cell states in single-cell datasets at fine-grained resolution and the value in examining shared patterns of abundance across individuals. It is our hope that the methods produced by this work will empower researchers to unlock new insights from single-cell datasets, expanding our understanding of disease and ultimately improving human health and health care.
Keywords
Bioinformatics
Terms of Use
This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth at Terms of Service