Publication: Methods for the design and analysis of disease-oriented multi-sample single-cell studies
Date
2024-05-07
Authors
Citation
Millard, Nghia. 2024. Methods for the design and analysis of disease-oriented multi-sample single-cell studies. Doctoral dissertation, Harvard University Graduate School of Arts and Sciences.
Abstract
Recent advances in single-cell technologies have enabled the characterization of
heterogeneous cell types in human diseases by measuring various features of individual cells,
such as their transcriptomic, proteomic, and epigenomic profiles in the context of their spatial
location in tissue. Because single-cell data are costly to generate, high-dimensional, sparse, and
noisy, investigators who wish to use single-cell technologies face key challenges in
designing single-cell studies, performing integrative analysis of cells from multiple samples, and
gleaning biological understanding from these data. In this dissertation, I present the development
and application of novel computational methods and analysis frameworks that help address these
challenges.
First, I introduce scPOST, an algorithm for simulating large-scale, multi-sample single-cell
RNA-sequencing datasets. scPOST enables investigators to simulate their future single-cell
studies with different parameters, such as the number of cells, number of cells per sample, and
number of batches. This allows investigators to determine the optimal design parameters for their
study.
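The idea of choosing design parameters by simulation can be illustrated with a minimal sketch. This is not scPOST itself: the function name `simulate_power`, its parameters, the Gaussian between-sample variation, and the permutation test are all assumptions chosen for illustration. The sketch estimates how often a shift in one cell type's frequency between cases and controls would be detected for a given number of samples and cells per sample.

```python
import random
import statistics


def simulate_power(n_samples_per_group, cells_per_sample, base_freq=0.10,
                   fold_change=1.5, sample_sd=0.03, n_sims=200, n_perm=200,
                   alpha=0.05, seed=0):
    """Estimate power to detect a cell-type frequency shift (hypothetical model).

    Each simulated study draws per-sample frequencies for cases and controls,
    then applies a two-sided permutation test on the difference in group means.
    """
    rng = random.Random(seed)
    n = n_samples_per_group

    def observed_freq(mean_freq):
        # Assumed model: the true frequency varies between samples (clamped to [0, 1]).
        p = min(max(rng.gauss(mean_freq, sample_sd), 0.0), 1.0)
        # The observed frequency comes from counting cells of the type of interest.
        hits = sum(rng.random() < p for _ in range(cells_per_sample))
        return hits / cells_per_sample

    significant = 0
    for _ in range(n_sims):
        controls = [observed_freq(base_freq) for _ in range(n)]
        cases = [observed_freq(base_freq * fold_change) for _ in range(n)]
        observed = abs(statistics.mean(cases) - statistics.mean(controls))

        # Permutation test: shuffle group labels and recompute the difference.
        pooled = controls + cases
        extreme = 0
        for _ in range(n_perm):
            rng.shuffle(pooled)
            diff = abs(statistics.mean(pooled[:n]) - statistics.mean(pooled[n:]))
            if diff >= observed:
                extreme += 1
        p_value = (extreme + 1) / (n_perm + 1)
        if p_value < alpha:
            significant += 1

    # Fraction of simulated studies in which the shift was detected.
    return significant / n_sims
```

Calling `simulate_power` over a grid of sample counts and cells per sample gives a rough picture of which designs reach a target power under these assumed settings.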
Next, I present the development and application of Harmony and
Crescendo, two batch correction algorithms designed to remove the batch effects that
are prominent in single-cell data. I show that these algorithms achieve superior performance in
removing batch effects and are fast and scalable to large single-cell datasets that contain
hundreds of thousands or even millions of cells.
Finally, I showcase the application of these methods to the analysis of a large cohort of
82 samples from rheumatoid arthritis (RA) patients, comprising 314,000 cells. After performing batch
correction with Harmony and a prospective power analysis with scPOST, I introduce a novel
framework called cell-type abundance phenotypes (CTAPs) for classifying samples based on the
abundance of cell types present in the sample. I then discuss how we used the CTAP framework
to characterize the diversity of synovial inflammation in RA, identify disease-relevant cell states
and transcriptomic signatures for different phenotypes of RA, and predict disease response.
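As a rough illustration of describing samples by cell-type abundance, the sketch below computes per-sample cell-type proportions from (sample, cell type) pairs and labels each sample by its most abundant type. The abstract does not specify how CTAPs are assigned, so the dominant-type rule, the function names `abundance_profiles` and `dominant_type`, and the toy data are assumptions for illustration only.

```python
from collections import Counter, defaultdict


def abundance_profiles(cells):
    """Return {sample_id: {cell_type: proportion}} from (sample_id, cell_type) pairs."""
    counts = defaultdict(Counter)
    for sample_id, cell_type in cells:
        counts[sample_id][cell_type] += 1

    profiles = {}
    for sample_id, type_counts in counts.items():
        total = sum(type_counts.values())
        # Normalize counts so each sample's proportions sum to 1.
        profiles[sample_id] = {t: n / total for t, n in type_counts.items()}
    return profiles


def dominant_type(profile):
    """Label a sample by its most abundant cell type (assumed rule; ties go to the
    alphabetically first type)."""
    return max(sorted(profile), key=profile.get)


# Toy data with hypothetical sample IDs and cell-type labels.
cells = [
    ("s1", "T"), ("s1", "T"), ("s1", "B"), ("s1", "T"),
    ("s2", "M"), ("s2", "F"), ("s2", "M"),
]
profiles = abundance_profiles(cells)
labels = {sample_id: dominant_type(p) for sample_id, p in profiles.items()}
```

A grouping step such as clustering the proportion vectors could replace the dominant-type rule; the per-sample proportion table would be the input in either case.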
Overall, this work features a collection of computational methods that investigators can
use to design their studies and analyze their single-cell data. These approaches are broadly
applicable to many single-cell technologies and different diseases and will help investigators
gain a greater understanding of how cells contribute to the pathology of a disease.
Keywords
Batch, Cell, Correction, Method, Single, Transcriptomics, Bioinformatics, Immunology
Terms of Use
This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth at Terms of Service