Publication:
Methods for the design and analysis of disease-oriented multi-sample single-cell studies

Date

2024-05-07

Citation

Millard, Nghia. 2024. Methods for the design and analysis of disease-oriented multi-sample single-cell studies. Doctoral dissertation, Harvard University Graduate School of Arts and Sciences.

Abstract

Recent advances in single-cell technologies have enabled the characterization of heterogeneous cell types in human diseases by measuring various features of individual cells, such as their transcriptomic, proteomic, and epigenomic profiles, in the context of their spatial location in tissue. Because single-cell experiments are expensive and the resulting data are high-dimensional, sparse, and noisy, investigators who wish to use these technologies face key challenges in designing single-cell studies, performing integrative analysis of cells from multiple samples, and gleaning biological understanding from these data. In this dissertation, I present the development and application of novel computational methods and analysis frameworks that help address these challenges. First, I introduce scPOST, an algorithm for simulating large-scale, multi-sample single-cell RNA-sequencing datasets. scPOST enables investigators to simulate their future single-cell studies with different parameters, such as the number of cells, number of cells per sample, and number of batches, and thereby to determine the optimal design parameters for their study. Next, I introduce the development and application of two batch correction algorithms, Harmony and Crescendo, which are designed to remove the batch effects that are prominent in single-cell data. I show that these algorithms feature superior performance in removing batch effects and are fast and scalable to large single-cell datasets that contain hundreds of thousands or even millions of cells. Finally, I showcase the application of these methods to the analysis of a large 82-sample cohort of rheumatoid arthritis (RA) patients comprising 314,000 cells. After performing batch correction with Harmony and a prospective power analysis with scPOST, I introduce a novel framework called cell-type abundance phenotypes (CTAPs) for classifying samples based on the abundance of the cell types present in each sample.
I then discuss how we used the CTAP framework to characterize the diversity of synovial inflammation in RA, identify disease-relevant cell states and transcriptomic signatures for different phenotypes of RA, and predict disease response. Overall, this work presents a collection of computational methods that investigators can use to design their studies and analyze their single-cell data. These approaches are broadly applicable to many single-cell technologies and diseases, and will help investigators gain a greater understanding of how cells contribute to the pathology of a disease.

Keywords

Batch, Cell, Correction, Method, Single, Transcriptomics, Bioinformatics, Immunology

Terms of Use

This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth at Terms of Service
