Methods and Applications of Differential Expression Analysis in Single-cell RNA-sequencing Data
Date
2022-11-23
Authors
Huang, Linglin
Citation
Huang, Linglin. 2022. Methods and Applications of Differential Expression Analysis in Single-cell RNA-sequencing Data. Doctoral dissertation, Harvard University Graduate School of Arts and Sciences.
Abstract
Differential expression (DE) is one of the most commonly performed analyses when characterizing a single-cell RNA-seq (scRNA-seq) data set. This dissertation presents methods (Chapters 1 and 2) and applications (Chapter 3) of DE analysis in scRNA-seq data.
Chapter 1: Despite its wide application, differential expression analysis is not trivial, because scRNA-seq data are high-dimensional, sparse, and noisy. In this chapter, we focus on the simplest but still interesting setting: identifying genes whose mean expression levels differ between two groups of cells, assuming no complex dependencies among cells within each group. We propose conditional differential expression, a DE framework in which we first infer each gene's status as expressed (signal) or not expressed (background), then apply DE algorithms and report results conditional on that status. We discuss how to interpret conditional DE results and show that performance can improve when gene status is inferred accurately.
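The conditional DE idea can be illustrated with a minimal Python sketch. Everything in it is an illustrative assumption rather than the dissertation's method: gene status is called with a hypothetical detection-rate rule (a gene is "signal" in a group if at least a fraction `min_detection` of its cells have nonzero counts), and the DE step is a Welch t-test on log1p counts with a normal-approximation p-value. The function names `gene_status`, `welch_test`, and `conditional_de` are hypothetical.

```python
import math
import statistics
from collections import defaultdict


def gene_status(counts, min_detection=0.1):
    """Hypothetical status rule: 'signal' if at least `min_detection`
    of the cells in this group have a nonzero count, else 'background'."""
    detected = sum(1 for c in counts if c > 0) / len(counts)
    return "signal" if detected >= min_detection else "background"


def welch_test(x, y):
    """Welch t statistic on log1p counts; the two-sided p-value uses a
    normal approximation, which is only reasonable for large groups."""
    lx = [math.log1p(v) for v in x]
    ly = [math.log1p(v) for v in y]
    se = math.sqrt(statistics.variance(lx) / len(lx)
                   + statistics.variance(ly) / len(ly))
    if se == 0:
        return 0.0, 1.0  # no variability in either group: no evidence of DE
    t = (statistics.mean(lx) - statistics.mean(ly)) / se
    p = math.erfc(abs(t) / math.sqrt(2))  # equals 2 * (1 - Phi(|t|))
    return t, p


def conditional_de(group1, group2, min_detection=0.1):
    """group1 and group2 map gene name -> list of per-cell counts.
    Each gene's DE result is filed under its (status in group1,
    status in group2) pair, so results are reported per status stratum."""
    strata = defaultdict(list)
    for gene in group1:
        s1 = gene_status(group1[gene], min_detection)
        s2 = gene_status(group2[gene], min_detection)
        t, p = welch_test(group1[gene], group2[gene])
        strata[(s1, s2)].append((gene, t, p))
    return dict(strata)
```

Reporting by stratum keeps, for example, a gene that is expressed in both groups separate from one that is expressed in only one group, so each kind of result could be interpreted on its own terms.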
Chapter 2: The increasing accessibility of scRNA-seq has produced data from complex study designs, with batches spanning multiple biological replicates or diverse individuals. Such data require an expanded definition of differential expression that captures differences in variability as well as in means. In this chapter, we propose a Poisson-lognormal multi-level model that accounts for both cell-to-cell and individual-to-individual variability, and we provide two approaches to estimating its parameters: method-of-moments estimators and a maximum likelihood estimator. Benchmarking against the pseudobulk method confirmed that our model not only identifies changes in mean expression levels across groups but also captures differences in variance patterns, which pseudobulk cannot do.
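To show how method-of-moments estimation can work for such a model, assume one plausible single-gene form of a Poisson-lognormal multi-level model (the dissertation's exact parameterization may differ): cell j of individual i has count y_ij ~ Poisson(λ_ij), with log λ_ij = μ + a_i + e_ij, individual effect a_i ~ N(0, σ_a²), and cell effect e_ij ~ N(0, σ_e²). Under this assumption the mean is M = exp(μ + (σ_a² + σ_e²)/2), the variance is M + M²(exp(σ_a² + σ_e²) - 1), and two distinct cells from the same individual have covariance M²(exp(σ_a²) - 1). Matching these to the sample mean, variance, and within-individual covariance gives closed-form estimates. The Python sketch below does this; the function names, the simulation settings, and the truncation of negative moment estimates at zero are hypothetical illustration choices.

```python
import math
import random


def sample_poisson(lam, rng):
    """Knuth's multiplication method; fine for the small rates used here
    (exp(-lam) underflows for very large lam)."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def simulate(mu, sigma_a, sigma_e, n_ind, n_cells, seed=0):
    """Assumed model: y_ij ~ Poisson(exp(mu + a_i + e_ij)) with
    a_i ~ N(0, sigma_a^2) per individual and e_ij ~ N(0, sigma_e^2) per cell."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_ind):
        a = rng.gauss(0.0, sigma_a)
        data.append([sample_poisson(math.exp(mu + a + rng.gauss(0.0, sigma_e)), rng)
                     for _ in range(n_cells)])
    return data


def moment_estimates(data):
    """Match the overall mean M, variance V, and within-individual covariance C:
    C = M^2 (exp(sa2) - 1),  V = M + M^2 (exp(sa2 + se2) - 1)."""
    cells = [y for ind in data for y in ind]
    n = len(cells)
    m = sum(cells) / n
    v = sum((y - m) ** 2 for y in cells) / (n - 1)
    # Average (y_ij - m)(y_ik - m) over ordered pairs j != k within an individual:
    # (sum of deviations)^2 minus the sum of squared deviations.
    cross, pairs = 0.0, 0
    for ind in data:
        dev = [y - m for y in ind]
        cross += sum(dev) ** 2 - sum(d * d for d in dev)
        pairs += len(ind) * (len(ind) - 1)
    c = cross / pairs
    # Negative moment estimates are truncated at zero (an illustrative choice).
    sigma_a2 = math.log(1.0 + max(c, 0.0) / m ** 2)
    total = math.log(1.0 + max(v - m, 0.0) / m ** 2)
    sigma_e2 = max(total - sigma_a2, 0.0)
    return {"mu": math.log(m) - (sigma_a2 + sigma_e2) / 2.0,
            "sigma_a2": sigma_a2, "sigma_e2": sigma_e2}
```

For example, `moment_estimates(simulate(1.0, 0.5, 0.3, n_ind=600, n_cells=20, seed=1))` should give estimates close to μ = 1.0, σ_a² = 0.25, and σ_e² = 0.09, up to sampling error that shrinks mainly with the number of individuals. Keeping σ_a² and σ_e² separate is what allows differences in variance, not only in mean, to be compared between groups.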
Chapter 3: In this chapter, we apply differential expression algorithms to a real data set of mouse Th17 cells, a subset of CD4 T cells that plays an important role in autoimmunity. We analyze Th17 cells collected from different tissues at homeostasis and/or during autoimmunity, characterizing their heterogeneity within and across tissues through unsupervised clustering followed by differential expression. In addition, we identify two subpopulations in the spleen during autoimmunity, infer their migratory phenotype and plasticity by combining gene expression with T cell receptor information, and validate their functions with transfer and knockout experiments.
Keywords
Biostatistics
Terms of Use
This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth in the Terms of Service.