Publication:
Methods and Applications of Differential Expression Analysis in Single-cell RNA-sequencing Data

Date

2022-11-23

Citation

Huang, Linglin. 2022. Methods and Applications of Differential Expression Analysis in Single-cell RNA-sequencing Data. Doctoral dissertation, Harvard University Graduate School of Arts and Sciences.

Abstract

Differential expression (DE) is one of the most commonly performed analyses when characterizing a single-cell RNA-seq (scRNA-seq) data set. This dissertation presents methods (Chapters 1 and 2) and applications (Chapter 3) of DE analysis in scRNA-seq data.

Chapter 1: Differential expression analysis, despite its wide application, is not trivial because scRNA-seq data are high-dimensional, sparse, and noisy. In this chapter, we focus on the simplest setting of interest: identifying genes with different mean expression levels between two groups of cells, assuming no complex dependencies among cells within each group. We propose conditional differential expression, a framework for DE analysis in which we infer each gene's status as expressed (signal) or not expressed (background), then apply DE algorithms and report results conditional on that status. We discuss the interpretation of conditional DE results and show that good gene-status inference can improve performance.

Chapter 2: The increasing accessibility of scRNA-seq has encouraged the emergence of scRNA-seq data from complex study designs, with batches over multiple biological replicates or diverse individuals. These data require an expanded definition of differential expression that captures differences in variability in addition to differences in means. In this chapter, we propose a Poisson-lognormal multi-level model to account for both cell-to-cell and individual-to-individual variability. We provide two approaches to estimating the parameters: method-of-moments estimators and the maximum likelihood estimator. Benchmarking against the pseudobulk method confirmed that our model not only identifies changes in mean expression levels across groups but also captures differences in variance patterns, which the pseudobulk method cannot do.

Chapter 3: In this chapter, we apply differential expression algorithms to a real data set of mouse Th17 cells, a subset of CD4 T cells that play an important role in autoimmunity. We analyzed Th17 cells collected from different tissues at homeostasis and/or during autoimmunity, characterizing their within- and across-tissue heterogeneity using unsupervised clustering followed by differential expression. In addition, we identified two subpopulations in the spleen during autoimmunity, inferred their migratory phenotype and plasticity using combined gene expression and T cell receptor information, and validated their functions with transfer and knockout experiments.
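
As an illustration of the kind of model described for Chapter 2, the following is a minimal sketch of method-of-moments calculations for a Poisson-lognormal count, not the dissertation's actual estimators. It assumes a hypothetical two-level structure, log(lam_ic) = mu + a_i + e_ic, with a_i ~ N(0, tau2) across individuals and e_ic ~ N(0, sigma2) across cells; the function names and this parameterization are assumptions made for the example only.

```python
import math


def pln_moments(mu, sigma2):
    """Mean and variance of Y when Y | lam ~ Poisson(lam) and log(lam) ~ N(mu, sigma2)."""
    mean = math.exp(mu + sigma2 / 2.0)
    var = mean + mean ** 2 * math.expm1(sigma2)
    return mean, var


def pln_method_of_moments(mean, var):
    """Invert pln_moments: recover (mu, sigma2) from a mean and a variance."""
    if mean <= 0:
        raise ValueError("mean must be positive")
    # Variance in excess of the Poisson variance; clipped at zero so that
    # underdispersed inputs fall back to sigma2 = 0 (plain Poisson).
    excess = max(var - mean, 0.0)
    sigma2 = math.log1p(excess / mean ** 2)
    mu = math.log(mean) - sigma2 / 2.0
    return mu, sigma2


def individual_mean_moments(mu, tau2, sigma2):
    """Mean and variance, across individuals, of the per-individual expected count.

    Assumed hierarchy (hypothetical): log(lam_ic) = mu + a_i + e_ic with
    a_i ~ N(0, tau2) between individuals and e_ic ~ N(0, sigma2) between cells.
    The expected count for individual i is then exp(mu + a_i + sigma2 / 2).
    """
    mean = math.exp(mu + (tau2 + sigma2) / 2.0)
    var = mean ** 2 * math.expm1(tau2)
    return mean, var


# Round trip: the estimator recovers the parameters that generated the moments.
mu_hat, sigma2_hat = pln_method_of_moments(*pln_moments(1.0, 0.5))
```

Under these assumptions, the squared coefficient of variation of the per-individual expected counts depends only on tau2, while sigma2 enters only through the mean. In expectation, then, per-individual aggregates alone cannot separate a change in cell-to-cell variance from a change in mu, which is one way to see why a cell-level model might capture variance differences that aggregation hides.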

Keywords

Biostatistics

Terms of Use

This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth at Terms of Service
