Publication:

Methods for Single Cell and Longevity Genomics

Date

2019-05-16

The Harvard community has made this article openly available.

Citation

Townes, F. William. 2019. Methods for Single Cell and Longevity Genomics. Doctoral dissertation, Harvard University, Graduate School of Arts & Sciences.

Abstract

Chapter 1: Single cell RNA-Seq (scRNA-Seq) profiles gene expression of individual cells. Recent scRNA-Seq datasets have incorporated unique molecular identifiers (UMIs). Using negative controls, we show that UMI counts follow multinomial sampling with no zero-inflation. Current normalization procedures, such as log of counts per million, and feature selection by highly variable genes produce false variability in dimension reduction. We propose simple multinomial methods, including generalized principal component analysis (GLM-PCA) for non-normal distributions, and feature selection using deviance. These methods outperform current practice in a downstream clustering assessment using ground-truth datasets.

Chapter 2: For scRNA-Seq data lacking UMIs, we propose quasi UMIs: quantile normalization of read counts to a compound Poisson distribution empirically derived from UMI datasets. In an assessment using datasets for which both UMIs and read counts were available, quasi UMI counts were closer to UMI counts than those from competing normalization methods such as census counts.

Chapter 3: Aging is a complex process with poorly understood genetic mechanisms. Recent studies have sought to classify genes as pro-longevity or anti-longevity using a variety of machine learning algorithms. However, assessments based on held-out test data are lacking. Further, it is not clear which types of features are best for improving classification accuracy and precision. Leveraging gene annotations for two model organisms from the GenAge database, we use gene ontology and publicly available gene expression datasets as features to systematically compare five popular classification algorithms. Elastic net regularized logistic regression (GLM-Net) performs well. Using GLM-Net, we make predictions for pro- and anti-longevity genes among those not found in GenAge.
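As a rough illustration of the deviance-based feature selection mentioned for Chapter 1, the sketch below scores each gene by its deviance from a null model in which the gene makes up the same fraction of every cell's total UMI count. The abstract does not give the formula, so this assumes a per-gene binomial approximation to the multinomial deviance. The function name `binomial_deviance` and the toy matrix are hypothetical, not taken from the dissertation.

```python
import numpy as np


def binomial_deviance(counts):
    """Per-gene deviance under a constant-proportion null (assumed form).

    counts: array of shape (n_cells, n_genes) with non-negative UMI counts.
    Returns an array of length n_genes; larger values suggest a gene whose
    relative abundance varies more across cells than the null predicts.
    """
    y = np.asarray(counts, dtype=float)
    n = y.sum(axis=1, keepdims=True)   # total UMI count per cell
    pi = y.sum(axis=0) / n.sum()       # pooled relative abundance per gene
    mu = n * pi                        # expected counts under the null
    rest = n - y                       # counts assigned to all other genes
    # Terms with a zero count contribute zero (limit of x*log(x) as x -> 0).
    with np.errstate(divide="ignore", invalid="ignore"):
        t1 = np.where(y > 0, y * np.log(y / mu), 0.0)
        t2 = np.where(rest > 0, rest * np.log(rest / (n - mu)), 0.0)
    return 2.0 * (t1 + t2).sum(axis=0)


# Toy example: 4 cells x 3 genes, every cell has 100 UMIs in total.
toy = np.array([
    [10, 0, 90],
    [10, 50, 40],
    [10, 0, 90],
    [10, 50, 40],
])
deviance = binomial_deviance(toy)
ranked = np.argsort(deviance)[::-1]    # gene indices, most informative first
```

In this toy matrix the first gene is a constant 10% of every cell, so its deviance is zero, while the second gene switches on and off between cells and ranks highest. In practice one would keep the top-ranked genes before dimension reduction; how many to keep is a separate choice that this sketch does not address.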

Keywords

single cell RNA-Seq, longevity, genomics, dimension reduction, gene expression, normalization

Terms of Use

This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth at Terms of Service
