Contributions to Missing Data Methods in Single-Cell Genomics and Survival Analysis

Date

2019-05-17

Citation

Tracy, Samuel. 2019. Contributions to Missing Data Methods in Single-Cell Genomics and Survival Analysis. Doctoral dissertation, Harvard University, Graduate School of Arts & Sciences.

Abstract

Missing data occurs when individual data values are not recorded for an observation of interest within a sample. Such events may significantly bias subsequent analyses if ignored. This dissertation discusses solutions for inference and prediction in the presence of data missing at random, with applications in single-cell genomics and survival analysis. Chapter 1 presents a computational method to mitigate technical bias due to capture efficiency in single-cell RNA-sequencing data. Framing incorrectly observed zero gene expression values as a missing-at-random problem, we impute the zero values using information from samples with similar expression patterns. In comparative analyses of simulated and real single-cell RNA-seq datasets, the method outperforms existing methods in imputation accuracy and increases the precision of cell-type identification. In Chapter 2, we develop a three-step algorithm to infer unmeasured spatial patterns of gene expression through the integration of single-cell RNA-seq and sequential fluorescence in situ hybridization data. Through analysis of mouse visual cortex data, we show that this is a useful tool for predicting the spatial pattern of cell-type-specific and domain-specific genes. Chapter 3 presents minimal-area confidence bands for time-to-event functions using a related optimization problem with local time processes. Some event times are unobserved, or censored, resulting in partial missingness. We assume that the censoring mechanism is independent of event time given the observed information, which is analogous to missing at random. The finite-sample performance of the proposed method is assessed by simulation studies, and the method is then applied to clinical trial data to evaluate survival times for primary biliary cirrhosis patients treated with D-penicillamine.

Keywords

missingness, single-cell, survival, imputation, prediction, optimization

Terms of Use

This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth at Terms of Service
