Publication: Contributions to Missing Data Methods in Single-Cell Genomics and Survival Analysis
Abstract
Missing data occur when individual values are not recorded for an observation of interest within a sample. If ignored, such missingness can substantially bias subsequent analyses. This dissertation develops solutions for inference and prediction when data are missing at random, with applications to single-cell genomics and survival analysis. Chapter 1 presents a computational method that mitigates technical bias caused by limited capture efficiency in single-cell RNA-sequencing data. Treating spurious zero gene expression values as a missing-at-random problem, we impute these zeros using information from samples with similar expression patterns. In comparative analyses of simulated and real single-cell RNA-seq datasets, the method outperforms existing methods in imputation accuracy and improves the precision of cell-type identification. Chapter 2 develops a three-step algorithm that infers unmeasured spatial patterns of gene expression by integrating single-cell RNA-seq data with sequential fluorescence in situ hybridization data. An analysis of mouse visual cortex data shows that the algorithm is useful for predicting the spatial patterns of cell-type-specific and domain-specific genes. Chapter 3 presents minimal-area confidence bands for time-to-event functions, obtained through a related optimization problem involving local time processes. Some event times are unobserved, or censored, which makes the data partially missing. We assume that the censoring mechanism is independent of the event time given the observed information, an assumption analogous to missing at random. The finite-sample performance of the proposed method is assessed in simulation studies, and the method is then applied to clinical trial data on the survival times of primary biliary cirrhosis patients treated with D-penicillamine.
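To illustrate the general idea in Chapter 1 of filling zeros from samples with similar expression patterns, here is a minimal sketch of plain k-nearest-neighbour imputation. This technique is swapped in for illustration only and is not the dissertation's method. The function name `knn_impute_zeros`, the Euclidean distance, the choice of `k`, and the rule of treating every zero as a candidate dropout are all hypothetical assumptions. A real method must also decide which zeros are technical and which are truly unexpressed genes.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two expression profiles of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_impute_zeros(matrix, k=2):
    """Fill zero entries of a cells-by-genes matrix from similar cells.

    Hypothetical rule (assumption, not the dissertation's method): every zero
    is a candidate dropout and is replaced by the mean of the nonzero values
    of that gene among the k nearest cells, using Euclidean distance on the
    original profiles. If no neighbour expresses the gene, the zero is kept.
    """
    n_cells = len(matrix)
    imputed = [row[:] for row in matrix]  # copy so the input is left untouched
    for i in range(n_cells):
        # Rank every other cell by distance to cell i and keep the k closest.
        others = [j for j in range(n_cells) if j != i]
        neighbors = sorted(others, key=lambda j: euclidean(matrix[i], matrix[j]))[:k]
        for g, value in enumerate(matrix[i]):
            if value != 0:
                continue  # observed nonzero values are not changed
            donors = [matrix[j][g] for j in neighbors if matrix[j][g] != 0]
            if donors:
                imputed[i][g] = sum(donors) / len(donors)
    return imputed


if __name__ == "__main__":
    counts = [
        [5.0, 0.0, 3.0],
        [6.0, 4.0, 3.0],
        [5.0, 2.0, 4.0],
        [0.0, 9.0, 0.0],
    ]
    print(knn_impute_zeros(counts, k=2))
```

With `k=2`, the first cell's nearest neighbours are the second and third cells, so its zero for the middle gene becomes the mean of 4.0 and 2.0, which is 3.0. The last cell's zeros are filled from the second and third cells in the same way. Distances are computed on the original matrix so that the imputed values do not feed back into the neighbour search.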
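Chapter 3 works with right-censored event times under censoring that is independent of event time. A confidence band surrounds a point estimate of the survival function, and the standard nonparametric estimate in this setting is the Kaplan–Meier product-limit estimator. The sketch below shows only that estimator. It is not the minimal-area band construction, which the abstract does not specify. The function name `kaplan_meier` and its input layout are illustrative assumptions. By the usual convention, subjects censored at time t still count as at risk at t.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate of the survival function S(t).

    times  : observed follow-up times (event time or censoring time)
    events : 1 if the event was observed, 0 if the time is censored
    Returns a list of (t, S(t)) pairs, one per distinct event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group every subject whose follow-up ends at time t (events and censorings).
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            # Product-limit step: multiply by the conditional survival probability at t.
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        # Subjects observed up to t (including censored ones) leave the risk set after t.
        at_risk -= removed
    return curve


if __name__ == "__main__":
    times = [1, 2, 2, 3, 4, 5]
    events = [1, 1, 0, 1, 0, 1]
    for t, s in kaplan_meier(times, events):
        print(t, round(s, 4))
```

In this example, the censored observations at times 2 and 4 contribute no drop in the curve. They shrink the risk set for later event times, and this is how partially missing (censored) observations still carry information under the independent-censoring assumption.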