Publication: Developing Differentiable Toolkits for Computational Biology
Abstract
In any biological experiment, no matter how sophisticated, we capture only a small, noisy glimpse of a complex underlying process. For dry-lab researchers, interpreting these messy data raises a fundamental question: should one strive for a mechanistic understanding of the biological processes involved, or simply focus on data analysis for a specific task? This dissertation presents three computational tools that emerged from our attempt to strike a balance between these two extremes when modeling novel data.
Chapter 1 presents two stochastic models that address major contamination issues in probe-based bacterial single-cell sequencing: spurious unique molecular identifier (UMI) counts and the difficulty of distinguishing genuine cellular signals from background noise. By modeling two specific steps of the 10x sequencing pipeline, these methods accurately infer true UMI counts and identify real cells. This enabled downstream single-cell analyses that revealed heterogeneous toxin expression in isogenic C. perfringens populations.
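The abstract does not specify the form of the two stochastic models. As a loose illustration of the cell-calling problem only, the sketch below separates barcodes into a low-rate "background" component and a high-rate "cell" component using a two-component Poisson mixture fit by expectation-maximization. The mixture model, the function name `fit_poisson_mixture`, and the simulated counts are all hypothetical stand-ins, not the dissertation's models.

```python
import numpy as np

def fit_poisson_mixture(counts, n_iter=200):
    """Two-component Poisson mixture fit by EM (hypothetical stand-in for cell calling).

    Returns (weights, rates, p_cell), with components sorted so that index 1 is the
    high-rate ("cell") component; p_cell is each barcode's posterior for that component.
    """
    k = np.asarray(counts, dtype=float)
    # Initialize rates at a low and a high quantile of the observed totals
    lam = np.array([np.quantile(k, 0.25), np.quantile(k, 0.9)]) + 1e-3
    pi = np.array([0.5, 0.5])
    for _ in range(n_iter):
        # E-step: log pi_c + k*log(lam_c) - lam_c (the log k! term is shared and cancels)
        log_p = np.log(pi)[None, :] + k[:, None] * np.log(lam)[None, :] - lam[None, :]
        log_p -= log_p.max(axis=1, keepdims=True)   # stabilize before exponentiating
        r = np.exp(log_p)
        r /= r.sum(axis=1, keepdims=True)           # responsibilities per barcode
        # M-step: update mixing weights and Poisson rates
        nk = r.sum(axis=0)
        pi = nk / len(k)
        lam = (r * k[:, None]).sum(axis=0) / nk + 1e-12
    order = np.argsort(lam)                         # make component 1 the high-rate one
    return pi[order], lam[order], r[:, order][:, 1]

# Simulated example: 900 background barcodes and 100 cell barcodes
rng = np.random.default_rng(0)
counts = np.concatenate([rng.poisson(5, 900), rng.poisson(200, 100)])
weights, rates, p_cell = fit_poisson_mixture(counts)
is_cell = p_cell > 0.5
```

A real probe-based dataset would not be this cleanly separated; the point of the sketch is only that a generative model of counts turns cell calling into posterior inference rather than a hand-picked threshold.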
Chapter 2 employs a deep learning technique to predict cellular responses in Perturb-seq experiments. We posit that the intermediate biological adaptations governing these responses are driven by gene regulatory networks composed of directed, nonreciprocal interactions. To model such interactions, we propose a novel directed graph neural network (CoED) along with a new Laplacian, the fuzzy graph Laplacian, that better captures directional effects. We show that learning the edge directions and the CoED parameters simultaneously improves predictive performance over existing methods.
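CoED and the fuzzy graph Laplacian are not defined in this abstract. The sketch below only illustrates the general idea of a continuous, learnable edge direction: each undirected edge carries an angle θ that splits its weight between the two directions, and a layer aggregates incoming and outgoing neighbors separately. The cos²/sin² parameterization and the names `soft_directed_adjacency` and `directed_layer` are assumptions for illustration, not CoED's definition.

```python
import numpy as np

def soft_directed_adjacency(edges, theta, n_nodes):
    """Weighted directed adjacency from undirected edges and one angle per edge.

    Hypothetical parameterization: for edge (u, v) with angle t in [0, pi/2],
    A[u, v] = cos(t)**2 (weight of u -> v) and A[v, u] = sin(t)**2 (weight of v -> u).
    t = 0 gives u -> v only, t = pi/2 gives v -> u only, t = pi/4 is symmetric.
    """
    A = np.zeros((n_nodes, n_nodes))
    for (u, v), t in zip(edges, theta):
        A[u, v] += np.cos(t) ** 2
        A[v, u] += np.sin(t) ** 2
    return A

def directed_layer(X, A, W_self, W_in, W_out):
    """One message-passing layer that treats incoming and outgoing neighbors separately."""
    eps = 1e-12
    deg_in = A.sum(axis=0)    # total weight arriving at each node
    deg_out = A.sum(axis=1)   # total weight leaving each node
    # Weighted mean over sources j of edges j -> i, and over targets j of edges i -> j
    m_in = (A.T @ X) / np.maximum(deg_in, eps)[:, None]
    m_out = (A @ X) / np.maximum(deg_out, eps)[:, None]
    return np.tanh(X @ W_self + m_in @ W_in + m_out @ W_out)

rng = np.random.default_rng(0)
edges = [(0, 1), (1, 2), (2, 3)]
X = rng.normal(size=(4, 5))
W_self, W_in, W_out = (rng.normal(size=(5, 2)) for _ in range(3))
A_chain = soft_directed_adjacency(edges, np.zeros(3), 4)           # 0 -> 1 -> 2 -> 3
A_sym = soft_directed_adjacency(edges, np.full(3, np.pi / 4), 4)   # equal weight both ways
H_chain = directed_layer(X, A_chain, W_self, W_in, W_out)
H_sym = directed_layer(X, A_sym, W_self, W_in, W_out)
```

In an actual model, θ would be a trainable parameter optimized jointly with the weight matrices by gradient descent (for example in an autodiff framework); this NumPy forward pass omits training and only shows how a continuous direction changes what each node aggregates.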
Chapter 3 presents a differentiable in silico morphogenesis framework that learns to transform a spherical point cloud into any desired 3D shape. To compare 3D objects in a way that is invariant to point ordering (index permutations), sampling density, and orientation, we design a loss function that operates in the spectral domain. We also propose a neural-network-based force model in which individual agents learn to interact so that, collectively, the system forms the target shape.
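The specific spectral loss is not given in this abstract. One common way to compare point clouds independently of point ordering and rigid orientation is to compare eigenvalues of a graph Laplacian built from each cloud: eigenvalues do not change when nodes are relabeled, and a k-nearest-neighbor graph built from pairwise distances does not change under rotation or translation. The sketch below assumes a Gaussian-weighted kNN graph with a symmetric normalized Laplacian; it does not attempt density invariance, and the helper names `laplacian_spectrum` and `spectral_loss` are hypothetical.

```python
import numpy as np

def laplacian_spectrum(points, k=8, n_eigs=20):
    """Smallest eigenvalues of the normalized Laplacian of a Gaussian-weighted kNN graph."""
    P = np.asarray(points, dtype=float)
    n = len(P)
    # Pairwise Euclidean distances (unchanged by rotation, translation, and reordering)
    D = np.linalg.norm(P[:, None, :] - P[None, :, :], axis=-1)
    order = np.argsort(D, axis=1)
    nbrs = order[:, 1:k + 1]                          # k nearest neighbors, skipping self
    sigma = np.median(D[np.arange(n), order[:, k]])   # bandwidth from k-th neighbor distance
    rows = np.repeat(np.arange(n), k)
    cols = nbrs.ravel()
    W = np.zeros((n, n))
    W[rows, cols] = np.exp(-(D[rows, cols] / sigma) ** 2)
    W = np.maximum(W, W.T)                            # symmetrize the kNN graph
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    L = np.eye(n) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    return np.linalg.eigvalsh(L)[:n_eigs]             # eigvalsh returns ascending order

def spectral_loss(P, Q, k=8, n_eigs=20):
    """Squared distance between the leading Laplacian spectra of two point clouds."""
    diff = laplacian_spectrum(P, k, n_eigs) - laplacian_spectrum(Q, k, n_eigs)
    return float(np.sum(diff ** 2))

rng = np.random.default_rng(0)
sphere = rng.normal(size=(200, 3))
sphere /= np.linalg.norm(sphere, axis=1, keepdims=True)   # points on the unit sphere
perm = rng.permutation(200)
R, _ = np.linalg.qr(rng.normal(size=(3, 3)))               # random orthogonal matrix
moved = sphere[perm] @ R.T + np.array([1.0, -2.0, 0.5])    # reorder, rotate, translate
elongated = sphere * np.array([10.0, 1.0, 1.0])            # a genuinely different shape
loss_same = spectral_loss(sphere, moved)
loss_diff = spectral_loss(sphere, elongated)
```

Because the hard kNN selection is piecewise constant, a differentiable framework would need a smooth graph construction (or gradients that flow only through the edge weights); this sketch is meant to show the invariance argument, not a drop-in training loss.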