Publication:

Developing Differentiable Toolkits for Computational Biology

Date

2025-06-05

Citation

Pahng, Seong Ho. 2025. Developing Differentiable Toolkits for Computational Biology. Doctoral Dissertation, Harvard University Graduate School of Arts and Sciences.

Abstract

In any biological experiment, no matter how sophisticated, we capture only a small, noisy glimpse of a complex underlying process. For dry-lab researchers, interpreting these messy data raises a fundamental question: should one strive for a mechanistic understanding of the biological processes involved, or simply focus on data analysis for a specific task? This dissertation presents three computational tools that emerged from our attempt to strike a balance between these two extremes when modeling novel data.

Chapter 1 presents two stochastic models that address the major contamination issues in probe-based bacterial single-cell sequencing: spurious unique molecular identifier (UMI) counts and the difficulty of distinguishing genuine cellular signals from noise. By modeling two specific steps of the 10x sequencing pipeline, these methods accurately infer true UMI counts and identify real cells, enabling downstream single-cell analyses that revealed heterogeneous toxin expression in isogenic C. perfringens populations.

Chapter 2 employs a deep learning technique to predict cellular responses in Perturb-seq experiments. We posit that the intermediate biological adaptations governing these responses are driven by gene regulatory networks composed of directed, nonreciprocal interactions. To model such interactions, we propose a novel directed graph neural network (CoED) along with a new Laplacian (Fuzzy graph Laplacian) that better captures directional effects. We show that learning both the edge directions and the CoED parameters simultaneously improves predictive performance over existing methods.

Chapter 3 presents a differentiable in silico morphogenesis framework that learns to transform a spherical arrangement of point clouds into any desired 3D shape. To compare 3D objects in a manner invariant to index permutations, density, and orientation, we design a loss function that operates in the spectral domain. We also propose a neural network–based force model in which individual agents learn to interact so that, collectively, the system forms the target shape.
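
To make the idea of a spectral-domain comparison concrete, the following is a minimal sketch and not the dissertation's loss function. It assumes a Gaussian-kernel graph Laplacian built from pairwise distances and compares the smallest eigenvalues of two point clouds; the names `laplacian_spectrum` and `spectral_loss` and the parameters `k` and `sigma` are hypothetical. The sketch illustrates invariance to point ordering and orientation only. It does not address invariance to density, and it uses NumPy rather than an automatic differentiation library, so it is not differentiable as written.

```python
import numpy as np


def laplacian_spectrum(points, k=8, sigma=1.0):
    """Return the k smallest eigenvalues of a Gaussian-kernel graph Laplacian.

    Illustrative assumption: the kernel and the choice of k are not taken
    from the dissertation.
    """
    points = np.asarray(points, dtype=float)
    # Pairwise squared distances do not change under rotation or translation.
    diff = points[:, None, :] - points[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)
    weights = np.exp(-sq_dist / (2.0 * sigma ** 2))
    np.fill_diagonal(weights, 0.0)  # no self-loops
    laplacian = np.diag(weights.sum(axis=1)) - weights
    # Relabeling the points permutes rows and columns together, which leaves
    # the eigenvalues unchanged; eigvalsh returns them in ascending order.
    return np.linalg.eigvalsh(laplacian)[:k]


def spectral_loss(source, target, k=8, sigma=1.0):
    """Mean squared difference between the two truncated spectra."""
    gap = laplacian_spectrum(source, k, sigma) - laplacian_spectrum(target, k, sigma)
    return float(np.mean(gap ** 2))
```

Under these assumptions, applying a random rotation and a random reordering to a cloud should leave `spectral_loss` against the original at numerical zero, while rescaling the cloud should not.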

Keywords

Biology, Artificial intelligence, Applied mathematics

Terms of Use

This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth at Terms of Service
