Publication: Differentiable Programming for Problems in Statistical Mechanics and Biophysics
Abstract
This thesis explores how differentiable programming, a paradigm that leverages automatic differentiation (AD) for scientific computing, can be used to advance modeling and design in soft matter and biophysics. Traditional applied mathematics relies on hand-crafted models, approximations, and domain-specific numerics. However, recent advances in hardware acceleration and in AD frameworks originally developed for deep learning have transformed the landscape of scientific computing, enabling exact and efficient computation of gradients in complex models.
I demonstrate how this computational shift enables novel capabilities across several domains. I first show how AD enables otherwise intractable analytical calculations. Specifically, I use AD to efficiently evaluate partition functions and assembly yields in systems of anisotropically interacting particles. I then apply this framework to compare calculations under existing models of protein-protein interactions with experimentally determined assembly yields of de novo proteins. Motivated by the poor model generalization that this comparison exposes, I then focus on AD as an optimization tool for physics-based models. I devise a framework for directly differentiating the aforementioned assembly yield calculation, allowing me to fit protein force fields to target assembly yields via gradient-based optimization. I then extend these techniques to thermodynamic models of nucleic acids, showing that parameters in the popular "nearest neighbor" model describing secondary structure thermodynamics can be fit to data via gradient descent, enhancing predictive power. I also apply differentiable molecular dynamics to design functional colloidal systems.
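As an illustration of the idea of differentiating a yield calculation and fitting a parameter by gradient descent, the following is a minimal sketch and not the framework developed in the thesis. It assumes a hypothetical two-state system (unbound at energy 0, bound at energy `eps`) and implements forward-mode AD with dual numbers in plain Python in place of an AD framework. The names `Dual`, `log_partition`, `bound_yield`, and `fit_binding_energy`, along with the learning rate and step count, are all assumptions made for this example.

```python
import math


class Dual:
    """Dual number val + der*e with e**2 = 0; der carries the derivative."""

    def __init__(self, val, der=0.0):
        self.val = float(val)
        self.der = float(der)

    @staticmethod
    def lift(x):
        # Promote plain numbers to constants (zero derivative).
        return x if isinstance(x, Dual) else Dual(x)

    def __add__(self, other):
        other = Dual.lift(other)
        return Dual(self.val + other.val, self.der + other.der)

    __radd__ = __add__

    def __neg__(self):
        return Dual(-self.val, -self.der)

    def __sub__(self, other):
        return self + (-Dual.lift(other))

    def __rsub__(self, other):
        return Dual.lift(other) + (-self)

    def __mul__(self, other):
        other = Dual.lift(other)
        # Product rule.
        return Dual(self.val * other.val,
                    self.val * other.der + self.der * other.val)

    __rmul__ = __mul__


def d_exp(x):
    x = Dual.lift(x)
    e = math.exp(x.val)
    return Dual(e, e * x.der)


def d_log(x):
    x = Dual.lift(x)
    return Dual(math.log(x.val), x.der / x.val)


def log_partition(eps, beta=1.0):
    # Toy model (assumption): Z = 1 + exp(-beta * eps).
    return d_log(1.0 + d_exp(-beta * eps))


def bound_yield(eps, beta=1.0):
    # Probability of the bound state: exp(-beta * eps) / Z.
    return d_exp(-beta * eps - log_partition(eps, beta))


def fit_binding_energy(target_yield, beta=1.0, lr=5.0, steps=500):
    """Gradient descent on (yield - target)**2 with respect to eps."""
    eps = 0.0
    for _ in range(steps):
        # Seed der=1 so that .der of any result is d(result)/d(eps).
        residual = bound_yield(Dual(eps, 1.0), beta) - target_yield
        loss = residual * residual
        eps -= lr * loss.der
    return eps
```

In this toy model the fitted energy can be checked against the closed form `eps = -log(y / (1 - y)) / beta`, and the derivative of `log_partition` against `-beta` times the bound-state probability. Realistic assembly yield calculations involve many more states and parameters, for which reverse-mode AD in an established framework would be the natural choice.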
For some physics-based calculations, direct differentiation for modeling or design is infeasible. In the first limiting case, a calculation may be differentiable in principle but computationally prohibitive to unroll. For such calculations, I leverage and extend novel methods for stochastic gradient estimation to develop a framework for fitting coarse-grained force fields to experimental data. In the second limiting case, a calculation may be inherently discontinuous due to discrete control variables. One example is designing RNA sequences with respect to the aforementioned nearest neighbor model, for which I introduce an algorithm to compute the expected partition function over a probability distribution of RNA sequences, enabling gradient-based RNA design. Building on these methods for stochastic gradient estimation and this probabilistic sequence representation, I develop a general method for inverse design in molecular simulations by introducing a notion of expected Hamiltonians. I demonstrate how this enables the rational design of intrinsically disordered proteins, DNA sequences, and even improved particle linking algorithms.
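To make the notion of an expected partition function over a sequence distribution concrete, here is a minimal sketch under strong simplifying assumptions. It is not the nearest neighbor model and not the algorithm introduced in the thesis. It assumes a hypothetical toy model in which each structure is a set of disjoint base pairs with independent pair weights, and in which each sequence position is drawn independently from its own nucleotide distribution. Under those assumptions the expectation factorizes over pairs, so it is a smooth polynomial in the probabilities. The values in `PAIR_ENERGY` and all function names are illustrative only.

```python
import itertools
import math

ALPHABET = "ACGU"

# Hypothetical pair energies in arbitrary units (not fitted parameters).
PAIR_ENERGY = {("G", "C"): -3.0, ("C", "G"): -3.0,
               ("A", "U"): -2.0, ("U", "A"): -2.0,
               ("G", "U"): -1.0, ("U", "G"): -1.0}


def pair_weight(a, b, beta=1.0):
    """Boltzmann weight of pairing a with b; other pairs are forbidden."""
    if (a, b) not in PAIR_ENERGY:
        return 0.0
    return math.exp(-beta * PAIR_ENERGY[(a, b)])


def partition_function(seq, structures):
    """Z(seq): sum over structures of the product of their pair weights."""
    return sum(math.prod(pair_weight(seq[i], seq[j]) for i, j in s)
               for s in structures)


def expected_partition_function(probs, structures):
    """E[Z] for independent per-position distributions probs[i][nucleotide].

    Because the pairs within a structure touch disjoint positions, the
    expectation of the product equals the product of per-pair expectations.
    """
    total = 0.0
    for s in structures:
        term = 1.0
        for i, j in s:
            term *= sum(probs[i][a] * probs[j][b] * pair_weight(a, b)
                        for a in ALPHABET for b in ALPHABET)
        total += term
    return total


def brute_force_expectation(probs, structures):
    """Reference value: enumerate every sequence (feasible only for tiny n)."""
    n = len(probs)
    total = 0.0
    for seq in itertools.product(ALPHABET, repeat=n):
        p_seq = math.prod(probs[i][seq[i]] for i in range(n))
        total += p_seq * partition_function(seq, structures)
    return total


# Toy structure set for a length-4 sequence; () is the unpaired structure.
STRUCTURES = [(), ((0, 3),), ((1, 2),), ((0, 3), (1, 2)), ((0, 2),), ((1, 3),)]

PROBS = [
    {"A": 0.1, "C": 0.2, "G": 0.6, "U": 0.1},
    {"A": 0.4, "C": 0.1, "G": 0.1, "U": 0.4},
    {"A": 0.25, "C": 0.25, "G": 0.25, "U": 0.25},
    {"A": 0.1, "C": 0.7, "G": 0.1, "U": 0.1},
]
```

For this length-4 example, `expected_partition_function(PROBS, STRUCTURES)` can be compared against `brute_force_expectation(PROBS, STRUCTURES)`, which sums over all 256 sequences. The factorized form avoids that enumeration, and because it depends smoothly on `PROBS`, it is the kind of quantity to which gradient-based optimization can be applied.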
I conclude with a forward-looking perspective on promising applications of these methods, opportunities for future methods development, and proposed directions for novel interfaces between computation, mathematics, and physics.