Publication: Biologically motivated artificial intelligence for explainable gene regulatory dynamics
Abstract
Models formulated as ordinary differential equations (ODEs) can accurately explain temporal gene expression patterns and promise to yield new insights into important cellular processes, disease progression, and intervention design. Learning such ODEs is challenging because we want to predict the evolution of gene expression in a way that accurately encodes both the causal gene regulatory network (GRN) governing the dynamics and the nonlinear functional relationships between genes. Most widely used ODE estimation methods either impose too many parametric restrictions or are not guided by meaningful biological insights, both of which impede scalability and/or explainability. To overcome these limitations, we developed PHOENIX, a modeling framework based on neural ordinary differential equations (NeuralODEs) and Hill-Langmuir kinetics that can flexibly incorporate prior domain knowledge and biological constraints to promote sparse, biologically interpretable representations of ODEs. We test the accuracy of PHOENIX in a series of in silico experiments, benchmarking it against several currently used tools for ODE estimation. We also demonstrate PHOENIX's flexibility by studying oscillating expression data from synchronized yeast cells, and we assess its scalability by modeling genome-scale breast cancer expression for samples ordered in pseudotime. Finally, we show how combining user-defined prior knowledge with functional forms from systems biology allows PHOENIX to encode key properties of the underlying GRN and subsequently predict expression patterns in a biologically explainable way. Having developed and validated PHOENIX, we next attempt to obtain very sparse representations of the PHOENIX model to aid interpretability. To this end, we explore the field of neural network sparsification and the Lottery Ticket Hypothesis (LTH).
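To make the Hill-Langmuir component more concrete, here is a minimal sketch, assuming a simple additive model in which each gene's rate of change is a linear decay term plus a weighted sum of Hill activation terms of its regulators. The names `hill`, `grn_rhs`, and `euler`, the weight matrix `w` standing in for the GRN, and the parameters `k` and `n` are illustrative assumptions. PHOENIX learns such dynamics with a NeuralODE, which this fixed-parameter sketch does not attempt.

```python
from typing import List, Sequence


def hill(x: float, k: float = 1.0, n: float = 2.0) -> float:
    """Hill-Langmuir activation x^n / (k^n + x^n); expression is clipped at zero."""
    xn = max(x, 0.0) ** n
    return xn / (k ** n + xn)


def grn_rhs(x: Sequence[float], w: Sequence[Sequence[float]],
            decay: Sequence[float], k: float = 1.0, n: float = 2.0) -> List[float]:
    """dx_i/dt = -decay_i * x_i + sum_j w[i][j] * hill(x_j).

    w[i][j] > 0: gene j activates gene i; w[i][j] < 0: gene j represses gene i;
    w[i][j] == 0: no regulatory edge (a sparse w means a sparse GRN).
    """
    genes = range(len(x))
    return [
        -decay[i] * x[i] + sum(w[i][j] * hill(x[j], k, n) for j in genes)
        for i in genes
    ]


def euler(x0: Sequence[float], w, decay, dt: float = 0.01, steps: int = 100) -> List[float]:
    """Integrate with explicit Euler steps (a crude stand-in for a real ODE solver)."""
    x = list(x0)
    for _ in range(steps):
        dx = grn_rhs(x, w, decay)
        x = [xi + dt * dxi for xi, dxi in zip(x, dx)]
    return x


# Hypothetical two-gene network: gene 0 activates gene 1; both decay at rate 1.
w = [[0.0, 0.0],
     [1.0, 0.0]]
decay = [1.0, 1.0]
print(grn_rhs([1.0, 0.0], w, decay))  # → [-1.0, 0.5]
print(euler([1.0, 0.0], w, decay))
```

Zero entries in `w` correspond to absent regulatory edges, so sparsity of this matrix maps directly onto sparsity of the inferred network, while the Hill terms keep each edge's effect saturating and biologically interpretable.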
We argue that the goal of sparsity must be conceptualized jointly with the goal of biological meaning, and that traditional sparsification approaches, such as iterative magnitude pruning, fail to bridge these two objectives. We conjecture that biologically meaningful representations can be obtained by leveraging domain knowledge in the sparsification process. This motivates the formulation of DASH, a domain-aware neural network pruning strategy. We use DASH to engineer an algorithm for pruning PHOENIX and demonstrate how this leads to biologically anchored sparsification in silico. We benchmark DASH against other sparsification strategies on both simulated and real-world data. Finally, we apply PHOENIX and DASH to three different case studies to demonstrate how our tools can be used to understand gene regulation in the context of lung adenocarcinoma, hematopoietic stem cell differentiation, and Rituximab-treated B cells.
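The contrast between magnitude-based and domain-aware pruning can be illustrated with a small sketch. Plain magnitude pruning ranks weights only by |w|; the domain-aware variant below blends |w| with a per-weight prior score (for example, 1 if a regulatory edge is supported by prior knowledge, else 0). The blending rule, the `alpha` parameter, and all function names are hypothetical and are not the criterion DASH actually uses. The retraining step between rounds, on which iterative magnitude pruning and the LTH rely, is also omitted.

```python
from typing import Callable, List, Sequence

Scorer = Callable[[List[float]], List[float]]


def magnitude_scores(w: List[float]) -> List[float]:
    """Classic magnitude criterion: a weight's importance is |w|."""
    return [abs(x) for x in w]


def make_domain_aware_scorer(prior: Sequence[float], alpha: float = 0.5) -> Scorer:
    """Hypothetical domain-aware criterion: blend |w| with a prior-knowledge score."""
    def score(w: List[float]) -> List[float]:
        return [alpha * abs(x) + (1 - alpha) * p for x, p in zip(w, prior)]
    return score


def iterative_prune(weights: Sequence[float], scorer: Scorer,
                    rounds: int = 3, frac: float = 0.5) -> List[float]:
    """Each round zeroes the lowest-scoring `frac` of the still-surviving weights."""
    w = list(weights)
    alive = [True] * len(w)
    for _ in range(rounds):
        survivors = [i for i in range(len(w)) if alive[i]]
        scores = scorer(w)
        n_prune = int(len(survivors) * frac)
        for i in sorted(survivors, key=lambda j: scores[j])[:n_prune]:
            alive[i] = False
            w[i] = 0.0
        # A full IMP/LTH loop would retrain the surviving weights here (omitted).
    return w


weights = [0.9, 0.1, 0.5, 0.05]
prior = [0, 1, 0, 1]  # hypothetical: edges 1 and 3 are supported by prior knowledge
print(iterative_prune(weights, magnitude_scores, rounds=1, frac=0.5))
# → [0.9, 0.0, 0.5, 0.0]
print(iterative_prune(weights, make_domain_aware_scorer(prior), rounds=1, frac=0.5))
# → [0.0, 0.1, 0.0, 0.05]
```

With the same pruning budget, magnitude pruning keeps the two largest weights, whereas the prior-informed score keeps the two edges marked as biologically supported. This is the tension between sparsity and biological meaning described above.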