Publication: Towards Safe Genome Editing and Rapid Disease Detection: Deep Bayesian Active Learning for Model-Driven CRISPR Guide Design
Abstract
Scientists use genome editing tools – nucleases which cut genetic material at targeted locations – to associate genes with diseases, detect viruses, and engineer agriculture, with the future goal of correcting human genetic disorders. CRISPR-Cas systems enable cheap genome editing by using a synthesized ribonucleic acid (RNA) guide to direct a Cas nuclease to the desired cut site. “Guide design” procedures improve the accuracy and safety of CRISPR-Cas interventions by selecting promising candidate RNA guides with a computational model that predicts outcomes based on a training dataset. The small size of CRISPR-Cas datasets discourages guide design researchers from using neural networks, a powerful but data-hungry class of models. This thesis presents the first application of Bayesian neural networks (BNNs) – a variant which better handles data scarcity – to genome editing. BNNs are applied to two of the world’s largest CRISPR-Cas datasets, achieving the same accuracy as state-of-the-art approaches with up to 141 times less data and up to 37% higher relative accuracy with equal data. BNNs can readily improve CRISPR guide design, including in Cas13 protocols for cheap and rapid SARS-CoV-2 detection. This work demonstrates the first instance of computer-driven CRISPR experiment design, in which BNNs outperform human expertise in building an effective training dataset.