Publication: Reconstruction of evolving gene variants and fitness from short sequencing reads
Date
2021-10-11
Publisher
Springer Science and Business Media LLC
The Harvard community has made this article openly available.
Citation
Shen, Max, Kevin Zhao, and David Liu. "Reconstruction of evolving gene variants and fitness from short sequencing reads." Nature Chemical Biology 17, no. 11 (2021): 1188-1198. DOI: 10.1038/s41589-021-00876-6
Abstract
Directed evolution can generate proteins with tailor-made activities. However, full-length genotypes, their frequencies and fitnesses are difficult to measure for evolving gene-length biomolecules using most high-throughput DNA sequencing methods, as short read lengths can lose mutation linkages in haplotypes. Here we present Evoracle, a machine learning method that accurately reconstructs full-length genotypes (R² = 0.94) and fitness using short-read data from directed evolution experiments, with substantial improvements over related methods. We validate Evoracle on phage-assisted continuous evolution (PACE) and phage-assisted non-continuous evolution (PANCE) of adenine base editors and OrthoRep evolution of drug-resistant enzymes. Evoracle retains strong performance (R² = 0.86) on data with complete linkage loss between neighboring nucleotides and large measurement noise, such as pooled Sanger sequencing data (~US$10 per timepoint), and broadens the accessibility of training machine learning models on gene variant fitnesses. Evoracle can also identify high-fitness variants, including low-frequency ‘rising stars’, well before they are identifiable from consensus mutations.
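The linkage-loss problem described in the abstract can be illustrated with a minimal sketch. This is not Evoracle's algorithm; it only shows why per-position allele frequencies alone cannot identify full-length genotypes. The two-site populations, the allele labels "A" and "B", and the function name `marginal_frequencies` are hypothetical and chosen for illustration.

```python
from collections import Counter


def marginal_frequencies(population):
    """Return per-position allele frequencies for a haplotype population.

    `population` maps each full-length haplotype string to its frequency.
    The result mimics what reads covering a single position would reveal:
    allele frequencies at each site, with no linkage between sites.
    """
    total = sum(population.values())
    length = len(next(iter(population)))
    marginals = []
    for pos in range(length):
        counts = Counter()
        for haplotype, freq in population.items():
            counts[haplotype[pos]] += freq
        marginals.append({allele: c / total for allele, c in counts.items()})
    return marginals


# Two hypothetical populations over two sites ("A" = wild type, "B" = mutant).
linked = {"AA": 0.5, "BB": 0.5}    # the two mutations always co-occur
unlinked = {"AB": 0.5, "BA": 0.5}  # the two mutations never co-occur

# Different genotype populations, identical per-position frequencies.
same_marginals = marginal_frequencies(linked) == marginal_frequencies(unlinked)
print(same_marginals)
```

Because both populations produce the same per-position frequencies at a single timepoint, distinguishing them requires additional information, such as how the frequencies change over the course of an evolution experiment.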
Keywords
Cell Biology, Molecular Biology
Terms of Use
This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth at Terms of Service