Publication:

Design of custom CRISPR-Cas9 PAM variant enzymes via scalable engineering and machine learning

Date

2025-05-09

Published Version

The Harvard community has made this article openly available.

Citation

Silverstein, Rachel Anne. 2025. Design of Custom CRISPR-Cas9 PAM Variant Enzymes via Scalable Engineering and Machine Learning. Doctoral Dissertation, Harvard University Graduate School of Arts and Sciences.

Abstract

CRISPR-Cas nucleases have facilitated the widespread adoption of precise genome editing in the laboratory. However, CRISPR-Cas-mediated genome editing is constrained by the strict requirement for a protospacer adjacent motif (PAM) flanking the genomic target site. This requirement limits the use of SpCas9 to genomic positions that are flanked by an NGG motif. A suite of SpCas9 variants, each targeting a distinct PAM, would expand the range of accessible genomic sequences while maintaining the specificity and allele-discrimination properties imparted by a PAM requirement. This dissertation describes the development of a range of SpCas9 variants with altered PAM specificities. PAM-altered SpCas9 variants are identified by structure/function-informed saturation mutagenesis followed by bacterial selections. Next, full PAM requirements are characterized for a set of ~1000 enzyme variants and used to train a machine learning model that relates PAM specificity to amino acid sequence. This PAM ML algorithm (PAMmla) is used to predict the PAM requirements of 64 million enzyme variants, leading to novel SpCas9 variants with unique nucleotide preferences at the third and fourth positions of the PAM. PAMmla-predicted enzymes outperform evolution-based enzymes and highly optimized SpCas9 variants (e.g., SpG and SpRY) as nucleases and base editors across various sites in human cells and show consistently fewer genome-wide off-targets. A second rational engineering strategy, termed “SpRYbridization”, is used to further engineer PAMmla-derived enzymes to relax nucleotide preference at the second position of the PAM, expanding targeting range beyond the NG PAM space while maintaining more specific PAM preferences than SpRY. Together, this work establishes the feasibility of integrating ML with protein engineering to derive a catalog of bespoke SpCas9-based enzymes, achieving greater plasticity of the PAM-interacting domain than previously explored.
This framework for quickly identifying safe and effective SpCas9 variant enzymes motivates a shift away from generalist genome editing technologies and towards custom editors for a wide range of genome editing applications.
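
The PAM constraint described in the abstract can be illustrated with a short sketch. The Python function below is not taken from the dissertation: its name, `find_target_sites`, and its parameters are hypothetical, and it makes simplifying assumptions (a 20-nt spacer, top strand only, PAM immediately 3' of the spacer). It lists the positions in a DNA sequence where a spacer is followed by a given PAM, written in IUPAC codes, so that the number of sites accessible under NGG can be compared with the number accessible under a different PAM.

```python
import re

# IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def find_target_sites(seq, pam="NGG", spacer_len=20):
    """Return 0-based start positions of spacers that are directly
    followed by the PAM.

    Illustrative assumption: only the given (top) strand is scanned;
    a real search would also scan the reverse complement.
    """
    seq = seq.upper()
    pam_regex = "".join(IUPAC[base] for base in pam.upper())
    # A lookahead is used so that overlapping sites are all reported.
    pattern = re.compile(rf"(?=[ACGT]{{{spacer_len}}}{pam_regex})")
    return [m.start() for m in pattern.finditer(seq)]


if __name__ == "__main__":
    demo = "A" * 20 + "TGG" + "C" * 5
    print("NGG sites:", find_target_sites(demo, pam="NGG"))
    print("NGN sites:", find_target_sites(demo, pam="NGN"))
```

In this toy sequence, relaxing the third PAM position from G to N exposes an additional site, which is the sense in which variants with altered PAM preferences expand targeting range.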

Keywords

Cas9, CRISPR, genome editing, machine learning, PAM, PAMmla, Biology, Bioinformatics, Bioengineering

Terms of Use

This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth at Terms of Service
