Publication: Fine-tuning Protein Language Models to Identify Interaction Sites Enables Binder Design from Sequence
Date
2023-06-30
Citation
Brixi, Garyk. 2023. Fine-tuning Protein Language Models to Identify Interaction Sites Enables Binder Design from Sequence. Bachelor's thesis, Harvard University Engineering and Applied Sciences.
Abstract
Designing protein binders to flexible and disordered proteins is important for biological research and future therapeutic applications. Despite significant progress in de novo and scaffold-based design, current methods do not enable computational design of binders to disordered proteins. By integrating experimentally verified protein-protein interactions with a new model for protein-protein interaction site prediction, we prioritize target-binding peptides from a partner protein without requiring protein structure. We reframe the problem of deriving peptide guides from natural interactions as a protein-protein hotspot prediction task and fine-tune protein language models to predict which portions of a protein sequence are likely involved in binding, so that continuous regions with high interaction scores can be prioritized for use as binders. Our model, dubbed SaLT&PepPr (Structure-agnostic Language Transformer & Peptide Prioritization), is benchmarked against structure-based models. It performs competitively with baselines built on structure homology and handcrafted structural features but, as expected, underperforms state-of-the-art structure-based deep learning models on structured proteins. To validate real-world performance in designing binders, our model is experimentally compared against AlphaFold2-multimer in prioritizing binders from partner proteins to different protein targets, including disordered transcription factors. Collaborators fused SaLT&PepPr-derived peptides to E3 ubiquitin ligase domains and found robust intracellular degradation of diverse pathogenic targets in human cells, including those with minimal structural information. Whole-cell proteomics further shows that our peptide-guided degraders have negligible off-target effects, and we demonstrate degradation of endogenous β-catenin and subsequent downregulation of Wnt signaling in cellular models of colorectal cancer.
This work shows that sequence-based models can predict important protein properties, including those of disordered proteins, and suggests that partner-based binders can be created for a wide range of protein targets without the need for co-crystals or co-folds.
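The peptide prioritization step described in the abstract (selecting continuous regions of a partner protein with high interaction scores) can be illustrated with a minimal Python sketch. This is not the thesis implementation: the function name `top_peptide_windows`, the use of the mean per-residue score as the ranking criterion, and the toy scores below are all assumptions made for illustration. In practice the per-residue scores would come from the fine-tuned protein language model, which is not reproduced here.

```python
from typing import List, Tuple


def top_peptide_windows(
    sequence: str,
    scores: List[float],
    length: int,
    k: int = 1,
) -> List[Tuple[int, str, float]]:
    """Rank contiguous windows of a sequence by mean per-residue score.

    Hypothetical helper: `scores` stands in for per-residue interaction
    probabilities from a sequence-based model. Ranking by the window mean
    is an assumed criterion, not one confirmed by the source.
    Returns up to k tuples of (0-based start index, peptide, mean score).
    """
    if len(sequence) != len(scores):
        raise ValueError("sequence and scores must have the same length")
    if length <= 0 or length > len(sequence):
        raise ValueError("window length must be between 1 and len(sequence)")

    windows = []
    for start in range(len(sequence) - length + 1):
        # Mean score over the contiguous window [start, start + length)
        mean_score = sum(scores[start:start + length]) / length
        windows.append((start, sequence[start:start + length], mean_score))

    # Highest mean score first; Python's sort is stable, so ties keep
    # their original left-to-right order.
    windows.sort(key=lambda w: w[2], reverse=True)
    return windows[:k]


# Toy example with made-up scores (not real model output)
partner = "ACDEFGHIKL"
toy_scores = [0.1, 0.2, 0.9, 0.8, 0.7, 0.1, 0.0, 0.3, 0.2, 0.1]
best = top_peptide_windows(partner, toy_scores, length=3, k=2)
for start, peptide, mean_score in best:
    print(start, peptide, round(mean_score, 3))
```

A fixed window length keeps the sketch simple. A real pipeline would likely scan several peptide lengths and might filter overlapping candidates, but those choices are not specified in the abstract.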
Keywords
Applied mathematics
Terms of Use
This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth at Terms of Service