Publication:

EvoAI enables extreme compression and reconstruction of the protein sequence space

Date

2024-02-23

Publisher

Springer Science and Business Media LLC

Citation

Zhang, Shuyi, Ziyuan Ma, Wenjie Li, Yunhao Shen, Yunxin Xu, Gengjiang Liu, Jiamin Chang et al. "EvoAI enables extreme compression and reconstruction of the protein sequence space." Nature Methods. DOI: 10.21203/rs.3.rs-3930833/v1

Abstract

Designing proteins with improved functions requires a deep understanding of how sequence and function are related, a vast space that is hard to explore. The ability to efficiently compress this space by identifying functionally important features is extremely valuable. Here, we first establish a method called EvoScan to comprehensively segment and scan the high-fitness sequence space to obtain anchor points that capture its essential features, especially in high dimensions. Our approach is compatible with any biomolecular function that can be coupled to a transcriptional output. We then develop deep learning and large language models to accurately reconstruct the space from these anchors, allowing computational prediction of novel, highly fit sequences without prior homology-derived or structural information. We apply this hybrid experimental-computational method, which we call EvoAI, to a repressor protein and find that only 82 anchors are sufficient to compress the high-fitness sequence space with a compression ratio of 10^48. The extreme compressibility of the space informs both applied biomolecular design and understanding of natural evolution.
